Embeddings From Ratings Correlation

class embedded_voting.EmbeddingsFromRatingsCorrelation(preprocess_ratings=None, svd_factor=0.95)

Embed each voter as the vector of its correlations with all voters.

Conceptually, there are two levels of embedding.

  • First, v_i = preprocess_ratings(ratings_voter_i) for each voter i. This is an intermediate computation step and is not recorded.
  • Second, M = v @ v.T, where v is the matrix whose rows are the v_i. M is recorded as the final embeddings, so row i of M is the embedding of voter i.
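
The two levels above can be sketched with plain NumPy. This is a minimal illustration, not the class's implementation: `normalize_rows` is a hypothetical stand-in for a `preprocess_ratings` function, assumed here to scale each voter's ratings to unit Euclidean norm.

```python
import numpy as np


def normalize_rows(ratings):
    """Hypothetical preprocessing: scale each voter's row to unit Euclidean norm."""
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    return ratings / norms


# Three voters (rows) rating three candidates (columns).
ratings = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 1.0, 0.0]])

v = normalize_rows(ratings)  # first level: intermediate, not recorded
M = v @ v.T                  # second level: one row per voter, one column per voter

print(M.shape)  # (3, 3)
```

Entry M[i, j] is the dot product of the preprocessed ratings of voters i and j, so M is symmetric and, under this normalization, has ones on its diagonal.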

Other attributes are computed and recorded:

  • n_sing_val: the number of singular values judged relevant when the SVD is computed, in the spirit of Principal Component Analysis (PCA).
  • ratings_means: the mean rating for each voter (without preprocessing).
  • ratings_stds: the standard deviation of the ratings for each voter (without preprocessing).
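
As a rough sketch of what these attributes describe, the per-voter statistics are row-wise reductions of the raw ratings, and the singular values come from an SVD. Several points below are assumptions rather than documented behavior: the use of the population standard deviation (`ddof=0`), the choice of matrix passed to the SVD, and the tolerance-based count `n_nonzero`, which only counts numerically nonzero singular values and is not the rule the class uses for `n_sing_val`.

```python
import numpy as np

# Three voters (rows) rating three candidates (columns).
ratings = np.array([[1.0, 2.0, 3.0],
                    [2.0, 4.0, 6.0],
                    [0.0, 5.0, 1.0]])

# Per-voter statistics on the raw (unpreprocessed) ratings.
ratings_means = ratings.mean(axis=1)  # [2., 4., 2.]
ratings_stds = ratings.std(axis=1)    # assumption: population std (ddof=0)

# Singular values, sorted in decreasing order by NumPy.
singular_values = np.linalg.svd(ratings, compute_uv=False)

# Illustrative count only: singular values above a relative tolerance.
tol = 1e-9 * singular_values.max()
n_nonzero = int((singular_values > tol).sum())
```

Here the second voter's ratings are a multiple of the first voter's, so the matrix has rank 2 and only two singular values are numerically nonzero.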

Examples

>>> np.random.seed(42)
>>> ratings = np.ones((5, 3))
>>> generator = EmbeddingsFromRatingsCorrelation(preprocess_ratings=normalize)
>>> embeddings = generator(ratings)
>>> embeddings
EmbeddingsCorrelation([[1., 1., 1., 1., 1.],
                       [1., 1., 1., 1., 1.],
                       [1., 1., 1., 1., 1.],
                       [1., 1., 1., 1., 1.],
                       [1., 1., 1., 1., 1.]])
>>> embeddings.n_sing_val
1

The typical usage, however, is with center_and_normalize. With constant ratings, nothing is left after centering, so every entry is 0:

>>> generator = EmbeddingsFromRatingsCorrelation(preprocess_ratings=center_and_normalize)
>>> embeddings = generator(ratings)
>>> embeddings
EmbeddingsCorrelation([[0., 0., 0., 0., 0.],
                       [0., 0., 0., 0., 0.],
                       [0., 0., 0., 0., 0.],
                       [0., 0., 0., 0., 0.],
                       [0., 0., 0., 0., 0.]])
>>> embeddings.n_sing_val
0
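
To see why centering turns the constant-ratings example into zeros, and what the result looks like for non-constant ratings, here is a minimal NumPy sketch. `center_and_normalize_rows` is a hypothetical stand-in for `center_and_normalize`; leaving zero-norm rows at zero is an assumption chosen to match the output shown above.

```python
import numpy as np


def center_and_normalize_rows(ratings):
    """Hypothetical preprocessing: subtract each voter's mean, then scale to unit norm."""
    ratings = np.asarray(ratings, dtype=float)
    centered = ratings - ratings.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Assumption: rows that are all zero after centering stay zero (no division by zero).
    safe_norms = np.where(norms > 0, norms, 1.0)
    return centered / safe_norms


# Constant ratings: every centered row is zero, so M is the zero matrix.
ratings = np.ones((5, 3))
v = center_and_normalize_rows(ratings)
M = v @ v.T

# Non-constant ratings: M2 is the matrix of Pearson correlations between voters.
ratings2 = np.array([[1.0, 2.0, 3.0],
                     [3.0, 2.0, 1.0],
                     [2.0, 4.0, 6.0]])
v2 = center_and_normalize_rows(ratings2)
M2 = v2 @ v2.T
```

In the second case, voters 0 and 2 rank the candidates identically up to scale and get correlation 1, while voter 1 has the opposite preferences and gets correlation -1 with both. For rows with nonzero variance, `M2` agrees with `np.corrcoef(ratings2)`.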