Recommender System Notes (1): Overview

Prompted by Dr. Wu, I briefly summarize the paper here for future reference.

This overview is based on my understanding of the paper: Zeynep Batmaz, Ali Yurekli, Alper Bilge, Cihan Kaleli, A review on deep learning for recommender systems: challenges and remedies, 2018.

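Why do we need recommender systems (RS)?
They address the information overload problem by narrowing a large catalog down to the items a user is likely to care about.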

Classification:

Collaborative filtering RS
- Memory-based
  Use the entire user-item matrix to identify similar entities (users or items). Once the nearest neighbors are located, their past ratings are used to make recommendations.
  User-based: use the past preferences of the nearest neighbors of user a
  Item-based: use the ratings given to items similar to item q
- Model-based
  Build a model offline by applying machine learning and data mining techniques. Once trained, the model produces the predictions needed for online CF tasks.
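
To make the memory-based, user-based variant concrete, here is a minimal sketch. The toy ratings, the helper names (`cosine_similarity`, `predict_user_based`), and the choice of cosine similarity with a similarity-weighted average are all my own assumptions for illustration; the paper does not prescribe them.

```python
import math

# Toy user-item ratings (hypothetical data); unrated items are simply absent.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob":   {"item1": 4, "item2": 2, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings of item."""
    neighbors = [
        (cosine_similarity(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0:
        return None  # no neighbor has rated this item
    return sum(sim * rating for sim, rating in top) / total_sim

prediction = predict_user_based(ratings, "alice", "item4")
print(prediction)
```

The item-based variant follows the same pattern with the roles swapped: compare the columns of the matrix (items) and average the target user's own ratings of the items most similar to item q.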
Content-based RS
The main purpose is to recommend items similar to those a user liked in the past. For instance, if a user likes a website that contains keywords such as “stack”, “queue”, and “sorting”, a content-based recommender system would suggest pages related to data structures and algorithms.
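
The keyword example can be sketched as follows. The page names, their keyword lists, and the bag-of-words cosine scoring are assumptions of mine, chosen only to illustrate the idea of matching a user profile against item content.

```python
import math
from collections import Counter

# Hypothetical pages described by their keywords.
pages = {
    "linked-lists": ["stack", "queue", "pointer", "list"],
    "quicksort": ["sorting", "partition", "recursion"],
    "pasta-recipes": ["tomato", "basil", "pasta"],
}

# Keywords taken from pages the user liked in the past.
liked_keywords = ["stack", "queue", "sorting"]

def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(profile_keywords, pages):
    """Rank pages by similarity to the user's keyword profile, dropping zero matches."""
    profile = Counter(profile_keywords)
    scored = [(cosine(profile, Counter(kw)), name) for name, kw in pages.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

recommended = recommend(liked_keywords, pages)
print(recommended)
```

Note that no other user's ratings are involved: only the item attributes and this user's own history.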
Hybrid RS
Combine collaborative filtering and content-based methods in a single system.

The main difference between collaborative filtering and content-based filtering is that CF relies on the history of user behavior, i.e. the ratings users have given to items, while content-based filtering relies on item or user attributes, i.e. content features.

Challenges and solutions

Accuracy: usually judged in three ways: the accuracy of rating predictions, of usage predictions, and of item rankings.
Solution: use ML models to extract hidden features and to jointly combine information from varying sources.
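
As a rough illustration of the three ways of judging accuracy, the sketch below pairs each one with a commonly used metric: RMSE for rating prediction, precision@k for usage prediction, and NDCG@k (with binary relevance) for ranking. These pairings are typical choices that I am assuming here, not a definitive list, and the function names are my own.

```python
import math

def rmse(predicted, actual):
    """Rating prediction: root mean squared error between predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_at_k(recommended, relevant, k):
    """Usage prediction: fraction of the top-k recommended items the user actually used."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def ndcg_at_k(recommended, relevant, k):
    """Ranking: discounted gain of the top-k list, normalized by the best possible ordering."""
    dcg = sum(
        1 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

print(rmse([4, 3], [5, 3]))
print(precision_at_k(["a", "b", "c"], {"a", "c"}, 2))
print(ndcg_at_k(["a", "b", "c"], {"a", "c"}, 3))
```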
Sparsity or cold-start: lack of data, e.g. too few user ratings or no information about a new user.
Solution: use ML models to extract denser, lower-dimensional feature representations from sparse high-dimensional data / use ML models to extract features from heterogeneous data sources / combine with a content-based RS to handle the cold-start problem.
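
To show what a dense feature representation learned from sparse ratings can look like, here is a small matrix factorization sketch trained with stochastic gradient descent. This is a classical model-based technique that I am substituting for the deep models the paper surveys; the toy data, the hyperparameters, and all names are assumptions.

```python
import math
import random

# Observed (user, item, rating) triples; hypothetical toy data.
# Most of the 4x4 user-item matrix is deliberately left unobserved.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 4),
]
n_users, n_items, n_factors = 4, 4, 2

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user's and the item's dense factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def train_rmse(P, Q):
    """RMSE over the observed ratings only."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed)
    return math.sqrt(squared / len(observed))

def factorize(epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q by SGD with L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    initial = train_rmse(P, Q)
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q, initial

P, Q, initial_rmse = factorize()
final_rmse = train_rmse(P, Q)
print(initial_rmse, final_rmse)
print(predict(P, Q, 1, 1))  # a rating that was never observed
```

Every user and item ends up with a dense vector of `n_factors` numbers, so a prediction is available even for user-item pairs that were never rated. A brand-new user with no ratings at all still has no meaningful vector, which is why content information is brought in for cold-start.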
Scalability: balance between model complexity and response time.
Solution: use ML models to reduce high-dimensional data to fewer dimensions / modify ML models to accelerate training / use parallel computing.

Accuracy in a CF system is not simply the prediction accuracy of an ordinary machine learning task. A good model should return both relevant items and surprising items that might attract users, i.e. it should balance exploitation and exploration.
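
One simple way to picture this balance is an epsilon-greedy slate: most slots go to the highest-scored items (exploitation), while each slot has a small chance of going to a random remaining item (exploration). This is only an illustrative sketch of my own; the function name, the scores, and the epsilon-greedy strategy itself are assumptions and do not come from the paper.

```python
import random

def epsilon_greedy_slate(scores, k, epsilon, rng):
    """Fill k slots; each slot takes the best remaining item with probability
    1 - epsilon, otherwise a uniformly random remaining item."""
    remaining = sorted(scores, key=scores.get, reverse=True)
    slate = []
    while remaining and len(slate) < k:
        if rng.random() < epsilon:
            pick = rng.choice(remaining)   # explore
        else:
            pick = remaining[0]            # exploit
        remaining.remove(pick)
        slate.append(pick)
    return slate

# Hypothetical predicted relevance scores for one user.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.2, "e": 0.1}
rng = random.Random(42)
print(epsilon_greedy_slate(scores, k=3, epsilon=0.0, rng=rng))  # pure exploitation
print(epsilon_greedy_slate(scores, k=3, epsilon=0.3, rng=rng))  # some exploration
```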
