Conclusion

In this work, we presented two similarity-based methods for automatic playlist continuation, along with a baseline model for comparison. They were inspired by the RecSys Challenge 2018 and by related work on the topic. The dataset was generated with the Spotify API, and the models were developed in Python.

We represented similarity through a sparse matrix, an efficient data structure for representing a large collection of playlists and tracks. In particular, the matrix can have many rows and columns, as long as most of its entries are zero.
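To make the storage argument concrete, here is a minimal sketch in plain Python on a toy playlist-track matrix. The names `playlists`, `to_sparse`, `row`, and `cosine_similarity` are hypothetical and chosen only for illustration. The dictionary-of-keys layout is a stand-in for a sparse matrix format, and cosine similarity is assumed here purely as an example of a similarity measure.

```python
from math import sqrt

# Toy data (hypothetical): each playlist is a list of track identifiers.
playlists = {
    "p0": ["t0", "t1", "t2"],
    "p1": ["t1", "t2"],
    "p2": ["t3"],
}

def to_sparse(playlists):
    """Store only the nonzero entries of the playlist-track matrix.

    Returns a dict mapping (playlist, track) -> 1; absent keys mean zero.
    """
    return {(p, t): 1 for p, tracks in playlists.items() for t in set(tracks)}

def row(matrix, playlist):
    """Return the set of tracks with a nonzero entry in the given row."""
    return {t for (p, t) in matrix if p == playlist}

def cosine_similarity(matrix, a, b):
    """Cosine similarity between two binary rows of the sparse matrix."""
    ra, rb = row(matrix, a), row(matrix, b)
    if not ra or not rb:
        return 0.0
    return len(ra & rb) / sqrt(len(ra) * len(rb))

matrix = to_sparse(playlists)
n_tracks = len({t for (_, t) in matrix})

# Only the nonzero cells are stored, not the full playlists x tracks grid.
print(len(matrix), "stored entries out of", len(playlists) * n_tracks)
print(cosine_similarity(matrix, "p0", "p1"))
```

The memory used grows with the number of nonzero entries rather than with the product of the number of playlists and tracks, which is what makes the representation practical for large collections.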

In the baseline model, we performed a random walk with uniform transition probabilities over the sparse matrix, in which the restart of the process follows a geometric distribution with parameter $\alpha$. In the first model, we found that the algorithm performs particularly well with short seed playlists, which is a good result. However, this model does not scale well, and some changes, such as a prefiltering step, are needed. In the second model, we found that the sparse matrix takes little time to build, and the model achieved reasonable performance compared to the models from the RecSys Challenge 2018. We saw that our models outperform the baseline model.
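The baseline idea can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the implementation used in this work. The function `random_walk_recommend`, its parameters `n_steps` and `k`, and the toy data are hypothetical. The sketch assumes one hop goes from a track to a uniformly chosen playlist containing it and then to a uniformly chosen track in that playlist, and that a restart happens with probability `alpha` at each step, so the number of steps between restarts is geometrically distributed.

```python
import random
from collections import Counter

def random_walk_recommend(playlists, seed_tracks, alpha=0.5, n_steps=1000, k=3, rng=None):
    """Rank tracks by the visit counts of a uniform random walk with restarts."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible

    # Inverted index: track -> playlists that contain it.
    track_to_playlists = {}
    for p, tracks in playlists.items():
        for t in tracks:
            track_to_playlists.setdefault(t, []).append(p)

    # Seed tracks that never occur in the data cannot start a walk.
    seeds = [t for t in seed_tracks if t in track_to_playlists]
    if not seeds:
        return []

    visits = Counter()
    current = rng.choice(seeds)
    for _ in range(n_steps):
        # Restart with probability alpha (assumption: geometric walk length).
        if rng.random() < alpha:
            current = rng.choice(seeds)
        # One hop: track -> uniformly chosen playlist -> uniformly chosen track.
        playlist = rng.choice(track_to_playlists[current])
        current = rng.choice(playlists[playlist])
        visits[current] += 1

    # Recommend the most visited tracks that are not already in the seed.
    seed_set = set(seeds)
    ranked = [t for t, _ in visits.most_common() if t not in seed_set]
    return ranked[:k]

# Toy usage (hypothetical data): t4 and t5 are unreachable from t0.
toy_playlists = {
    "p0": ["t0", "t1", "t2"],
    "p1": ["t1", "t2", "t3"],
    "p2": ["t4", "t5"],
}
recs = random_walk_recommend(toy_playlists, seed_tracks=["t0"], alpha=0.3)
print(recs)
```

A larger `alpha` keeps the walk close to the seed tracks, while a smaller one lets it explore playlists that are further away.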

One direction for future work is to combine different algorithms into hybrid models. Another interesting direction is to consider individual user behaviour that is not directly tied to the playlists users create but to their psychology, e.g. using the fact that the first and last tracks are more important to the user.

We hope you enjoy it!