Understanding Consumer Preferences Through Latent Spaces / Entendendo as preferências do consumidor através dos espaços latentes

André Uratsuka Manoel, Gustavo Corrêa Mirapalheta, João Luiz Chela

Abstract


The Latent Spaces technique has applications in areas such as natural language processing, image recognition and multi-language translation. It enables the use of embedding vectors. These are vectors in a k-dimensional vector space that represent the objects of study in increasingly advanced learning models, forming the basis of entirely new areas. These vectors can capture semantic features of the objects of study and, once trained, can be reused in other models, decreasing training time and increasing knowledge transfer. Examples are the pre-trained word vector sets of Google Word2Vec and Facebook FastText. This work explores Latent Spaces techniques to understand consumer preferences by implementing a recommendation mechanism on top of the MovieLens dataset from the University of Minnesota. From this dataset, a sequence of triples (userId, movieId, rating) was extracted, each representing a rating given by a user to a particular film. Two k-dimensional Latent Vector Spaces, one representing film characteristics and the other the corresponding user preferences, were created using Google TensorFlow and Machine Learning techniques such as stochastic gradient descent (SGD) and Matrix Factorization. Performance was benchmarked by the mean squared error as a function of the dimensionality k of the Latent Spaces. The capacity of the Latent Spaces to abstract non-trivial information about films was then evaluated by comparing films using a small cosine distance technique. The system was able to find subjective similarities that would be difficult to encode in a straightforward manner. Finally, an alternative way of generating user vectors through neural networks was explored. The techniques presented here support the case that Machine Learning techniques such as Latent Spaces can be used in business decision making.
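The pipeline summarized above (matrix factorization trained by SGD on (userId, movieId, rating) triples, a mean squared error benchmark, and cosine comparison of film vectors) can be illustrated with a minimal sketch. It is written in plain Python rather than TensorFlow, and the toy ratings, the function names `factorize`, `mse` and `cosine_similarity`, and the hyperparameters `lr`, `reg` and `epochs` are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def factorize(triples, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=300, seed=0):
    """Learn k-dimensional user and film vectors by SGD on squared error.

    All hyperparameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # One small random k-dimensional vector per user (P) and per film (Q).
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            # Predicted rating is the dot product of user and film vectors.
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both vectors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mse(triples, P, Q):
    """Mean squared error of dot-product predictions over the triples."""
    total = 0.0
    for u, i, r in triples:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return total / len(triples)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy data with 0-based ids: users 0 and 1 prefer films 0 and 1,
# users 2 and 3 prefer films 2 and 3. These are not MovieLens ratings.
ratings = [
    (0, 1, 5), (0, 2, 1), (0, 3, 1),
    (1, 0, 5), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 1, 1), (3, 2, 5),
]

P, Q = factorize(ratings, n_users=4, n_items=4, k=2)
print("training MSE:", mse(ratings, P, Q))
print("cosine(film 0, film 1):", cosine_similarity(Q[0], Q[1]))
print("cosine(film 0, film 2):", cosine_similarity(Q[0], Q[2]))
```

Repeating the call to `factorize` for several values of `k` and recording `mse` gives the kind of error-versus-dimensionality comparison the abstract describes, although a meaningful benchmark would measure the error on held-out ratings rather than on the training triples used here.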


Keywords


Latent Spaces, TensorFlow, Machine Learning, Recommendation Systems, User Preferences


DOI: https://doi.org/10.38152/bjtv4n1-004
