
Statistical significance of the Netflix challenge. (English) Zbl 1330.62090

Summary: Inspired by the legacy of the Netflix contest, we provide an overview of what has been learned – from our own efforts, and those of others – concerning the problems of collaborative filtering and recommender systems. The data set consists of about 100 million movie ratings (from 1 to 5 stars) involving some 480 thousand users and some 18 thousand movies; the associated ratings matrix is about 99% sparse. The goal is to predict ratings that users will give to movies; systems which can do this accurately have significant commercial applications, particularly on the World Wide Web. We discuss, in some detail, approaches to “baseline” modeling, singular value decomposition (SVD), as well as kNN (nearest neighbor) and neural network models; temporal effects, cross-validation issues, ensemble methods and other considerations are discussed as well. We compare existing models in a search for new models, and also discuss the mission-critical issues of penalization and parameter shrinkage which arise when the dimension of a parameter space reaches into the millions. Although much work on such problems has been carried out by the computer science and machine learning communities, our goal here is to address a statistical audience, and to provide a primarily statistical treatment of the lessons that have been learned from this remarkable set of data.
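The pipeline sketched in the summary – a “baseline” of global mean plus user and movie offsets, augmented by an SVD-style latent-factor model fit with L2 shrinkage – can be illustrated with a toy sketch. This is a minimal illustration under assumed, made-up data and parameter values, not the paper's (or any contestant's) actual implementation; all names here are illustrative.

```python
import numpy as np

def fit_baseline_svd(ratings, n_users, n_items, k=2, lam=0.05, lr=0.01,
                     epochs=500, seed=0):
    """Baseline (global mean + user/item biases) plus a rank-k factor model,
    fit by stochastic gradient descent with L2 shrinkage lam (toy sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])       # global mean: simplest baseline
    bu = np.zeros(n_users)                         # per-user offsets
    bi = np.zeros(n_items)                         # per-movie offsets
    P = 0.1 * rng.standard_normal((n_users, k))    # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))    # movie latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            # each update is gradient step on squared error + lam * L2 penalty
            bu[u] += lr * (err - lam * bu[u])
            bi[i] += lr * (err - lam * bi[i])
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - lam * P[u]),
                          Q[i] + lr * (err * P[u] - lam * Q[i]))
    return lambda u, i: mu + bu[u] + bi[i] + P[u] @ Q[i]

# Made-up (user, item, rating) triples standing in for the sparse ratings matrix.
train = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2),
         (2, 1, 5), (2, 2, 1), (3, 0, 3), (3, 2, 2)]
predict = fit_baseline_svd(train, n_users=4, n_items=3)
rmse = np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in train]))
```

The penalty parameter `lam` is the shrinkage knob the summary calls mission-critical: with millions of free parameters and a 99%-sparse matrix, the unpenalized fit would interpolate the training ratings and generalize poorly.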

MSC:

62F07 Statistical ranking and selection procedures
62J07 Ridge regression; shrinkage estimators (Lasso)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J10 Analysis of variance and covariance (ANOVA)
62M15 Inference from stochastic processes and spectral analysis

Software:

PRMLT

References:

[1] ACM SIGKDD (2007). KDD Cup and Workshop 2007. Available at .
[2] Adomavicius, G. and Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17 734-749.
[3] Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection. Ann. Statist. 32 870-897. · Zbl 1092.62033 · doi:10.1214/009053604000000238
[4] Barron, A. (1984). Predicted squared error: A criterion for automatic model selection. In Self-Organizing Methods in Modeling (S. Farlow, ed.). Marcel Dekker, New York.
[5] Bell, R. and Koren, Y. (2007a). Lessons from the Netflix Prize challenge. ACM SIGKDD Explorations Newsletter 9 75-79.
[6] Bell, R. and Koren, Y. (2007b). Improved neighborhood-based collaborative filtering. In Proc. KDD Cup and Workshop 2007 7-14. ACM, New York.
[7] Bell, R. and Koren, Y. (2007c). Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In Proc. Seventh IEEE Int. Conf. on Data Mining 43-52. IEEE Computer Society, Los Alamitos, CA.
[8] Bell, R., Koren, Y. and Volinsky, C. (2007a). Modeling relationships at multiple scales to improve accuracy of large recommender systems. In Proc. 13 th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 95-104. ACM, New York.
[9] Bell, R., Koren, Y. and Volinsky, C. (2007b). The BellKor solution to the Netflix Prize. Available at .
[10] Bell, R., Koren, Y. and Volinsky, C. (2007c). Chasing $1,000,000: How we won the Netflix Progress Prize. ASA Statistical and Computing Graphics Newsletter 18 4-12.
[11] Bell, R., Koren, Y. and Volinsky, C. (2008). The BellKor 2008 solution to the Netflix Prize. Available at .
[12] Bell, R. M., Bennett, J., Koren, Y. and Volinsky, C. (2009). The million dollar programming prize. IEEE Spectrum 46 28-33.
[13] Bennett, J. and Lanning, S. (2007). The Netflix Prize. In Proc. KDD Cup and Workshop 2007 3-6. ACM, New York.
[14] Berger, J. (1982). Bayesian robustness and the Stein effect. J. Amer. Statist. Assoc. 77 358-368. · Zbl 0491.62030 · doi:10.2307/2287253
[15] Bishop, C. M. (1995). Neural Networks for Pattern Recognition . Clarendon Press, New York. · Zbl 0868.68096
[16] Bishop, C. M. (2006). Pattern Recognition and Machine Learning . Springer, New York. · Zbl 1107.68072
[17] Breiman, L. (1996). Bagging predictors. Machine Learning 26 123-140. · Zbl 0858.68080
[18] Breiman, L. and Friedman, J. H. (1997). Predicting multivariate responses in multiple linear regression (with discussion). J. Roy. Statist. Soc. Ser. B 59 3-54. · Zbl 0897.62068 · doi:10.1111/1467-9868.00054
[19] Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2 121-167.
[20] Candès, E. and Plan, Y. (2009). Matrix completion with noise. Technical report, Caltech.
[21] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[22] Canny, J. F. (2002). Collaborative filtering with privacy via factor analysis. In Proc. 25 th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval 238-245. ACM, New York.
[23] Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis. Monogr. Statist. Appl. Probab. 69 . Chapman & Hall, London. · Zbl 0871.62012
[24] Casella, G. (1985). An introduction to empirical Bayes data analysis. Amer. Statist. 39 83-87. · doi:10.2307/2682801
[25] Chien, Y. H. and George, E. (1999). A Bayesian model for collaborative filtering. In Online Proc. 7 th Int. Workshop on Artificial Intelligence and Statistics . Fort Lauderdale, FL.
[26] Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods . Cambridge Univ. Press, Cambridge.
[27] Cohen, W. W., Schapire, R. E. and Singer, Y. (1999). Learning to order things. J. Artificial Intelligence Res. 10 243-270 (electronic). · Zbl 0915.68031
[28] Copas, J. B. (1983). Regression, prediction and shrinkage. J. Roy. Statist. Soc. Ser. B 45 311-354. · Zbl 0532.62048
[29] DeCoste, D. (2006). Collaborative prediction using ensembles of maximum margin matrix factorizations. In Proc. 23 rd Int. Conf. on Machine Learning 249-256. ACM, New York.
[30] Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W. and Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science 41 391-407.
[31] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022
[32] Efron, B. (1975). Biased versus unbiased estimation. Advances in Math. 16 259-277. · Zbl 0306.62010 · doi:10.1016/0001-8708(75)90114-0
[33] Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Amer. Statist. Assoc. 78 316-331. · Zbl 0543.62079 · doi:10.2307/2288636
[34] Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461-470. · Zbl 0621.62073 · doi:10.2307/2289236
[35] Efron, B. (1996). Empirical Bayes methods for combining likelihoods (with discussion). J. Amer. Statist. Assoc. 91 538-565. · Zbl 0868.62018 · doi:10.2307/2291646
[36] Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation (with discussion). J. Amer. Statist. Assoc. 99 619-642. · Zbl 1117.62324 · doi:10.1198/016214504000000692
[37] Efron, B. and Morris, C. (1971). Limiting the risk of Bayes and empirical Bayes estimators. I. The Bayes case. J. Amer. Statist. Assoc. 66 807-815. · Zbl 0229.62003 · doi:10.2307/2284231
[38] Efron, B. and Morris, C. (1972a). Limiting the risk of Bayes and empirical Bayes estimators. II. The empirical Bayes case. J. Amer. Statist. Assoc. 67 130-139. · Zbl 0231.62013 · doi:10.2307/2284711
[39] Efron, B. and Morris, C. (1972b). Empirical Bayes on vector observations: An extension of Stein’s method. Biometrika 59 335-347. · Zbl 0238.62072 · doi:10.1093/biomet/59.2.335
[40] Efron, B. and Morris, C. (1973a). Stein’s estimation rule and its competitors-an empirical Bayes approach. J. Amer. Statist. Assoc. 68 117-130. · Zbl 0275.62005 · doi:10.2307/2284155
[41] Efron, B. and Morris, C. (1973b). Combining possibly related estimation problems (with discussion). J. Roy. Statist. Soc. Ser. B 35 379-421. · Zbl 0281.62030
[42] Efron, B. and Morris, C. (1975). Data analysis using Stein’s estimator and its generalization. J. Amer. Statist. Assoc. 70 311-319. · Zbl 0319.62039 · doi:10.2307/2285453
[43] Efron, B. and Morris, C. (1977). Stein’s paradox in statistics. Scientific American 236 119-127.
[44] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[45] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In International Congress of Mathematicians III 595-622. Eur. Math. Soc., Zürich. · Zbl 1117.62137
[46] Friedman, J. (1994). An overview of predictive learning and function approximation. In From Statistics to Neural Networks (V. Cherkassky, J. Friedman and H. Wechsler, eds.). NATO ISI Series F 136 . Springer, New York. · Zbl 0809.00025
[47] Funk, S. (2006/2007). See Webb, B. (2006/2007).
[48] Gorrell, G. and Webb, B. (2006). Generalized Hebbian algorithm for incremental latent semantic analysis. Technical report, Linköping Univ., Sweden.
[49] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[50] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning , 2nd ed. Springer, New York. · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[51] Herlocker, J. L., Konstan, J. A., Borchers, A. and Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In Proc. 22 nd ACM SIGIR Conf. on Information Retrieval 230-237.
[52] Herlocker, J. L., Konstan, J. A. and Riedl, J. T. (2000). Explaining collaborative filtering recommendations. In Proc. 2000 ACM Conf. on Computer Supported Cooperative Work 241-250. ACM, New York.
[53] Herlocker, J. L., Konstan, J. A., Terveen, L. G. and Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22 5-53.
[54] Hertz, J., Krogh, A. and Palmer, R. G. (1991). Introduction to the Theory of Neural Computation . Addison-Wesley, Redwood City, CA.
[55] Hill, W., Stead, L., Rosenstein, M. and Furnas, G. (1995). Recommending and evaluating choices in a virtual community of use. In Proc. SIGCHI Conf. on Human Factors in Computing Systems 194-201. ACM, New York.
[56] Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Comput. 14 1771-1800. · Zbl 1010.68111 · doi:10.1162/089976602760128018
[57] Hofmann, T. (2001a). Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. J 42 177-196. · Zbl 0970.68130 · doi:10.1023/A:1007617005950
[58] Hofmann, T. (2001b). Learning what people (don’t) want. In Proc. European Conf. on Machine Learning. Lecture Notes in Comput. Sci. 2167 214-225. Springer, Berlin. · Zbl 1007.68548
[59] Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Transactions on Information Systems 22 89-115.
[60] Hofmann, T. and Puzicha, J. (1999). Latent class models for collaborative filtering. In Proc. Int. Joint Conf. on Artificial Intelligence 2 688-693. Morgan Kaufmann, San Francisco, CA.
[61] Hu, Y., Koren, Y. and Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. Technical report, AT&T Labs-Research, Florham Park, NJ.
[62] Izenman, A. J. (2008). Modern Multivariate Statistical Techniques : Regression , Classification , and Manifold Learning . Springer, New York. · Zbl 1155.62040 · doi:10.1007/978-0-387-78189-1
[63] James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4 th Berkeley Sympos. Math. Statist. Probab. I 361-379. Univ. California Press, Berkeley, CA. · Zbl 1281.62026
[64] Kim, D. and Yum, B. (2005). Collaborative filtering based on iterative principal component analysis. Expert Systems with Applications 28 823-830.
[65] Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proc. 14 th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 426-434. ACM, New York.
[66] Koren, Y. (2009). Collaborative filtering with temporal dynamics. In Proc. 15 th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 447-456. ACM, New York.
[67] Koren, Y. (2010). Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Transactions on Knowledge Discovery from Data 4 Article 1.
[68] Koren, Y., Bell, R. and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer 42 (8) 30-37.
[69] Li, K.-C. (1985). From Stein’s unbiased risk estimates to the method of generalized cross validation. Ann. Statist. 13 1352-1377. · Zbl 0605.62047 · doi:10.1214/aos/1176349742
[70] Lim, Y. J. and Teh, Y. W. (2007). Variational Bayesian approach to movie rating predictions. In Proc. KDD Cup and Workshop 2007 15-21. ACM, New York.
[71] Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data . Wiley, New York. · Zbl 0665.62004
[72] Mallows, C. (1973). Some comments on \(\mathrm{C}_{p}\). Technometrics 15 661-675. · Zbl 0269.62061 · doi:10.2307/1267380
[73] Maritz, J. S. and Lwin, T. (1989). Empirical Bayes Methods , 2nd ed. Monogr. Statist. Appl. Probab. 35 . Chapman & Hall, London. · Zbl 0731.62040
[74] Marlin, B. (2004). Collaborative filtering: A machine learning perspective. M.Sc. thesis, Computer Science Dept., Univ. Toronto.
[75] Marlin, B. and Zemel, R. S. (2004). The multiple multiplicative factor model for collaborative filtering. In Proc. 21 st Int. Conf. on Machine Learning . ACM, New York.
[76] Marlin, B., Zemel, R. S., Roweis, S. and Slaney, M. (2007). Collaborative filtering and the missing at random assumption. In Proc. 23 rd Conf. on Uncertainty in Artificial Intelligence . ACM, New York.
[77] Moguerza, J. M. and Muñoz, A. (2006). Support vector machines with applications. Statist. Sci. 21 322-336. · Zbl 1246.68185 · doi:10.1214/088342306000000493
[78] Moody, J. E. (1992). The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In Advances in Neural Information Processing Systems 4 . Morgan Kaufmann, San Francisco, CA.
[79] Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications (with discussion). J. Amer. Statist. Assoc. 78 47-65. · Zbl 0506.62005 · doi:10.2307/2287098
[80] Narayanan, A. and Shmatikov, V. (2008). Robust de-anonymization of large datasets (How to break anonymity of the Netflix Prize dataset). In Proc. 2008 IEEE Symposium on Security and Privacy 111-125. IEEE Computer Society, Los Alamitos, CA.
[81] Neal, R. M. and Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse and other variants. In Learning in Graphical Models (M. I. Jordan, ed.) 355-368. Kluwer. · Zbl 0916.62019
[82] Netflix Inc. (2006/2010). Netflix Prize webpage: . Netflix Prize Leaderboard: http://www.netflixprize.com/leaderboard/ . Netflix Prize Forum: www.netflixprize.com/community/ .
[83] Oard, D. and Kim, J. (1998). Implicit feedback for recommender systems. In Proc. AAAI Workshop on Recommender Systems 31-36. AAAI, Menlo Park, CA.
[84] Park, S. T. and Pennock, D. M. (2007). Applying collaborative filtering techniques to movie search for better ranking and browsing. In Proc. 13 th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 550-559. ACM, New York.
[85] Paterek, A. (2007). Improving regularized singular value decomposition for collaborative filtering. In Proc. KDD Cup and Workshop 2007 39-42. ACM, New York.
[86] Piatetsky, G. (2007). Interview with Simon Funk. SIGKDD Explorations Newsletter 9 38-40.
[87] Popescul, A., Ungar, L., Pennock, D. and Lawrence, S. (2001). Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proc. 17 th Conf. on Uncertainty in Artificial Intelligence 437-444. Morgan Kaufmann, San Francisco, CA.
[88] Pu, P., Bridge, D. G., Mobasher, B. and Ricci, F., eds. (2008). Proc. ACM Conf. on Recommender Systems 2008. ACM, New York.
[89] Raiko, T., Ilin, A. and Karhunen, J. (2007). Principal component analysis for large scale problems with lots of missing values. In ECML 2007. Lecture Notes in Artificial Intelligence 4701 (J. N. Kok et al., eds.) 691-698. Springer, Berlin.
[90] Rennie, J. D. M. and Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. In Proc. 22 nd Int. Conf. on Machine Learning 713-719. ACM, New York.
[91] Resnick, P. and Varian, H. R. (1997). Recommender systems. Communications of the ACM 40 56-58.
[92] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. and Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. In Proc. ACM Conf. on Computer Supported Cooperative Work 175-186. ACM, New York.
[93] Ripley, B. D. (1996). Pattern Recognition and Neural Networks . Cambridge Univ. Press, Cambridge. · Zbl 0853.62046
[94] Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc. 3 rd Berkeley Sympos. Math. Statist. Probab. I 157-163. Univ. California Press, Berkeley. · Zbl 0074.35302
[95] Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35 1-20. · Zbl 0138.12304 · doi:10.1214/aoms/1177703729
[96] Robbins, H. (1983). Some thoughts on empirical Bayes estimation. Ann. Statist. 11 713-723. · Zbl 0522.62024 · doi:10.1214/aos/1176346239
[97] Roweis, S. (1997). EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems 10 626-632. MIT Press, Cambridge, MA.
[98] Salakhutdinov, R. and Mnih, A. (2008a). Probabilistic matrix factorization. In Advances in Neural Information Processing Systems 20 1257-1264. MIT Press, Cambridge, MA.
[99] Salakhutdinov, R. and Mnih, A. (2008b). Bayesian probabilistic matrix factorization using MCMC. In Proc. 25 th Int. Conf. on Machine Learning 880-887. ACM, New York.
[100] Salakhutdinov, R., Mnih, A. and Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. In Proc. 24 th Int. Conf. on Machine Learning. ACM International Conference Proceeding Series 227 791-798. ACM, New York.
[101] Sali, S. (2008). Movie rating prediction using singular value decomposition. Technical report, Univ. California, Santa Cruz.
[102] Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. T. (2000). Application of dimensionality reduction in recommender system-a case study. In Proc. ACM WebKDD Workshop . ACM, New York.
[103] Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. T. (2001). Item-based collaborative filtering recommendation algorithms. In Proc. 10 th Int. Conf. on the World Wide Web 285-295. ACM, New York.
[104] Srebro, N. and Jaakkola, T. (2003). Weighted low-rank approximations. In Proc. Twentieth Int. Conf. on Machine Learning (T. Fawcett and N. Mishra, eds.) 720-727. ACM, New York.
[105] Srebro, N., Rennie, J. D. M. and Jaakkola, T. S. (2005). Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems 17 1329-1336. MIT Press, Cambridge, MA.
[106] Stein, C. (1974). Estimation of the mean of a multivariate normal distribution. In Proceedings of the Prague Symposium on Asymptotic Statistics ( Charles Univ. , Prague , 1973) II 345-381. Charles Univ., Prague.
[107] Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135-1151. · Zbl 0476.62035 · doi:10.1214/aos/1176345632
[108] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). J. Roy. Statist. Soc. Ser. B 36 111-147. · Zbl 0308.62063
[109] Takács, G., Pilászy, I., Németh, B. and Tikk, D. (2007). On the Gravity recommendation system. In Proc. KDD Cup and Workshop 2007 22-30. ACM, New York.
[110] Takács, G., Pilászy, I., Németh, B. and Tikk, D. (2008a). Major components of the Gravity recommendation system. SIGKDD Explorations 9 80-83.
[111] Takács, G., Pilászy, I., Németh, B. and Tikk, D. (2008b). Investigation of various matrix factorization methods for large recommender systems. In Proc. 2 nd Netflix-KDD Workshop . ACM, New York.
[112] Takács, G., Pilászy, I., Németh, B. and Tikk, D. (2008c). Matrix factorization and neighbor based algorithms for the Netflix Prize problem. In Proc. ACM Conf. on Recommender Systems 267-274. ACM, New York.
[113] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[114] Tintarev, N. and Masthoff, J. (2007). A survey of explanations in recommender systems. In Proc. 23 rd Int. Conf. on Data Engineering Workshops 801-810. IEEE, New York.
[115] Toscher, A. and Jahrer, M. (2008). The BigChaos solution to the Netflix Prize 2008. Technical report, commendo research and consulting, Köflach, Austria.
[116] Toscher, A., Jahrer, M. and Bell, R. M. (2009). The BigChaos solution to the Netflix Grand Prize. Technical report, commendo research and consulting, Köflach, Austria.
[117] Toscher, A., Jahrer, M. and Legenstein, R. (2008). Improved neighbourhood-based algorithms for large-scale recommender systems. In Proc. 2 nd Netflix-KDD Workshop 2008. ACM, New York.
[118] Toscher, A., Jahrer, M. and Legenstein, R. (2010). Combining predictions for accurate recommender systems. In Proc. 16 th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 693-701. ACM, New York.
[119] Tuzhilin, A., Koren, Y., Bennett, C., Elkan, C. and Lemire, D. (2008). Proc. 2 nd KDD Workshop on Large Scale Recommender Systems and the Netflix Prize Competition . ACM, New York.
[120] Ungar, L. and Foster, D. (1998). Clustering methods for collaborative filtering. In Proc. Workshop on Recommendation Systems . AAAI Press, Menlo Park.
[121] van Houwelingen, J. C. (2001). Shrinkage and penalized likelihood as methods to improve predictive accuracy. Statist. Neerlandica 55 17-34. · Zbl 1075.62591 · doi:10.1111/1467-9574.00154
[122] Vapnik, V. N. (2000). The Nature of Statistical Learning Theory , 2nd ed. Springer, New York. · Zbl 0934.62009
[123] Wang, J., de Vries, A. P. and Reinders, M. J. T. (2006). Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proc. 29 th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval 501-508. ACM, New York.
[124] Webb, B. (aka Funk, S.) (2006/2007). ‘Blog’ entries, 27 October 2006, 2 November 2006, 11 December 2007 and 17 August 2007. Available at .
[125] Wu, M. (2007). Collaborative filtering via ensembles of matrix factorizations. In Proc. KDD Cup and Workshop 2007 43-47. ACM, New York.
[126] Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc. 93 120-131. · Zbl 0920.62056 · doi:10.2307/2669609
[127] Yuan, M. and Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Amer. Statist. Assoc. 100 1215-1225. · Zbl 1117.62453 · doi:10.1198/016214505000000367
[128] Zhang, Y. and Koren, J. (2007). Efficient Bayesian hierarchical user modeling for recommendation systems. In Proc. 30 th Int. ACM SIGIR Conf. on Research and Developments in Information Retrieval . ACM, New York.
[129] Zhou, Y., Wilkinson, D., Schreiber, R. and Pan, R. (2008). Large scale parallel collaborative filtering for the Netflix Prize. In Proc. 4 th Int. Conf. Algorithmic Aspects in Information and Management. Lecture Notes in Comput. Sci. 5031 337-348. Springer, Berlin.
[130] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265-286. · doi:10.1198/106186006X113430
[131] Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173-2192. · Zbl 1126.62061 · doi:10.1214/009053607000000127
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases the data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.