
Clusterwise analysis for multiblock component methods. (English) Zbl 1414.62231

Summary: Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. Specifically, we consider multiblock PLS and multiblock redundancy analysis, two multiblock component methods for the case in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. The strategy presented in this article to remedy this problem is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach yields two new methods, each of which provides clusters with their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
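To make the idea of "clustering plus regression under one criterion" concrete, here is a minimal Python sketch of plain clusterwise least squares. It is not the authors' clusterwise multiblock PLS or multiblock redundancy analysis: it swaps in ordinary (minimum-norm) least squares within each cluster, ignores the block structure of the predictors, and uses an alternating refit/reassign scheme rather than the paper's sequential algorithm. The function name `clusterwise_ls` and its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def clusterwise_ls(X, Y, n_clusters, n_iter=100, seed=0):
    """Alternating clusterwise least squares (illustrative sketch only).

    Criterion: sum over observations of the squared residual of the
    observation under the regression model of its assigned cluster.
    X: (n, p) predictors; Y: (n,) or (n, q) responses.
    Returns labels, per-cluster coefficients, and the criterion history.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Y = Y.reshape(n, -1)                        # treat a 1-D response as (n, 1)
    Xc = np.hstack([np.ones((n, 1)), X])        # add an intercept column
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    B = np.zeros((n_clusters, p + 1, Y.shape[1]))
    history = []
    for _ in range(n_iter):
        # Step 1: refit each non-empty cluster. lstsq returns the minimum-norm
        # least-squares solution, so it still works when p exceeds cluster size.
        # Empty clusters keep their previous coefficients.
        for k in range(n_clusters):
            idx = labels == k
            if idx.any():
                B[k] = np.linalg.lstsq(Xc[idx], Y[idx], rcond=None)[0]
        # Step 2: squared residual of every observation under every cluster model,
        # then reassign each observation to the model that fits it best.
        resid = np.stack(
            [((Y - Xc @ B[k]) ** 2).sum(axis=1) for k in range(n_clusters)], axis=1
        )
        new_labels = resid.argmin(axis=1)
        history.append(resid[np.arange(n), new_labels].sum())
        if np.array_equal(new_labels, labels):  # partition stable: stop
            break
        labels = new_labels
    return labels, B, history
```

Each step can only lower (or keep) the criterion: refitting minimizes it for a fixed partition, and reassignment minimizes it for fixed coefficients. So `history` is non-increasing up to floating-point error, which mirrors, in this simplified setting, the monotone-convergence property claimed in the summary. Like any such scheme, it may stop at a local optimum that depends on the random initial partition.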

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
91C20 Clustering in the social and behavioral sciences

Software:

R
