## Constrained linear regression models for symbolic interval-valued variables. (English) Zbl 1464.62055

Summary: This paper introduces an approach to fitting a constrained linear regression model to interval-valued data. Each example in the learning set is described by a feature vector in which each feature value is an interval. The new approach fits a constrained linear regression model on the midpoints and ranges of the interval values assumed by the variables in the learning set. The lower and upper boundaries of the interval value of the dependent variable are predicted from its midpoint and range, which are estimated from the fitted linear regression models applied to the midpoint and range of each interval value of the independent variables. The method shows the importance of range information for prediction performance, as well as the use of inequality constraints to ensure mathematical coherence between the predicted values of the lower ($$\hat y_{Li}$$) and upper ($$\hat y_{Ui}$$) boundaries of the interval, that is, $$\hat y_{Li} \le \hat y_{Ui}$$. The authors also propose an expression for a goodness-of-fit measure called the determination coefficient. The proposed prediction method is assessed by estimating the average behavior of the root-mean-square error and of the square of the correlation coefficient in a Monte Carlo experiment with different data set configurations. Among other aspects, the synthetic data sets take into account the dependence, or lack thereof, between the midpoint and range of the intervals. The bias produced by imposing inequality constraints on the parameter vector is also examined in terms of the mean-square error of the parameter estimates. Finally, the approaches proposed in this paper are applied to a real data set and their performances are compared.
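The midpoint-and-range idea in the summary can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the summary: a single interval-valued predictor, an unconstrained least-squares fit for the midpoints, and a range model whose intercept and slope are both constrained to be nonnegative, which is one simple way to keep the predicted range nonnegative and hence the predicted lower boundary at or below the upper boundary. All function names are hypothetical, and the authors' actual estimator and constraints may differ.

```python
from math import sqrt

def ols_simple(x, y):
    """Unconstrained least squares for y ~ b0 + b1*x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sse(x, y, b0, b1):
    """Sum of squared residuals of the line b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def nonneg_simple(x, y):
    """Least squares for y ~ b0 + b1*x subject to b0 >= 0 and b1 >= 0."""
    b0, b1 = ols_simple(x, y)
    if b0 >= 0 and b1 >= 0:
        return b0, b1
    # The unconstrained fit is infeasible, so the constrained optimum
    # lies on the boundary: minimize on each face and keep the better one.
    slope_only = max(0.0, sum(xi * yi for xi, yi in zip(x, y))
                     / sum(xi * xi for xi in x))
    mean_only = max(0.0, sum(y) / len(y))
    candidates = [(0.0, slope_only), (mean_only, 0.0)]
    return min(candidates, key=lambda b: sse(x, y, b[0], b[1]))

def fit_interval_model(x_int, y_int):
    """Fit midpoint and range models from lists of (lower, upper) pairs."""
    xc = [(lo + hi) / 2 for lo, hi in x_int]
    xr = [hi - lo for lo, hi in x_int]
    yc = [(lo + hi) / 2 for lo, hi in y_int]
    yr = [hi - lo for lo, hi in y_int]
    return ols_simple(xc, yc), nonneg_simple(xr, yr)

def predict_intervals(model, x_int):
    """Rebuild predicted (lower, upper) pairs from midpoint and range."""
    (a0, a1), (g0, g1) = model
    out = []
    for lo, hi in x_int:
        centre = a0 + a1 * (lo + hi) / 2
        rng = g0 + g1 * (hi - lo)  # nonnegative because g0, g1 >= 0
        out.append((centre - rng / 2, centre + rng / 2))
    return out

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Illustrative data only (not taken from the paper).
x_train = [(1, 3), (2, 5), (4, 6), (5, 9), (7, 10)]
y_train = [(2, 5), (4, 8), (7, 10), (9, 15), (13, 17)]
model = fit_interval_model(x_train, y_train)
pred = predict_intervals(model, x_train)
print("RMSE lower:", rmse([lo for lo, _ in y_train], [lo for lo, _ in pred]))
print("RMSE upper:", rmse([hi for _, hi in y_train], [hi for _, hi in pred]))
```

Reporting the error separately for the lower and upper boundaries mirrors the root-mean-square-error assessment mentioned in the summary; with several predictors, a general nonnegative least-squares solver would replace the two-parameter boundary search used here.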

### MSC:

 62-08 Computational methods for problems pertaining to statistics

 62J05 Linear regression; mixed models

 62H30 Classification and discrimination; cluster analysis (statistical aspects)
