Testing homogeneity of a large data set by bootstrapping.

*(English)* Zbl 1216.62190

Summary: Large data sets are now commonly analyzed. Such data are usually of census type, called micro data in econometrics. The basic method of analysis is to estimate a single regression equation with common coefficients over the whole data set. The same applies to other estimation methods such as discrete choice models, Tobit models, and so on. Heterogeneity in the data is usually handled by dummy variables that represent socioeconomic differences among individuals in the sample. Only one equation, including the coefficients of the dummy variables, is estimated for the whole sample, and dividing the sample into sub-samples is usually not preferred. In this paper, data are said to be homogeneous if a single equation fits the whole data set and explains its socioeconomic properties well. If the whole population is divided into known sub-populations, an equation may be estimated in each sub-population. In this case the regression coefficients differ from one sub-population to another, and the data are said to be heterogeneous. The analysis of variance is applied if the sub-populations are known and a sub-sample is collected from each sub-population. The sub-sample test statistics can be correlated with each other since the sub-samples may overlap. Critical values of the test statistics are calculated by simulation. An example is given.
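The summary does not state the test statistic or the simulation design, so the following is not the example from the paper and should not be read as the authors' procedure. It is only a minimal sketch of the general idea, under assumptions chosen here for illustration: a one-regressor linear model, a statistic equal to the largest gap between a sub-sample slope and the full-sample slope, and a residual bootstrap under the null of homogeneity. The names `ols`, `max_slope_gap`, and `bootstrap_critical_value` are hypothetical. Because every sub-sample is refitted on the same bootstrap draw, the correlation induced by overlapping sub-samples carries over into the simulated critical value.

```python
import random

def ols(x, y):
    """Closed-form intercept and slope for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def max_slope_gap(x, y, subsamples):
    """Largest |sub-sample slope - full-sample slope| (illustrative statistic)."""
    _, b_full = ols(x, y)
    gaps = []
    for idx in subsamples:  # idx is a list of row indices; lists may overlap
        _, b_sub = ols([x[i] for i in idx], [y[i] for i in idx])
        gaps.append(abs(b_sub - b_full))
    return max(gaps)

def bootstrap_critical_value(x, y, subsamples, alpha=0.05, reps=500, seed=0):
    """Simulate the upper-alpha critical value under the null of one common equation."""
    rng = random.Random(seed)
    a, b = ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    stats = []
    for _ in range(reps):
        # Residual bootstrap: rebuild y from the common fit plus resampled residuals.
        y_star = [a + b * xi + rng.choice(resid) for xi in x]
        # All sub-samples share the same draw, so overlap-induced correlation is kept.
        stats.append(max_slope_gap(x, y_star, subsamples))
    stats.sort()
    k = min(reps - 1, int((1 - alpha) * reps))
    return stats[k]
```

Under these assumptions, homogeneity would be rejected when `max_slope_gap(x, y, subsamples)` exceeds the value returned by `bootstrap_critical_value(x, y, subsamples)`. A multi-regressor model or a variance-ratio statistic would need a different `ols` and a different statistic, but the resampling loop would keep the same shape.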

##### MSC:

| Code | Description |
| --- | --- |
| 62P20 | Applications of statistics to economics |
| 62P25 | Applications of statistics to social sciences |
| 65C60 | Computational problems in statistics (MSC2010) |

*K. Morimune* and *Y. Hoshino*, Math. Comput. Simul. 78, No. 2-3, 292-302 (2008; Zbl 1216.62190)
