Subsemble: an ensemble method for combining subset-specific algorithm fits. (English) Zbl 1352.62020

Summary: Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive data sets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large data sets. Subsemble partitions the full data set into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of \(V\)-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small- to moderate-sized data sets, and often has better prediction performance than the underlying algorithm fit just once on the full data set. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full data set.
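The procedure outlined in the summary (partition, subset-specific fits, cross-validated combination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the underlying algorithm is taken to be one-dimensional least squares, the combination step is taken to be a linear regression of the outcome on the cross-validated subset predictions (with a tiny ridge term added purely for numerical stability), and all function names and parameters (`fit_line`, `solve_least_squares`, `subsemble`, `J`, `V`) are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Assumed underlying algorithm: 1-D least squares. Returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def solve_least_squares(Z, y, ridge=1e-6):
    """Solve (Z'Z + ridge*I) w = Z'y by Gauss-Jordan elimination.
    The ridge term is an assumption, added only to keep the system stable."""
    J = len(Z[0])
    A = [[sum(r[a] * r[b] for r in Z) + (ridge if a == b else 0.0)
          for b in range(J)] + [sum(r[a] * t for r, t in zip(Z, y))]
         for a in range(J)]
    for c in range(J):
        p = max(range(c, J), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(J):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][J] / A[i][i] for i in range(J)]

def subsemble(xs, ys, J=3, V=5, fit=fit_line, seed=0):
    """Sketch: partition into J subsets, combine subset fits via V-fold CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    subsets = [idx[j::J] for j in range(J)]                 # J disjoint subsets
    folds = [[s[v::V] for v in range(V)] for s in subsets]  # folds[j][v]

    Z, target = [], []
    for v in range(V):
        # Fit the algorithm on each subset with its fold v held out.
        fits = []
        for j in range(J):
            held = set(folds[j][v])
            train = [i for i in subsets[j] if i not in held]
            fits.append(fit([xs[i] for i in train], [ys[i] for i in train]))
        # Predict every held-out observation with all J fold-specific fits.
        for j in range(J):
            for i in folds[j][v]:
                Z.append([f(xs[i]) for f in fits])
                target.append(ys[i])

    weights = solve_least_squares(Z, target)  # learn the combination
    # Refit on each full subset and combine with the learned weights.
    final = [fit([xs[i] for i in s], [ys[i] for i in s]) for s in subsets]
    return lambda x: sum(w * f(x) for w, f in zip(weights, final))

# Usage on synthetic data (the data-generating model is illustrative only).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
predict = subsemble(xs, ys)
```

Because the combination weights are learned from held-out predictions only, no observation is predicted by a fit that was trained on it; a simple average of the subset fits would correspond to fixing every weight at `1 / J` instead of estimating them.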

MSC:

62-07 Data analysis (statistics) (MSC2010)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
