Transformations for semi-continuous data. (English) Zbl 1452.62124

Summary: Semi-continuous data arise in many applications where naturally continuous data become contaminated by the data-generating mechanism. The resulting data contain several values that are “too frequent” and are in that sense a hybrid between discrete and continuous data. The main problem is that standard statistical methods, which are geared towards either continuous or discrete data, cannot be applied adequately to semi-continuous data. We propose a set of two new transformations for semi-continuous data that “iron out” the too-frequent values, thereby transforming the data into fully continuous data. We show that the transformed data maintain the properties of the original data but are suitable for standard analysis. The transformations and their performance are illustrated using simulated data and real auction data from the online auction site eBay.
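
To make the idea of "ironing out" too-frequent values concrete, the following is a minimal sketch of one simple possibility: adding small uniform noise to any value that repeats suspiciously often. It is not taken from the paper and should not be read as either of the two proposed transformations. The function name `jitter_too_frequent`, the repeat threshold `min_count`, the noise `width`, and the sample data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def jitter_too_frequent(values, min_count=3, width=0.5, seed=0):
    """Spread values that repeat at least `min_count` times uniformly over
    [v - width/2, v + width/2]; leave all other values unchanged.

    Illustrative assumption only: the threshold rule and the uniform noise
    are not the transformations proposed in the paper.
    """
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    counts = Counter(values)    # how often each exact value occurs
    out = []
    for v in values:
        if counts[v] >= min_count:
            # Treat v as "too frequent" and replace it with a nearby draw.
            out.append(v + rng.uniform(-width / 2, width / 2))
        else:
            out.append(v)
    return out


# Hypothetical semi-continuous sample: 5.0 and 9.99 occur far too often.
data = [1.37, 5.0, 5.0, 5.0, 5.0, 7.82, 9.99, 9.99, 9.99, 12.4]
smoothed = jitter_too_frequent(data)
```

In this sketch the repeated values 5.0 and 9.99 are spread over a narrow interval while the remaining observations are left untouched. How a value is judged "too frequent" and how wide the interval should be are exactly the kind of choices a principled method would have to justify.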

MSC:

62-08 Computational methods for problems pertaining to statistics
62G05 Nonparametric estimation
