T3C: improving a decision tree classification algorithm’s interval splits on continuous attributes. (English) Zbl 1414.68081

Summary: This paper proposes, describes and evaluates T3C, a classification algorithm that builds decision trees of depth at most three and achieves high accuracy whilst keeping the tree reasonably small. T3C improves on the algorithm T3 in the way it performs splits on continuous attributes. When run against publicly available data sets, T3C achieved lower generalisation error than T3 and the popular C4.5, and competitive results compared to Random Forest and Rotation Forest.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
