## Fuzzy rule based classification systems for big data with MapReduce: granularity analysis.
*(English)*
Zbl 1414.68055

Summary: Due to the vast amount of information available nowadays, and the advantages related to the processing of this data, the topics of big data and data science have acquired great importance in current research. Big data applications are mainly about scalability, which can be achieved via the MapReduce programming model.

MapReduce is designed to divide the data into several chunks or groups that are processed in parallel, and whose results are “assembled” to provide a single solution. Among the different classification paradigms adapted to this new framework, fuzzy rule based classification systems have shown interesting results with a MapReduce approach for big data. It is well known that the performance of these systems depends strongly on the selection of a good granularity level for the Data Base. However, in the context of MapReduce this parameter is even harder to determine, as it can also be related to the number of Maps chosen for the processing stage. In this paper, we aim to analyze the interrelation between the number of labels of the fuzzy variables and the scarcity of the data due to the data sampling in MapReduce. Specifically, we consider that as the partitioning of the initial instance set grows, the level of granularity necessary to achieve a good performance also becomes higher. The experimental results, carried out for several big data problems using the Chi-FRBCS-BigData algorithms, support our claims.
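The map/assemble scheme and the role of the granularity level described in the summary can be illustrated with a minimal Python sketch. It is not the Chi-FRBCS-BigData implementation. It assumes attributes normalised to [0, 1], uniform triangular fuzzy labels, a rule weight equal to the class frequency within a chunk (a stand-in for the certainty factors used in the literature), and conflict resolution by keeping the highest-weight rule. All function names are hypothetical, and the "maps" run sequentially rather than on a cluster.

```python
from collections import defaultdict


def membership(x, k, n_labels):
    """Triangular membership of x in label k; peaks sit at k / (n_labels - 1)."""
    step = 1.0 / (n_labels - 1)
    return max(0.0, 1.0 - abs(x - k * step) / step)


def best_label(x, n_labels):
    """Index of the label with the highest membership degree for x."""
    return max(range(n_labels), key=lambda k: membership(x, k, n_labels))


def split(data, n_maps):
    """Partition the training set into n_maps chunks (one per map task)."""
    return [data[i::n_maps] for i in range(n_maps)]


def map_rules(chunk, n_labels):
    """Map stage: one rule per antecedent seen in the chunk.

    Assumption: the weight is the share of the majority class among the
    examples that produced the antecedent (a simplification).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for features, label in chunk:
        antecedent = tuple(best_label(x, n_labels) for x in features)
        counts[antecedent][label] += 1
    rules = {}
    for antecedent, by_class in counts.items():
        cls, hits = max(by_class.items(), key=lambda kv: kv[1])
        rules[antecedent] = (cls, hits / sum(by_class.values()))
    return rules


def reduce_rules(rule_bases):
    """Reduce stage: merge rule bases, keeping the highest weight on conflict."""
    merged = {}
    for rules in rule_bases:
        for antecedent, (cls, weight) in rules.items():
            if antecedent not in merged or weight > merged[antecedent][1]:
                merged[antecedent] = (cls, weight)
    return merged


def classify(rules, features, n_labels):
    """Winning-rule reasoning: maximise matching degree times rule weight."""
    best_cls, best_score = None, 0.0
    for antecedent, (cls, weight) in rules.items():
        degree = 1.0
        for x, k in zip(features, antecedent):
            degree *= membership(x, k, n_labels)
        if degree * weight > best_score:
            best_cls, best_score = cls, degree * weight
    return best_cls


# Toy usage: 4 examples, 2 maps, 3 labels per variable.
toy_data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
            ((1.0, 1.0), "b"), ((0.9, 0.8), "b")]
toy_rules = reduce_rules(map_rules(c, 3) for c in split(toy_data, 2))
```

In this sketch, raising `n_maps` leaves each chunk with fewer examples, while raising `n_labels` multiplies the number of possible antecedents; the interplay between these two parameters is what the paper studies.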


### MSC:

| MSC code | Description |
|---|---|
| 68T05 | Learning and adaptive systems in artificial intelligence |
| 68T10 | Pattern recognition, speech recognition |
| 68T37 | Reasoning under uncertainty in the context of artificial intelligence |


A. Fernández et al., Adv. Data Anal. Classif., ADAC 11, No. 4, 711–730 (2017; Zbl 1414.68055)

