The Comparison of Tree-Based Ensemble Machine Learning for Classifying Public Datasets

Nur Heri Cahyana; Yuli Fauziah; Agus Sasmito Aribowo

doi:10.31098/cset.v1i1.412

Authors

Nur Heri Cahyana, Informatics Department, Universitas Pembangunan Nasional “Veteran” Yogyakarta, Indonesia
Yuli Fauziah, Informatics Department, Universitas Pembangunan Nasional “Veteran” Yogyakarta, Indonesia
Agus Sasmito Aribowo, Informatics Department, Universitas Pembangunan Nasional “Veteran” Yogyakarta, Indonesia

Abstract

This study aims to determine which tree-based ensemble machine learning methods best classify a collection of 34 public datasets. It also examines the relationship between the number of records and columns in each test dataset and the number of estimators (trees) used by each ensemble model, namely Random Forest, Extra Tree Classifier, AdaBoost, and Gradient Boosting. The four methods are compared on their maximum accuracy and on the number of estimators used when classifying the test datasets. The experiments identify the best tree-based ensemble method and the best number of estimators for each dataset in the study. Extra Tree is the best classifier for both binary-class and multi-class datasets. Random Forest performs well on multi-class datasets, and AdaBoost performs reasonably well on binary-class datasets. The numbers of rows, columns, and classes are each positively correlated with the number of estimators: a dataset with more rows, columns, or classes requires more estimators than a smaller one. The number of classes, however, is negatively correlated with accuracy, so accuracy decreases as the number of classes grows.
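
The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. It is not the authors' experimental code: the Iris dataset stands in for one of the 34 datasets, and the estimator grid, the 5-fold cross-validation, and the helper name `best_setting` are all assumptions chosen for illustration. For each of the four ensemble methods, the sketch sweeps the number of estimators and records the maximum accuracy together with the estimator count at which it was reached.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Assumption: Iris is only a stand-in for one of the study's public datasets.
X, y = load_iris(return_X_y=True)

# The four tree-based ensemble methods named in the abstract.
MODELS = {
    "Random Forest": RandomForestClassifier,
    "Extra Trees": ExtraTreesClassifier,
    "AdaBoost": AdaBoostClassifier,
    "Gradient Boosting": GradientBoostingClassifier,
}

# Hypothetical grid of estimator counts; the study's actual grid is not given.
ESTIMATOR_GRID = [10, 50, 100]


def best_setting(model_cls, X, y, grid):
    """Return (max accuracy, estimator count) over the grid for one method."""
    best_acc, best_n = -1.0, None
    for n in grid:
        model = model_cls(n_estimators=n, random_state=0)
        # Mean accuracy over 5 cross-validation folds (an assumed protocol).
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        # Strict comparison: on a tie, the smaller estimator count is kept.
        if acc > best_acc:
            best_acc, best_n = acc, n
    return best_acc, best_n


results = {
    name: best_setting(cls, X, y, ESTIMATOR_GRID) for name, cls in MODELS.items()
}

for name, (acc, n) in results.items():
    print(f"{name}: max accuracy {acc:.3f} at {n} estimators")
```

Repeating this sweep over many datasets would give, for each dataset, a best method and a best estimator count, which could then be related to the dataset's number of rows, columns, and classes.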
