Comparative Study of K-Nearest Neighbour and Naïve Bayes Performances on Malay Text Classification

Authors

  • Nazratul Naziah Mohd Muhait
  • Rosmayati Mohemad Universiti Kebangsaan Malaysia
  • Noor Maizura Mohamad Noor Universiti Malaysia Terengganu
  • Zulaiha Ali Othman Universiti Malaysia Terengganu

DOI:

https://doi.org/10.31098/cset.v1i2.474

Keywords:

Classification, Crime, K-Nearest Neighbour, Naïve Bayes, Malay Document

Abstract

Police narrative reports are critical in helping investigation officers uncover hidden information during the criminal investigation process. In recent years, detecting criminal linkages by locating the modus operandi in a massive volume of unstructured police reports has become a significant challenge. There have been few studies on text classification in the Malay language due to some limitations that need to be addressed. Text classification is the process of assigning text to one of a set of predefined categories. In this study, classification techniques are used to predict the class of modus operandi for housebreaking crime documents using a Malay crime dataset. The housebreaking dataset used in this study is a real dataset from the Royal Police Department of Malaysia. The purpose of this paper is to compare the accuracy of the K-Nearest Neighbour (KNN) and Naïve Bayes algorithms for classifying Malay crime reports based on their modus operandi. The experimental results show that Naïve Bayes achieved an accuracy of 97.86% with a 9-second execution time, whereas KNN achieved an accuracy of 88.43% with a 48-second execution time.
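
To make the comparison concrete, the following is a minimal sketch of the two classifiers on a bag-of-words representation, using only the Python standard library. It is not the authors' implementation: the abstract does not state the features, the value of k, the distance measure, or the preprocessing used, so whitespace tokenization, Laplace smoothing, cosine similarity, and k = 3 are all assumptions. The sentences and the labels "door" and "window" are invented for illustration and are not taken from the police dataset.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def train_nb(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    class_docs = Counter(labels)          # documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(doc):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_knn(docs, labels, doc, k=3):
    """Majority vote among the k training documents most similar to doc."""
    query = Counter(tokenize(doc))
    scored = sorted(
        ((cosine(query, Counter(tokenize(d))), l) for d, l in zip(docs, labels)),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

def accuracy(predict, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(d) == l for d, l in zip(docs, labels))
    return correct / len(docs)

# Invented toy reports (hypothetical, not from the real dataset).
train_docs = [
    "suspek umpil pintu depan rumah",
    "pintu belakang diumpil suspek",
    "suspek kopak pintu grill",
    "suspek pecah tingkap bilik",
    "tingkap dapur dipecahkan suspek",
    "cermin tingkap pecah",
]
train_labels = ["door", "door", "door", "window", "window", "window"]
test_docs = ["pintu rumah diumpil", "tingkap bilik pecah"]
test_labels = ["door", "window"]

nb_model = train_nb(train_docs, train_labels)
nb_acc = accuracy(lambda d: predict_nb(nb_model, d), test_docs, test_labels)
knn_acc = accuracy(
    lambda d: predict_knn(train_docs, train_labels, d), test_docs, test_labels
)
print(f"Naive Bayes accuracy: {nb_acc:.2f}, KNN accuracy: {knn_acc:.2f}")
```

One structural difference is visible even in this sketch: Naïve Bayes summarizes the training set once into per-class counts, whereas KNN compares each query against every training document at prediction time. This may help explain why KNN can be slower on larger collections, although the sketch says nothing about the timings reported above.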

Published

2021-12-20

How to Cite

Muhait, N. N. M., Mohemad, R., Mohamad Noor, N. M., & Ali Othman, Z. (2021). Comparative Study of K-Nearest Neighbour and Naïve Bayes Performances on Malay Text Classification. RSF Conference Series: Engineering and Technology, 1(2), 50–60. https://doi.org/10.31098/cset.v1i2.474