Comparative Study of K-Nearest Neighbour and Naïve Bayes Performances on Malay Text Classification

Nazratul Naziah Mohd Muhait; Rosmayati Mohemad; Noor Maizura Mohamad Noor; Zulaiha Ali Othman

doi:10.31098/cset.v1i2.474

Authors

Nazratul Naziah Mohd Muhait
Rosmayati Mohemad, Universiti Kebangsaan Malaysia
Noor Maizura Mohamad Noor, Universiti Malaysia Terengganu
Zulaiha Ali Othman, Universiti Malaysia Terengganu

Keywords:

Classification, Crime, K-Nearest Neighbour, Naïve Bayes, Malay Document

Abstract

Police narrative reports are critical in assisting the investigation officer in uncovering hidden information during the criminal investigation process. In recent years, detecting criminal linkages by locating the modus operandi in a massive volume of unstructured police reports has become a significant challenge. There have been few studies on text classification in the Malay language due to some limitations that need to be addressed. Text classification is the process of assigning text to one of a set of predefined categories. In this study, classification techniques are used to predict the class of modus operandi for housebreaking crime documents using a Malay crime dataset. The housebreaking crime dataset used in this study is a real dataset from the Royal Police Department of Malaysia. The purpose of this paper is to compare the accuracy of the K-Nearest Neighbour (KNN) and Naïve Bayes algorithms for classifying Malay crime reports based on their modus operandi. The experimental results show that Naïve Bayes achieved a high accuracy rate of 97.86% with a 9-second execution time, whereas KNN achieved an accuracy rate of 88.43% with a 48-second execution time.
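To make the comparison described in the abstract concrete, the following is a minimal sketch of how a Naïve Bayes and a KNN text classifier could be scored for accuracy and execution time on the same labelled reports. It is not the authors' pipeline: the toy Malay sentences, the two class labels, the whitespace tokenizer, the Laplace smoothing, the cosine similarity measure, and the choice of k = 3 are all assumptions made for illustration, and none of the data comes from the police dataset.

```python
import math
import time
from collections import Counter

# Invented toy reports (NOT from the police dataset). The labels are
# hypothetical modus operandi classes: entry via door ("pintu") or
# entry via window ("tingkap").
TRAIN = [
    ("pintu depan dikopak", "pintu"),
    ("kunci pintu dipotong", "pintu"),
    ("pintu belakang dipecah", "pintu"),
    ("tingkap dapur dipecah", "tingkap"),
    ("jeriji tingkap dipotong", "tingkap"),
    ("cermin tingkap dipecah", "tingkap"),
]
TEST = [
    ("pintu dapur dikopak", "pintu"),
    ("jeriji tingkap dikopak", "tingkap"),
]


def tokenize(text):
    # Assumption: plain whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def train_nb(docs):
    """Collect per-class document counts, word counts, and the vocabulary."""
    class_docs, word_counts, vocab = Counter(), {}, set()
    for text, label in docs:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_docs[label] / total_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_knn(docs, text, k=3):
    """Majority vote among the k training reports most similar to the query."""
    query = Counter(tokenize(text))
    scored = sorted(
        ((cosine(query, Counter(tokenize(d))), label) for d, label in docs),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def evaluate(predict, test):
    """Return (accuracy, elapsed seconds) for a prediction function."""
    start = time.perf_counter()
    correct = sum(predict(text) == label for text, label in test)
    return correct / len(test), time.perf_counter() - start


if __name__ == "__main__":
    nb_model = train_nb(TRAIN)
    print("NB :", evaluate(lambda t: predict_nb(nb_model, t), TEST))
    print("KNN:", evaluate(lambda t: predict_knn(TRAIN, t), TEST))
```

In this sketch, Naïve Bayes summarises the training set once and then scores each class directly, while KNN compares every query against every stored training report. That structural difference is one plausible reason a KNN run takes longer at prediction time, although the timings on a toy corpus this small say nothing about the figures reported in the abstract.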
