Text classification (TC), or text categorization, is the task of assigning text documents to predefined classes or categories. The need for automatic text classification arose from the large volume of electronic documents on the web. Classification accuracy is affected by the documents' content and by the classification technique used. In this research, Support Vector Machine (SVM) and k-Nearest Neighbor (kNN) classifiers are developed and compared in classifying 800 Arabic documents into four categories (sport, politics, religion, and economy). The experimental results are presented in terms of F1-measure, precision, and recall.
Text Classification, Machine Learning, Support Vector Machine, k-Nearest Neighbor
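The evaluation measures named in the abstract (precision, recall, and F1-measure) can be illustrated with a minimal sketch. The function name `per_class_prf` and the toy label lists are hypothetical and are not taken from the paper's data or results; they only show how the three measures are computed per category from true and predicted labels.

```python
from collections import Counter

def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for each category."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correctly assigned to category t
        else:
            fp[p] += 1          # wrongly assigned to category p
            fn[t] += 1          # missed for the true category t
    scores = {}
    for c in labels:
        predicted = tp[c] + fp[c]   # documents the classifier put in c
        actual = tp[c] + fn[c]      # documents that really belong to c
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Hypothetical toy example using the four categories from the abstract.
labels = ["sport", "politics", "religion", "economy"]
y_true = ["sport", "sport", "politics", "religion", "economy", "economy"]
y_pred = ["sport", "politics", "politics", "religion", "economy", "sport"]

scores = per_class_prf(y_true, y_pred, labels)
for category in labels:
    p, r, f = scores[category]
    print(f"{category}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

The same function could be applied to the predictions of either classifier (SVM or kNN) to compare them category by category.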
International Journal of Sciences is an Open Access journal.
This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Author(s) retain the copyright of this article; publication rights are held by Alkhaer Publications.