K-means clustering as an imputation strategy for missing values in scholarship candidate data

Muhammad Muhammad; Tole Sutikno; Imam Riadi

doi:10.35335/mantik.v8i4.5904

Published: Feb 28, 2025

Keywords:

Imputation, K-Means, MAPE, Missing Values, Scholarship

Issue

Vol. 8 No. 4 (2025): February: Manajemen, Teknologi Informatika dan Komunikasi (Mantik)

Section

Computer Science

Muhammad Muhammad

Universitas Ahmad Dahlan, Indonesia

Tole Sutikno

Universitas Ahmad Dahlan, Indonesia

Imam Riadi

Universitas Ahmad Dahlan, Indonesia

Abstract

Missing values in scholarship selection data pose a challenge that can affect decision-making. This study imputes missing values in scholarship candidate datasets using the K-Means method and evaluates the imputation with the Mean Absolute Percentage Error (MAPE). K-Means was selected because it groups data by pattern similarity, which makes it possible to estimate the missing values in the scholarship candidate dataset. Two datasets were used: one with 10% missing data and another with 20%. The results indicate that K-Means imputation can be applied effectively to scholarship candidate data. The findings also show that the proportion of missing data influences the optimal number of clusters. For the dataset with 10% missing data, the best configuration used 5 clusters and gave a MAPE of 13%. For the dataset with 20% missing data, the best configuration used 2 clusters and gave a MAPE of 14%.
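
The abstract does not describe the imputation procedure step by step, so the following is only a minimal sketch of one common way to impute with K-Means and score the result with MAPE. It assumes that missing entries are first filled with column means, that the filled rows are then clustered, and that each missing entry is replaced by the corresponding coordinate of its row's cluster centroid. The function names, the toy two-column data (a grade-like score and an income-like value), and the hidden positions are all hypothetical and are not taken from the study.

```python
import random

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def nearest(row, centroids):
    """Index of the centroid closest to row (squared Euclidean distance)."""
    dists = [sum((x - y) ** 2 for x, y in zip(row, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means on complete rows; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [nearest(r, centroids) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def kmeans_impute(rows, k, rounds=5, seed=0):
    """Fill None entries: start from column means, then use cluster centroids."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(seen) / len(seen))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    for _ in range(rounds):
        centroids, labels = kmeans(filled, k, seed=seed)
        for i, r in enumerate(rows):
            for j, v in enumerate(r):
                if v is None:  # only originally missing cells are overwritten
                    filled[i][j] = centroids[labels[i]][j]
    return filled

# Hypothetical complete data, used only to measure the imputation error.
truth = [
    [3.8, 2.0], [3.7, 2.2], [3.9, 1.8], [3.6, 2.1],
    [2.9, 6.0], [3.0, 6.4], [2.8, 5.8], [3.1, 6.2],
]
hidden = [(1, 1), (3, 0), (6, 0)]  # (row, column) cells treated as missing
observed = [list(r) for r in truth]
for i, j in hidden:
    observed[i][j] = None

# Compare candidate cluster counts by the MAPE on the hidden cells.
for k in (2, 3):
    imputed = kmeans_impute(observed, k)
    actual = [truth[i][j] for i, j in hidden]
    predicted = [imputed[i][j] for i, j in hidden]
    print(f"k={k}: MAPE = {mape(actual, predicted):.1f}%")
```

Comparing MAPE across several values of `k`, as the final loop does, mirrors the kind of search for a best cluster count that the abstract reports. In practice the features would likely need to be scaled before clustering, since the distance is otherwise dominated by the column with the largest range.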

How to Cite

Muhammad, M., Sutikno, T. and Riadi, I. (2025) “K-means clustering as an imputation strategy for missing values in scholarship candidate data”, Jurnal Mantik, 8(4), pp. 1656-1665. doi: 10.35335/mantik.v8i4.5904.

Copyright and Licensing

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
