Vol. 1 No. 1 (2026): Februari
Air Quality Prediction in DKI Jakarta Using Support Vector Machine: A Comprehensive Classification Approach
Purpose: This study aimed to develop a machine learning-based classification model for predicting the Air Pollution Standard Index (ISPU/AQI) categories in DKI Jakarta using the Support Vector Machine (SVM) algorithm. This study specifically addresses the challenge of accurate multiclass air quality classification under real-world urban conditions characterized by high pollution variability and imbalanced class distributions.
Research Methodology: This study employed a quantitative research design using secondary time-series data comprising 1,825 daily observations from the Satu Data Jakarta portal (satudata.jakarta.go.id), covering the period from February to November 2023. Six pollutant parameters, namely PM2.5, PM10, CO, SO₂, NO₂, and O₃, served as predictor features. Data preprocessing included missing value imputation, duplicate removal, label encoding, and min-max normalization. The SVM classifier with a Radial Basis Function (RBF) kernel was implemented using Python’s Scikit-learn library on Google Colaboratory. Hyperparameter optimization was conducted via GridSearchCV with stratified K-fold cross-validation, and the model was evaluated using accuracy, precision, recall, F1-score, and confusion matrix analysis.
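The workflow described above can be illustrated with a minimal Scikit-learn sketch. It is not the authors' code: the data are synthetic stand-ins for the six pollutant features, the labelling rule is invented so the example runs without the Satu Data Jakarta files, and the parameter grid, the five folds, the 80/20 split, and the `f1_macro` scoring choice are all assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the six predictors (PM2.5, PM10, CO, SO2, NO2, O3).
# These values are made up; the study itself used Satu Data Jakarta records.
rng = np.random.default_rng(0)
X = rng.uniform(0, 200, size=(300, 6))

# Hypothetical labelling rule (assumption): the category follows the largest value.
peak = X.max(axis=1)
labels = np.where(peak < 150, "Good", np.where(peak < 185, "Moderate", "Unhealthy"))

# Label encoding turns the category names into integer class ids.
y = LabelEncoder().fit_transform(labels)

# Stratified hold-out split keeps the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Min-max scaling sits inside the pipeline so it is refit on each training fold.
pipe = Pipeline([("scale", MinMaxScaler()), ("svm", SVC(kernel="rbf"))])

# Assumed search grid over the RBF-SVM regularisation and kernel width.
param_grid = {"svm__C": [1, 10, 100], "svm__gamma": ["scale", 0.1, 1]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1_macro")
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("best parameters:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```

Placing the scaler inside the pipeline is a design choice made for this sketch: the normalization range is then learned only from the training portion of each fold, which avoids leaking test-fold information into the cross-validation scores.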
Results: The optimized SVM model achieved an overall classification accuracy of 96.1% on the held-out test set. The per-class F1-scores were 86%, 98%, and 93% for the ‘Good,’ ‘Moderate,’ and ‘Unhealthy’ categories, respectively. Class imbalance was identified as a primary challenge: the minority classes (Very Unhealthy, Hazardous) were underrepresented in the dataset, which contributed to uneven model performance across categories.
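The F1 values above are listed per category. A macro-average, by its usual definition, is the unweighted mean of such per-class scores, so the lower ‘Good’ score pulls it below the overall accuracy. The sketch below shows how per-class F1 follows from a confusion matrix and how the macro-average is formed. The confusion matrix is hypothetical and invented for illustration; only the three F1 values passed to `macro_average` at the end come from the abstract, and the resulting mean covers those three categories only.

```python
def per_class_f1(cm):
    """Per-class F1 from a square confusion matrix (rows = true, columns = predicted)."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # predicted as class i, but wrong
        fn = sum(cm[i]) - tp                       # true class i, but missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_average(scores):
    """Unweighted mean: every class counts equally, regardless of its size."""
    return sum(scores) / len(scores)


# Hypothetical confusion matrix for Good / Moderate / Unhealthy (not from the study).
cm = [
    [18, 4, 0],
    [2, 240, 3],
    [0, 5, 93],
]
f1 = per_class_f1(cm)
print([round(s, 3) for s in f1], round(macro_average(f1), 3))

# Macro-average of the three F1 values reported in the abstract.
reported = [0.86, 0.98, 0.93]
print(round(macro_average(reported), 4))
```

Because the macro-average weights every class equally, a small class with a weak score affects it as much as a large class does, which is why it is often reported alongside accuracy for imbalanced data.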
Conclusions: The SVM-RBF model demonstrated high predictive accuracy for air quality classification in urban tropical environments, confirming its applicability as a foundation for automated real-time air quality monitoring systems. The results establish a replicable methodological framework for similar studies in other Indonesian metropolitan regions.
Limitations: The dataset is geographically restricted to DKI Jakarta and temporally limited to a ten-month period, which constrains generalizability. Class imbalance in the extreme pollution categories (Very Unhealthy, Hazardous) limits the model's reliability for rare but critical events.
Contributions: This study contributes a validated SVM-based classification pipeline for AQI prediction in a tropical megacity context, providing methodological guidance for environmental informatics researchers and urban policy practitioners in developing countries seeking scalable, data-driven air-quality management tools.