Pengembangan Keterampilan Associate Data Scientist melalui Pelatihan dengan RapidMiner [Developing Associate Data Scientist Skills through Training with RapidMiner]

Published: Jun 11, 2025

Abstract:

Purpose: This study aims to evaluate the effectiveness of an online Associate Data Scientist training program that utilizes RapidMiner as the primary platform for teaching data science and machine learning. The goal is to assess participants' improvements in data preprocessing, algorithm application, and model evaluation skills.

Methodology/approach: The training program was conducted via Zoom and included interactive lectures, live demonstrations, hands-on exercises, and individual assignments. RapidMiner was used as the main tool throughout the sessions. Participants were evaluated through tasks assigned in each session and a final project that required them to analyze a dataset, apply relevant algorithms, and assess model performance.

Results/findings: Participants showed marked improvement in technical understanding and application skills. The average final project score was 87.0, indicating strong competence in data handling, algorithm selection, and model evaluation. Most participants completed the project successfully, suggesting readiness to apply data science concepts in real-world scenarios.

Conclusions: The online training helped bridge the gap between theory and practice, indicating that remote learning can deliver quality outcomes in technical education. The combination of RapidMiner and a structured training format enabled participants to gain applicable skills in data science. However, instructional delivery and interaction still need improvement to optimize the learning experience.

Limitations: Challenges included internet connectivity issues and limited real-time interaction, which sometimes hindered learning flow and instructor support.

Contribution: This study offers insight into data science education, showing that online programs built around practical tools like RapidMiner can build core competencies in aspiring data professionals.
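
Illustration: The three skills assessed in the final project (data preprocessing, algorithm application, and model evaluation) can be sketched in a few lines. The training itself used RapidMiner's visual workflows; the Python below is only an assumed stand-in, and the toy dataset, the choice of k-nearest neighbours, and all function names are hypothetical rather than taken from the program.

```python
from statistics import mean

def column_means(rows):
    """Mean of each column, ignoring missing (None) values."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]

def impute(rows, means):
    """Preprocessing step 1: fill missing values with the given column means."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def column_ranges(rows):
    """Minimum and maximum of each column."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def scale(rows, lo, hi):
    """Preprocessing step 2: min-max scale each column using the given ranges."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]

def knn_predict(train_x, train_y, query, k=3):
    """Algorithm step: majority vote among the k nearest training rows."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, query)), y)
                   for x, y in zip(train_x, train_y))
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def accuracy(y_true, y_pred):
    """Evaluation step: share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: [hours of practice, quiz score], some scores missing.
features = [[1.0, 50.0], [2.0, None], [1.5, 55.0],
            [8.0, 90.0], [9.0, 85.0], [7.5, None],
            [2.5, 52.0], [8.5, 88.0]]
labels = ["fail", "fail", "fail", "pass", "pass", "pass", "fail", "pass"]

# Hold out the last two rows for evaluation.
train_x, test_x = features[:6], features[6:]
train_y, test_y = labels[:6], labels[6:]

# Fit preprocessing on the training rows only, then apply it to both splits.
means = column_means(train_x)
train_x, test_x = impute(train_x, means), impute(test_x, means)
lo, hi = column_ranges(train_x)
train_x, test_x = scale(train_x, lo, hi), scale(test_x, lo, hi)

predictions = [knn_predict(train_x, train_y, q) for q in test_x]
acc = accuracy(test_y, predictions)
print(predictions, acc)
```

Fitting the imputation means and scaling ranges on the training rows alone keeps information from the held-out rows out of the model, which is the same discipline a RapidMiner process enforces when preprocessing is placed inside a validation operator.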

Keywords:
1. Data Analysis
2. Data Science Training
3. Machine Learning
4. Online Learning
5. RapidMiner
Authors:
1. Egi Safitri
2. Rini Nurlistiani
3. Hendra Kurniawan
How to Cite
Safitri, E., Nurlistiani, R., & Kurniawan, H. (2025). Pengembangan Keterampilan Associate Data Scientist melalui Pelatihan dengan RapidMiner. Yumary: Jurnal Pengabdian Kepada Masyarakat, 5(4), 677–686. https://doi.org/10.35912/yumary.v5i4.3664

References

    Ali, M. G. (2022). A General Perspective about Institutional Rankings, Ranking Framework, Benefits of Rankings and Ranking Methodological Flaws. International Journal of Educational Research Review, 7(3), 157-164. doi:https://doi.org/10.24331/ijere.1067952

    Arias-Barahona, M. X., Arteaga-Arteaga, H. B., Orozco-Arias, S., Flórez-Ruíz, J. C., Valencia-Díaz, M. A., & Tabares-Soto, R. (2023). Requests Classification in the Customer Service Area for Software Companies Using Machine Learning and Natural Language Processing. PeerJ Computer Science, 9. doi:https://doi.org/10.7717/peerj-cs.1016

    Aryotejo, G., Hakim, M. M., Firmansah, F., & Safarizki, H. A. (2021). Pelatihan Efisiensi Sumber Daya Sistem Operasi Windows pada Masa Pandemi Covid 19. Jurnal ABDINUS: Jurnal Pengabdian Nusantara, 4(2), 238-246. doi:https://doi.org/10.29407/ja.v4i2.14906

    Cheng, M., Adekola, O., Albia, J., & Cai, S. (2022). Employability in Higher Education: A Review of Key Stakeholders' Perspectives. Higher Education Evaluation and Development, 16(1), 16-31. doi:https://doi.org/10.1108/HEED-03-2021-0025

    Frank, E., Hall, M. A., & Witten, I. H. (2016). The WEKA Workbench "Data Mining: Practical Machine Learning Tools and Techniques" 4th Edition. Burlington: Morgan Kaufmann Publishers.

    Gehlen, K. P.-v., Höck, H., Fast, A., Heydebreck, D., Lammert, A., & Thiemann, H. (2022). Recommendations for Discipline-Specific FAIRness Evaluation Derived from Applying an Ensemble of Evaluation Tools. Data Science Journal, 21(1), 1-21. doi:https://doi.org/10.5334/dsj-2022-007

    Grover, P., & Kar, A. K. (2017). Big Data Analytics: A Review on Theoretical Contributions and Tools Used in Literature. Global Journal of Flexible Systems Management, 18(3), 203-229. doi:https://doi.org/10.1007/s40171-017-0159-3

    Gul, R., & Al-Faryan, M. A. S. (2023). From Insights to Impact: Leveraging Data Analytics for Data-Driven Decision-Making and Productivity in Banking Sector. Humanities and Social Sciences Communications, 10(1), 1-8. doi:https://doi.org/10.1057/s41599-023-02122-x

    Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques 3rd Edition. Waltham: Morgan Kaufmann Publishers.

    Haris, Y., Friadi, J., Frederick, A. E. S., Huda, D. N., & Romdoni, M. R. (2024). Clustering Data Stok Penjualan Sparepart Mobil Toyota Bengkel Multi Topindo Menggunakan K-Means. Jurnal Ilmu Siber dan Teknologi Digital, 2(2), 109-121. doi:https://doi.org/10.35912/jisted.v2i2.3308

    Johnson, M., Jain, R., Brennan-Tonetta, P., Swartz, E., Silver, D., Paolini, J., . . . Hill, C. (2021). Impact of Big Data and Artificial Intelligence on Industry: Developing a Workforce Roadmap for a Data Driven Economy. Global Journal of Flexible Systems Management, 22(3), 197-217. doi:https://doi.org/10.1007/s40171-021-00272-y

    Kholifah, N., Gadi, A. C. Z., Yuli, S. E., Khayati, E. Z., & Triyanto, T. (2020). Penggunaan Zoom Cloud Meeting sebagai Alternatif Pembelajaran Jarak Jauh. Prosiding Pendidikan Teknik Boga Busana, 15(1).

    Lewis, A., & Stoyanovich, J. (2022). Teaching Responsible Data Science: Charting New Pedagogical Territory. International Journal of Artificial Intelligence in Education, 32(3), 1-25. doi:https://doi.org/10.1007/s40593-021-00241-7

    Mehmood, I., Shahid, S., Hussain, H., Khan, I., Ahmad, S., Rahman, S., . . . Huda, S. (2023). A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine Learning. IEEE Access, 11, 63579-63597. doi:https://doi.org/10.1109/ACCESS.2023.3287326

    Mildenberger, T., Braschler, M., Ruckstuhl, A., Vorburger, R., & Stockinger, K. (2023). The Role of Data Scientists in Modern Enterprises-Experience from Data Science Education. ACM SIGMOD Record, 52(2), 48-52. doi:https://doi.org/10.1145/3615952.3615966

    Monino, J.-L. (2021). Data Value, Big Data Analytics, and Decision-Making. Journal of the Knowledge Economy, 12(1), 256-267.

    Natasuwarna, A., Pangihutan, H. B., & Ramadani, S. (2022). Workshop Online Penerapan Data Science pada Dunia Pendidikan. Seminar Nasional Penelitian dan Pengabdian Kepada Masyarakat CORISINDO, 186-189.

    Natasuwarna, A. P. (2019). Seminar Pendekatan Data Mining Memprediksi Profil Sosial Masyarakat Menggunakan Aplikasi RapidMiner. SNPMas: Seminar Nasional Pengabdian pada Masyarakat, 38-44.

    Nisya, I. S., Wulansari, O. D. E., & Wartariyus, W. (2023). Rancang Bangun Game Edukasi Bencana Alam Menggunakan Metode MDLC. Jurnal Ilmu Siber dan Teknologi Digital, 2(1), 23-44. doi:https://doi.org/10.35912/jisted.v2i1.2374

    Nottbrock, C., Looy, A. V., & Haes, S. D. (2023). Impact of Digital Industry 4.0 Innovations on Interorganizational Value Chains: A Systematic Literature Review. Business Process Management Journal, 29(1), 43-76. doi:https://doi.org/10.1108/BPMJ-06-2022-0259

    Putra, T. I. Z. M., Suprapto, S., & Bukhori, A. F. (2022). Model Klasifikasi Berbasis Multiclass Classification dengan Kombinasi Indobert Embedding dan Long Short-Term Memory untuk Tweet Berbahasa Indonesia. Jurnal Ilmu Siber dan Teknologi Digital, 1(1), 1-28. doi:https://doi.org/10.35912/jisted.v1i1.1509

    Putri, A. N., Wakhidah, N., & Utomo, V. G. (2022). Pemanfaatan Data Mining untuk Media Pembelajaran di SMK Hidayah Semarang. E-Dimas: Jurnal Pengabdian Kepada Masyarakat, 13(3), 487-491. doi:https://doi.org/10.26877/e-dimas.v13i3.5572

    Rajan, D., Beymer, D., Abedin, S., & Dehghan, E. (2020). Pi-PE: A Pipeline for Pulmonary Embolism Detection using Sparsely Annotated 3D CT Images. Proceedings of the Machine Learning for Health NeurIPS Workshop, 116, 220-232.

    Rengarajan, S., Narayanamurthy, G., Moser, R., & Pereira, V. (2022). Data Strategies for Global Value Chains: Hybridization of Small and Big Data in the Aftermath of COVID-19. Journal of Business Research, 144, 776-787. doi:https://doi.org/10.1016/j.jbusres.2022.02.042

    Sarwosri, Rochimah, S., Yuhana, U. L., Siahaan, D. O., & Akbar, R. J. (2024). Pelatihan Pemrograman Web Dasar untuk Siswa di SMA Negeri 1 Bojonegoro. Sewagati: Jurnal Pengabdian Kepada Masyarakat, 8(1), 1053-1060. doi:https://doi.org/10.12962/j26139960.v8i1.548

    Setiawan, E., Nurhatisyah, N., & Nanra, S. (2023). Pengontrolan Bahaya Kebakaran Berbasis IOT pada Ruang Server SMFR Balai Monitor Spektrum Frekuensi Radio Kelas II Batam. Jurnal Ilmu Siber dan Teknologi Digital, 1(1), 41-51. doi:https://doi.org/10.35912/jisted.v1i1.1800

    Shabbir, M. Q., & Gardezi, S. B. W. (2020). Application of Big Data Analytics and Organizational Performance: The Mediating Role of Knowledge Management Practices. Journal of Big Data, 7, 1-17. doi:https://doi.org/10.1186/s40537-020-00317-6

    Shaharabani, Y. F., & Yarden, A. (2019). Toward Narrowing the Theory–Practice Gap: Characterizing Evidence from in-Service Biology Teachers’ Questions Asked During an Academic Course. International Journal of STEM Education, 6, 1-13. doi:https://doi.org/10.1186/s40594-019-0174-3

    Slater, S., Joksimović, S., Kovanovic, V., Baker, R. S., & Gasevic, D. (2017). Tools for Educational Data Mining: A Review. Journal of Educational and Behavioral Statistics, 42(1), 85-106. doi:https://doi.org/10.3102/1076998616666808

    Suyudi, I., Sudadio, & Suherman. (2022). Pengenalan Bahasa Isyarat Indonesia Menggunakan Mediapipe dengan Model Random Forest dan Multinomial Logistic Regression. Jurnal Ilmu Siber dan Teknologi Digital, 1(1), 65-80. doi:https://doi.org/10.35912/jisted.v1i1.1899

    Taufik, R., Muhaqiqin, Ilman, I. S., & Sholehurrohman, R. (2023). Analisis Informasi Jaringan Homogen dan Heterogen pada Liga Champions UEFA. Jurnal Ilmu Siber dan Teknologi Digital, 1(2), 91-110. doi:https://doi.org/10.35912/jisted.v1i2.1928

    Tu, X., Zou, J., Su, W. J., & Zhang, L. (2023). What Should Data Science Education Do With Large Language Models?. Harvard Data Science Review, 6(1), 1-21. doi:https://doi.org/10.1162/99608f92.bff007ab

    Vyas, V., & Uma, V. (2018). An Extensive Study of Sentiment Analysis Tools and Binary Classification of Tweets Using Rapid Miner. Procedia Computer Science, 125, 329-335. doi:https://doi.org/10.1016/j.procs.2017.12.044

    Wang, D., Weisz, J. D., Muller, M., Ram, P., Geyer, W., Dugan, C., . . . Gray, A. (2019). Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI. Proceedings of the ACM on Human-Computer Interaction, 3, 1-24. doi:https://doi.org/10.1145/3359313
