Prediction of Admission Decisions Using Machine Learning Models: An Analysis of the Holistic Undergraduate Admissions Review Process in Korea

Shin, Yousun; Kang, Hee Sun; Park, So Yun

doi:10.66224/johepal.7.2.92

Volume 7, Issue 2 (JUNE ISSUE 2026) johepal 2026, 7(2): 92-110

Shin Y, Kang H S, Park S Y. (2026). Prediction of Admission Decisions Using Machine Learning Models: An Analysis of the Holistic Undergraduate Admissions Review Process in Korea. johepal. 7(2), 92-110. doi:10.66224/johepal.7.2.92
URL: http://johepal.com/article-1-1768-en.html

Abstract:

This study examined and validated the consistency and predictive patterns of human-led undergraduate admissions decisions by applying machine learning models. Whereas traditional holistic evaluation is conducted by human assessors, this study compared five machine learning algorithms (Gradient Boosting, Random Forest, Support Vector Machine, Logistic Regression, and XGBoost) to identify the most accurate prediction model. The analysis used a dataset of 1,554 application records from the 2024 application cycle. To further improve prediction accuracy, Latent Dirichlet Allocation (LDA) was used to extract relevant features from unstructured textual data. The findings revealed that the XGBoost model performed best in predicting admission outcomes. This result is attributed to the learning mechanism of tree-based ensemble models, which are capable of capturing the complex interactions between non-linear score patterns and various other variables. The major factors influencing admission decisions were interview scores, type of application, and document evaluation scores, which highlights their significance in the selection process and supports the use of XGBoost as a supportive tool. These findings not only provide practical recommendations for improving prediction accuracy but also inform future research directions in data-driven strategies for high-stakes educational assessment.

Keywords: Holistic Undergraduate Admissions Review Process, Latent Dirichlet Allocation (LDA), XGBoost, Machine Learning, Educational Data Mining, Prediction Accuracy
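As a rough illustration of the workflow the abstract describes (LDA topic features combined with structured scores, followed by a comparison of classifiers), the sketch below may help. It is not the authors' code: the data are synthetic, the column meanings (`interview`, `document`, the toy essay vocabulary), the two-topic setting, and the train/test split are all assumptions made for the example. XGBoost is left out so the sketch depends only on NumPy and scikit-learn; if the `xgboost` package is installed, its `XGBClassifier` could be added to the `models` dictionary in the same way.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for application records (hypothetical, not the study's data).
rng = np.random.default_rng(0)
n = 200
interview = rng.normal(70, 10, n)   # assumed interview score
document = rng.normal(70, 10, n)    # assumed document evaluation score
admitted = (0.6 * interview + 0.4 * document + rng.normal(0, 3, n) > 70).astype(int)

# Toy unstructured text: each record draws words from one of two vocabularies.
vocab_a = ["research", "project", "science", "experiment"]
vocab_b = ["volunteer", "community", "leadership", "club"]
texts = [
    " ".join(rng.choice(vocab_a if rng.random() < 0.5 else vocab_b, size=8))
    for _ in range(n)
]

# Simple hold-out split: first 150 records for training, the rest for testing.
train, test = slice(0, 150), slice(150, n)

# Fit the bag-of-words model and LDA on training text only, then reuse them
# to turn the test text into topic proportions.
vectorizer = CountVectorizer()
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics_train = lda.fit_transform(vectorizer.fit_transform(texts[train]))
topics_test = lda.transform(vectorizer.transform(texts[test]))

# Combine structured scores with the LDA topic proportions.
X_train = np.column_stack([interview[train], document[train], topics_train])
X_test = np.column_stack([interview[test], document[test], topics_test])
y_train, y_test = admitted[train], admitted[test]

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Hold-out accuracy for each model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Impurity-based importances from the boosted trees, in feature order.
feature_names = ["interview", "document", "topic_0", "topic_1"]
importances = models["Gradient Boosting"].feature_importances_

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
for fname, imp in zip(feature_names, importances):
    print(f"{fname}: importance = {imp:.3f}")
```

On real application data one would likely prefer cross-validation and metrics beyond accuracy; the single hold-out split here is only meant to keep the sketch short.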

Type of Study: Research | Subject: Special
Received: 2026/02/19 | Accepted: 2026/06/07 | Published: 2026/06/30

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
