Evaluating ChatGPT’s Reliability in Grading Writing Assignments on the EOP Learning Platform

Authors

DOI:

https://doi.org/10.54855/979-8-9870112-8-7_1

Keywords:

AI grading, assessment reliability, ChatGPT, writing evaluation, advantages and limitations

Abstract

As Artificial Intelligence (AI) is used more and more in education, the trustworthiness of AI-based grading of student writing has become a concern. Using a mixed-methods research design that combines quantitative and qualitative data collection tools (questionnaires and semi-structured interviews), this study investigates the reliability of using ChatGPT to mark students' writing assignments on the EOP online learning platform (https://eop.edu.vn/) compared with human evaluators at the School of Languages and Tourism, Hanoi University of Industry. The findings outline the advantages and limitations of AI-supported grading, with a focus on its accuracy, consistency, and alignment with human grading criteria. The paper also examines teachers' attitudes toward AI scoring as a further perspective on its accuracy. Based on the findings, recommendations are provided for enhancing AI scoring systems to enable more effective and fairer assessments. The research contributes to the academic literature on the use of AI in education, emphasizing the importance of continually improving AI-driven evaluation tools to enable effective and fairer online learning.

Author Biographies

Tran Yen Van, Hanoi University of Industry, Hanoi, Vietnam

Ms. Tran Yen Van is a lecturer of English at Hanoi University of Industry, Vietnam. She has been teaching for 18 years. During that time, she has been interested in ELT, especially in developing students’ proficiency in listening, reading, writing, and speaking as well as their communication skills. Her research interests include Computer Assisted Language Learning, Cognitive Linguistics, Educational Technology, and ELT Methodology.

Le Thi Huong Giang, Hanoi University of Industry, Hanoi, Vietnam

Ms. Le Thi Huong Giang has been teaching English at Hanoi University of Industry since 2007. She received her BA degree in 2005 and her MA degree in 2010 from the University of Language and International Studies, Vietnam National University, Hanoi. Her areas of professional interest are designing curricula and course books for blended learning environments, developing test specifications, and writing test items. Currently, she leads a team that designs teaching and learning materials for students in the Faculty of Mechanical Engineering.

Published

03-10-2025

How to Cite

Tran, Y. V., & Le, T. H. G. (2025). Evaluating ChatGPT’s Reliability in Grading Writing Assignments on the EOP Learning Platform. ICTE Conference Proceedings, 7, 1–19. https://doi.org/10.54855/979-8-9870112-8-7_1
