Generative AI tools in designing MCQs for English language examinations: Insights from Lecturers

Authors

DOI:

https://doi.org/10.54855/979-8-9870112-9-4_1

Keywords:

MCQs, Generative AI, AI-generated exam questions, syntax-based sentence transformations, ReParaphrased classification framework

Abstract

This study examines ChatGPT's ability to generate syntax-based sentence transformation multiple-choice questions (MCQs) using the syntactic types listed in the ReParaphrased classification framework: Negation Switching (NS), Diathesis Alternation (DA), Subordination and Nesting Changes (SNC), Coordination Changes (CC), and Ellipsis (Ell). Using a quantitative approach, the researchers aim to provide personal insights into designing exam questions based on the content of B1 Empower. A statistical analysis of 120 AI-generated test items was conducted to identify the frequency and distribution of each syntactic transformation type, highlighting the favored patterns in the generated dataset. The findings suggest that ChatGPT tended to create test items using transformation types with a clear pattern, such as SNC and NS, while showing less favor for types that require more nuanced contextual understanding, such as Ell and CC. In addition, although AI could create questions quickly and effectively, some problems remained, including semantic distortions and awkward forms such as double negatives or passives. These results highlight the crucial role of human intervention in proofreading and refining AI-generated questions to ensure the accuracy and relevance of the test items.
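The frequency-and-distribution analysis described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `type_distribution` and the `sample` labels are hypothetical, and the placeholder labels are not the study's data. It assumes each test item has already been assigned exactly one of the five type labels.

```python
from collections import Counter

# The five syntactic types named in the abstract.
TYPES = ("NS", "DA", "SNC", "CC", "Ell")


def type_distribution(labels):
    """Return {type: (count, percentage)} for a list of type labels."""
    unknown = set(labels) - set(TYPES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    return {
        t: (counts[t], round(100 * counts[t] / total, 1) if total else 0.0)
        for t in TYPES
    }


# Placeholder labels for illustration only; NOT the study's data.
sample = ["SNC", "NS", "SNC", "DA", "Ell", "SNC", "NS", "CC"]
for t, (n, pct) in type_distribution(sample).items():
    print(f"{t:<4}{n:>3}{pct:>7.1f}%")
```

Reporting both raw counts and percentages keeps the tally comparable if the number of items changes; types that never occur still appear with a count of zero.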

Author Biographies

Vu Thi Kim Chi, Saigon University, Hochiminh City, Vietnam

Vu Thi Kim Chi is currently a full-time lecturer at the Faculty of Foreign Languages, Saigon University. She received a Master's degree in TESOL from Victoria University, Australia. She teaches language skills courses for non-English major students. She aims to apply technological innovation in the classroom to improve teaching and learning outcomes.

Nguyen Trinh To Anh, Saigon University, Hochiminh City, Vietnam

Nguyen Trinh To Anh has been an EFL instructor at the Faculty of Foreign Languages, Saigon University, Vietnam, since 2015. She holds a Master’s degree in English Language Studies from Hoa Sen University, Vietnam; a Master of Commerce specializing in Tourism and Hospitality Management from Macquarie University, Australia; and a Master of Professional Accounting from La Trobe University, Australia. Her teaching responsibilities include English for General Purposes and English for Specific Purposes, particularly in the fields of Accounting – Auditing, Finance – Banking, and Business Management. Her research interests focus on ESP teaching methodology and blended learning approaches.

Vo Dao Vuong Co, Saigon University, Hochiminh City, Vietnam

Vo Dao Vuong Co is currently a full-time lecturer at the Faculty of Foreign Languages, Saigon University. She received a Master’s degree in Applied Linguistics from Curtin University, Australia. Her teaching experience includes test-prep courses, general English courses for non-English major students, and linguistics courses for English-major students. She aims to improve the learning experience and teaching outcomes by creating a classroom where learning meets innovation.

Published

17-12-2025

How to Cite

Vu, T. K. C., Nguyen, T. T. A., & Vo, D. V. C. (2025). Generative AI tools in designing MCQs for English language examinations: Insights from Lecturers. ICTE Conference Proceedings, 9, 1–19. https://doi.org/10.54855/979-8-9870112-9-4_1