Applying Computational Linguistics to Predict the Language Proficiency Level of Persian Learners' Textbooks

Ghayoomi, Masood

doi:10.22051/lghor.2021.32656.1354

Article type: Research article

Author

Masood Ghayoomi

Assistant Professor, Faculty of Linguistics, Institute for Humanities and Cultural Studies, Tehran, Iran

Abstract

Predicting language proficiency level is one subfield of language proficiency assessment. This study proposes a computational linguistic model for predicting language proficiency level and for discovering the general characteristics of these levels. To this end, a corpus was compiled from textbooks for learners of Persian, and linguistic and statistical features were extracted from this text corpus to train three classifiers as learners. Model performance varies with the learning algorithm and with the feature set used to train the models. The models were evaluated with four standard metrics: accuracy, precision, recall, and F-measure. According to the findings, when statistical features extracted from the raw text were used, the model built with the random forest classifier performed best. When linguistic features extracted from the automatically annotated corpus were used, the support vector machine classifier performed best. The results indicate that enriching the model and supplying varied information does not guarantee that a classifier (learner) achieves its best performance. To uncover the teaching methodology implicit in the textbooks, the overall performance of the classifiers was examined with respect to the proficiency level and the linguistic knowledge used to build the model. According to the findings, the number of extracted features plays a significant role in training a classifier. Moreover, the best average performance of the classifiers extends from grammatical patterns at proficiency level A (beginner) to all linguistic knowledge at levels B (intermediate) and C (advanced).
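The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the function name `evaluate`, the toy gold and predicted labels, and the choice of macro-averaging across the three levels are all assumptions made for illustration, since the abstract does not say how per-level scores were combined.

```python
from collections import Counter

# Proficiency levels named in the abstract: A (beginner), B (intermediate), C (advanced).
LEVELS = ("A", "B", "C")


def evaluate(gold, predicted, labels=LEVELS):
    """Return accuracy plus macro-averaged precision, recall, and F-measure.

    Macro-averaging is an assumption of this sketch, not a detail from the paper.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")

    # Count every (gold label, predicted label) pair; unseen pairs count as 0.
    pairs = Counter(zip(gold, predicted))
    accuracy = sum(pairs[(label, label)] for label in labels) / len(gold)

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        true_pos = pairs[(label, label)]
        n_predicted = sum(pairs[(g, label)] for g in labels)  # texts assigned this level
        n_gold = sum(pairs[(label, p)] for p in labels)       # texts truly at this level
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_gold if n_gold else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Hypothetical toy data: six textbook passages with invented gold and predicted levels.
gold = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]
scores = evaluate(gold, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy shows whether a classifier tends to over-assign or under-assign a particular level, which accuracy alone would hide.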

DOR: 20.1001.1.2588350.2022.6.1.2.3

Journal of Language Horizons

Volume 6, Issue 1 - Serial Number 11
Khordad 1401 (May/June 2022)
Pages 29-52
