Article type: Research article

Author

Assistant Professor, Research Institute of Linguistics, Institute for Humanities and Cultural Studies, Tehran, Iran

Abstract

One subfield of language proficiency assessment is the prediction of language proficiency level. This study aims to present a computational linguistic model for predicting language proficiency level and to uncover the general characteristics of these levels. To this end, a corpus was compiled from textbooks for learners of Persian, and linguistic and statistical features were extracted from this text corpus to train three classifiers as learners. The models' performance varies with the learning algorithm and the feature set used to train them. To evaluate the models, four standard measures were used: accuracy, precision, recall, and F-measure. According to the findings, when statistical features extracted from the raw text were used, the model built by the random forest classifier achieved the best performance. When linguistic features extracted from the automatically annotated corpus were used, the support vector machine classifier achieved the best performance. The results indicate that enriching the model and supplying diverse information does not guarantee that a classifier (learner) achieves the best performance. To uncover the pedagogical methodology hidden in the textbooks, the overall performance of the classifiers was examined with respect to proficiency level and the linguistic knowledge used to build the model. According to the findings, the number of extracted features plays a significant role in training a classifier. Moreover, the classifiers' best average performance extends from grammatical patterns at proficiency level A (beginner) to all linguistic knowledge at levels B (intermediate) and C (advanced).
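The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch for a three-level task (A, B, C). The abstract does not say how per-level scores were aggregated, so the macro-averaging used here is an assumption; the function name `evaluate` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter

LEVELS = ("A", "B", "C")  # beginner, intermediate, advanced


def evaluate(gold, predicted, labels=LEVELS):
    """Return accuracy plus macro-averaged precision, recall, and F-measure.

    Macro-averaging is an assumption; the study may aggregate differently.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted, but the true level was different
            fn[g] += 1  # the true level g was missed

    def safe_div(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        # F-measure as the harmonic mean of precision and recall
        f_scores.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


# Toy labels, invented purely for illustration (not data from the study).
gold = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]
scores = evaluate(gold, predicted)
print(scores)
```

Reporting all four measures together is useful because accuracy alone can hide a classifier that performs well on one level and poorly on another; the per-level precision and recall expose that imbalance.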

Keywords
