Performance evaluation of recurrent neural network on large-scale translated dataset for question generation in NLP for educational purposes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, neural networks have been widely used to solve many NLP tasks that involve large-scale datasets. Question Generation (QG) has recently attracted considerable attention because it is a subtask of Question Answering (QA) with many real-world applications, mainly for educational purposes. Its importance is reflected in the many large-scale datasets recently released exclusively for this task. However, most of the data used in NLP is available in English, which is not the case for other languages such as Spanish, the third most used language in the world. This research analyzes the performance of current state-of-the-art neural network models for QG using a large-scale dataset translated from English into Spanish. To assess the accuracy of the Spanish translation, the state-of-the-art OpenNMT machine translator and the Google Translation API were used, and the results were analyzed with the corresponding automatic metrics (BLEU, METEOR, ROUGE) and with human evaluations of fluency and adequacy. A state-of-the-art QG neural network model was then trained on the Spanish translated data to automatically generate questions in Spanish. Surprisingly, the results outperform the original English results by 37% on average across all automatic evaluation metrics. To the best of our knowledge, this work is the first to use large-scale Spanish translated data for the QG task with recurrent neural networks for educational purposes.
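The abstract reports translation quality with BLEU, among other metrics. As a rough illustration of what BLEU measures, the following is a minimal, unsmoothed sentence-level BLEU sketch with a single reference and uniform n-gram weights (clipped n-gram precisions combined by a geometric mean, times a brevity penalty). It is a textbook-style simplification, not the evaluation toolkit or configuration used in the paper, which the abstract does not specify; the function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with one reference and uniform weights.

    candidate, reference: lists of tokens.
    Returns a score in [0, 1]. Without smoothing, the score is 0 as soon as
    any n-gram order up to max_n has no matches.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "el gato está sobre la alfombra".split()
    candidate = "el gato está en la alfombra".split()
    print(sentence_bleu(candidate, reference))           # default max_n=4
    print(sentence_bleu(candidate, reference, max_n=2))  # unigrams and bigrams only
```

In this example the candidate matches 5 of 6 unigrams and 3 of 5 bigrams, so with `max_n=2` the score is the square root of 5/6 × 3/5, about 0.707, while the default `max_n=4` returns 0.0 because no 4-gram matches. This sensitivity is one reason published evaluations usually report corpus-level BLEU, which aggregates n-gram counts over all sentences before computing the score.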

Original language: English
Title of host publication: 17th LACCEI International Multi-Conference for Engineering, Education, and Technology
Subtitle of host publication: "Industry, Innovation, and Infrastructure for Sustainable Cities and Communities", LACCEI 2019
Publisher: Latin American and Caribbean Consortium of Engineering Institutions
ISBN (Electronic): 9780999344361
State: Published - 2019
Event: 17th LACCEI International Multi-Conference for Engineering, Education, and Technology, LACCEI 2019 - Montego Bay, Jamaica
Duration: 24 Jul 2019 to 26 Jul 2019

Publication series

Name: Proceedings of the LACCEI international Multi-conference for Engineering, Education and Technology
Volume: 2019-July
ISSN (Electronic): 2414-6390

Conference

Conference: 17th LACCEI International Multi-Conference for Engineering, Education, and Technology, LACCEI 2019
Country/Territory: Jamaica
City: Montego Bay
Period: 24/07/19 to 26/07/19

Bibliographical note

Funding Information:
This research is part of the project “AQNLP: Software de Apoyo a la Comprensión Lectora de Estudiantes en Etapa Escolar, Mediante la Generación de Preguntas Automáticas desde Libros de Texto Utilizando Tecnología NLP” (Software to Support the Reading Comprehension of School-Age Students through the Automatic Generation of Questions from Textbooks Using NLP Technology), under grant contract N° TP-053-2018-UNSA, supported by the Universidad Nacional de San Agustín de Arequipa. The authors also thank CiTeSoft-UNSA for its support and for the excellent environment provided during this research.

Publisher Copyright:
© 2019 Latin American and Caribbean Consortium of Engineering Institutions. All rights reserved.

Keywords

  • Google translation
  • Natural language processing
  • Recurrent neural network (RNN)
  • SQuAD dataset
  • Translated data
