What is K-QuAD?
Various datasets for question answering (QA) research have been released recently, such as SQuAD, MS MARCO, WikiQA, MCTest, and
SearchQA. However, these training resources mostly support only English. We therefore study the semi-automated
creation of the Korean Question Answering Dataset (K-QuAD), using automatically translated SQuAD guided by a QA system
bootstrapped on a small set of QA pairs. A naive approach that trains a QA system on machine-translated SQuAD alone shows limited
performance because of translation errors, so we annotate a small seed set of 4K QA pairs in Korean. K-QuAD thus contains
77K translated and 4K seed QA pairs, about 81K in total.
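
The description above does not spell out how the seed-trained QA system guides the use of the translated data. One plausible reading, sketched below purely as an assumption, is a filter: a translated pair is kept only if its answer still appears in the translated context and a QA model trained on the seed set predicts an answer that overlaps with it. The names `token_f1`, `filter_translated_pairs`, `answer_fn`, and the `min_f1` threshold are all hypothetical and do not come from the K-QuAD authors.

```python
from collections import Counter
from typing import Callable, Dict, List


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized strings."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def filter_translated_pairs(
    pairs: List[Dict[str, str]],
    answer_fn: Callable[[str, str], str],
    min_f1: float = 0.5,
) -> List[Dict[str, str]]:
    """Keep translated QA pairs that a seed-trained QA model agrees with.

    `answer_fn(context, question)` stands in for the seed-trained QA
    system (hypothetical interface); `min_f1` is an assumed threshold.
    """
    kept = []
    for pair in pairs:
        # Translation can break the answer span: drop pairs whose
        # answer string no longer occurs in the translated context.
        if pair["answer"] not in pair["context"]:
            continue
        predicted = answer_fn(pair["context"], pair["question"])
        if token_f1(predicted, pair["answer"]) >= min_f1:
            kept.append(pair)
    return kept
```

Whitespace tokenization is a simplification here. For Korean, character-level or morpheme-level overlap may be a better fit, and the threshold would need tuning against the seed set.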
Leaderboard
Rank | Model | F1 | EM |
---|---|---|---|