K-QuAD

What is K-QuAD?

Various datasets for question answering (QA) research have been released recently, such as SQuAD, MS MARCO, WikiQA, MCTest, and SearchQA. However, these training resources mostly support only English. In contrast, we study the semi-automated creation of the Korean Question Answering Dataset (K-QuAD), using an automatically translated SQuAD guided by a QA system bootstrapped on a small set of QA pairs. A naive approach, training a QA system on machine-translated SQuAD alone, shows limited performance because of translation errors. We therefore annotate a small seed set of QA pairs (4K) in Korean. K-QuAD thus contains 77K translated pairs plus 4K seed pairs (81K in total).
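The description above does not spell out how the bootstrapped QA system guides the use of the translated data. One plausible reading, given here only as a minimal sketch, is that a model trained on the seed pairs is run over each translated pair, and the pair is kept when the predicted answer overlaps enough with the translated answer. Everything in this example is an assumption made for illustration: the function names `token_f1` and `filter_translated_pairs`, the `predict` callable, the dictionary keys, the whitespace tokenization, and the 0.5 threshold are hypothetical and are not taken from K-QuAD.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two whitespace-tokenized strings.

    Whitespace splitting is a simplification; Korean text would likely
    need a morphological tokenizer in practice (assumption).
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match, one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def filter_translated_pairs(pairs, predict, threshold=0.5):
    """Keep translated pairs whose answer the seed-trained model roughly reproduces.

    `pairs` is a list of dicts with hypothetical keys "context",
    "question", and "answer"; `predict(context, question)` stands in
    for the QA system bootstrapped on the seed set.
    """
    kept = []
    for pair in pairs:
        predicted = predict(pair["context"], pair["question"])
        if token_f1(predicted, pair["answer"]) >= threshold:
            kept.append(pair)
    return kept
```

With a stub such as `predict = lambda context, question: context.split()[0]`, the filter can be exercised on a few hand-written pairs before a real model is plugged in. The threshold trades dataset size against noise from translation errors and would need tuning.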

Leaderboard

Rank	Model	F1	EM