The goal of the second project is to generate inflected verb forms for Kabardian (kbd), Swahili (swc), and Mixtec (xty) from a root and its inflectional features, with higher accuracy than both the non-neural and the neural baselines.
For each of the three languages, you will be provided with train, dev, and test files. All files follow the UniMorph standard, except that the correct inflected forms are omitted from the test files:
`[ROOT] ([CORRECT]) [INFLECTION]`
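A minimal sketch for reading these files, assuming the columns are tab-separated as in UniMorph data releases (root, inflected form, feature bundle) and that test lines simply lack the middle column. The names `parse_line` and `read_unimorph` are hypothetical helpers, not part of the provided code.

```python
from typing import List, Optional, Tuple

# (root, inflected form or None for test lines, inflectional feature string)
Example = Tuple[str, Optional[str], str]


def parse_line(line: str) -> Example:
    """Parse one line, assuming tab-separated UniMorph-style columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:  # train/dev: root, correct form, features
        root, form, features = fields
        return root, form, features
    if len(fields) == 2:  # test: root, features (correct form omitted)
        root, features = fields
        return root, None, features
    raise ValueError(f"unexpected number of columns: {line!r}")


def read_unimorph(path: str) -> List[Example]:
    """Read every non-empty line of a data file as UTF-8 text."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

Returning `None` for the missing form lets the same reader handle train, dev, and test files.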
You are expected to produce the correct inflected form for each line in the test set. For submission to the autograder, name each output file `{lang}.txt` (e.g., `kbd.txt`). Each file should contain your generated forms separated by newlines, one form per line of the corresponding test file.
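One way to write a submission file, sketched under the assumption that predictions are kept in test-file order, so line *i* of `{lang}.txt` answers line *i* of the test set. `format_predictions` and `write_predictions` are hypothetical helpers. UTF-8 is set explicitly because Kabardian forms are written in Cyrillic.

```python
from pathlib import Path
from typing import List


def format_predictions(forms: List[str]) -> str:
    """Join predicted forms into newline-separated text, one form per line."""
    for form in forms:
        # A newline inside a form would shift every later line out of alignment.
        if "\n" in form:
            raise ValueError(f"prediction contains a newline: {form!r}")
    return "".join(form + "\n" for form in forms)


def write_predictions(lang: str, forms: List[str], out_dir: str = ".") -> Path:
    """Write predictions to {lang}.txt (e.g. kbd.txt) in out_dir."""
    path = Path(out_dir) / f"{lang}.txt"
    path.write_text(format_predictions(forms), encoding="utf-8")
    return path
```

Keeping the formatting separate from the file I/O makes it easy to check the line count against the test file before writing.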
The baseline scores to beat are as follows:
| Language | Non-neural accuracy | Neural accuracy |
|---|---|---|
| kbd | 88.5 | 67.6 |
| swc | 72.3 | 0.95 |
| xty | 73.5 | 79.8 |
Code to reproduce the baselines can be found here.
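To track progress on the dev set, the following is a minimal sketch of exact-match accuracy in percent. It assumes the baseline numbers are exact whole-form match accuracies, which is common for inflection tasks but is not stated here, so the autograder's metric may differ; dev-set scores are also only an estimate of test-set performance. The function names are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predicted forms that exactly match the gold forms."""
    if len(predictions) != len(gold):
        raise ValueError(f"{len(predictions)} predictions vs {len(gold)} gold forms")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Baseline accuracies copied from the table above.
BASELINES = {
    "kbd": {"non_neural": 88.5, "neural": 67.6},
    "swc": {"non_neural": 72.3, "neural": 0.95},
    "xty": {"non_neural": 73.5, "neural": 79.8},
}


def beats_baselines(lang: str, score: float) -> bool:
    """True if score is higher than both baselines for lang."""
    return all(score > baseline for baseline in BASELINES[lang].values())
```

For example, a Mixtec system scoring 75.0 beats the non-neural baseline but not the neural one, so `beats_baselines("xty", 75.0)` is `False`.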