Subword Modeling


The Second Project

The goal of the second project is to generate inflected verb forms for Kabardian (kbd), Swahili (swc), and Mixtec (xty) from roots and inflectional information, with higher accuracy than both the non-neural and neural baselines.

The data

You will be provided with data for three languages: Kabardian (kbd), Swahili (swc), and Mixtec (xty).

For each language, you will be provided with train, dev, and test files. Files are formatted in the UniMorph standard, with each line of the form [ROOT] ([CORRECT]) [INFLECTION]. The parenthesized [CORRECT] field is omitted in the test set.
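A minimal sketch of reading these files is given here. It assumes the fields are tab-separated, as is usual for UniMorph data, and that test lines carry only two fields. The names `Entry`, `parse_line`, and `parse_lines` are hypothetical helpers, not part of the provided code.

```python
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    root: str
    form: Optional[str]  # None when the correct form is omitted (test set)
    features: str


def parse_line(line: str) -> Entry:
    """Parse one line, assuming tab-separated fields (an assumption)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:  # train/dev: root, correct form, inflection
        root, form, features = fields
        return Entry(root, form, features)
    if len(fields) == 2:  # test: root, inflection
        root, features = fields
        return Entry(root, None, features)
    raise ValueError(f"expected 2 or 3 tab-separated fields, got {len(fields)}")


def parse_lines(lines: Iterable[str]) -> List[Entry]:
    """Parse every non-blank line; blank lines are skipped."""
    return [parse_line(line) for line in lines if line.strip()]
```

For example, `parse_line("walk\twalked\tV;PST")` yields an entry with `form == "walked"`, while `parse_line("walk\tV;PST")` yields `form is None`. The English example is purely illustrative and is not taken from the project data.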

The task

You are expected to produce the correct inflected form for each line in the test set. Your submission will be one file per language, containing the generated forms separated by newlines. Please name your files {lang}.txt (e.g. kbd.txt) for submission to the autograder. The baseline scores to beat are as follows:
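The submission format can be sketched as follows. The helper names `format_submission` and `write_submission` are hypothetical, and the UTF-8 encoding and trailing newline after the last form are assumptions; check them against the autograder if it turns out to be strict.

```python
from pathlib import Path
from typing import Iterable


def format_submission(forms: Iterable[str]) -> str:
    """Return the forms one per line, in the same order as the test file."""
    lines = []
    for form in forms:
        if "\n" in form:
            raise ValueError(f"form contains a newline: {form!r}")
        lines.append(form + "\n")
    return "".join(lines)


def write_submission(lang: str, forms: Iterable[str], out_dir: str = ".") -> Path:
    """Write the forms to {lang}.txt (e.g. kbd.txt) and return the path."""
    path = Path(out_dir) / f"{lang}.txt"
    path.write_text(format_submission(forms), encoding="utf-8")
    return path
```

Calling `write_submission("kbd", predictions)` would then produce kbd.txt in the current directory, with line i holding the prediction for line i of the test file.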

Language  Non-Neural Accuracy  Neural Accuracy
kbd       88.5                 67.6
swc       72.3                 0.95
xty       73.5                 79.8
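To compare a system against these numbers before submitting, one option is to score it on the dev files, where the correct forms are available. The sketch below assumes the metric is exact-match accuracy reported as a percentage; the autograder's exact metric and scale are not stated here, and `accuracy` is a hypothetical helper.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predictions that match the gold form exactly."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0  # nothing to score
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)
```

Exact match is strict: a prediction that differs from the gold form by a single character counts as wrong.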

Code to reproduce the baselines can be found here.