Datasets

French-Nande parallel text corpus. French sentences are sourced from the Tatoeba repository.
No. of sentences: 5000

Levantine Arabic text data collected from posts shared on the Khabrona.Info Facebook page. Contains 5052 parallel sentences with English translations and 658 monolingual sentences.

Short Swahili audio samples recorded by a male speaker.

Format: WAV
No. of samples: 4700

English-Hausa parallel text corpus. English sentences are sourced from the Tatoeba repository.
No. of sentences: 5000

English-Rohingya parallel text corpus. English sentences are sourced from the Tatoeba repository.
No. of sentences: 5000

English-Swahili parallel text corpus. English sentences are sourced from the Tatoeba repository.
No. of sentences: 5000

English-Kanuri parallel text corpus. English sentences are sourced from the Tatoeba repository.
No. of sentences: 5000