Resources

Technical Documents

Language Specific Peculiarities Document for Sheng as Spoken in Kenya

Speech-to-text models

Congolese Swahili speech-to-text model

DeepSpeech models for the Congolese Swahili language.

Bengali speech-to-text model

DeepSpeech models for the Bengali language.

Hausa baseline speech-to-text model

Speech corpora

Congolese Swahili speech corpora

Audio mini-kit

5000 audio samples recorded by 5 speakers. Sentences are from the Congolese Swahili mini-kit. Format: WAV
Duration: 11 hours

Congolese Swahili TICO-19 audio datasets

TICO-19 Congolese Swahili development and test sets recorded by a male and a female speaker.

Audio commands corpus

An audio corpus of 5 speakers uttering the numbers 1 to 10 and yes/no in Congolese Swahili.

Coastal Swahili speech corpus

Audio samples recorded by a Kenyan male speaker. Sentences are from the Swahili mini-kit. Format: WAV
No. of samples: 4700

Parallel text corpora

Tigrinya – English parallel text corpora

English sentences are sourced from the Tatoeba repository and then translated into Tigrinya.
No. of sentences: 5000

Lingala – French parallel text corpora

French sentences are sourced from the Tatoeba repository and then translated into Lingala.
No. of sentences: 5000

Congolese Swahili – French parallel text corpora

French sentences are sourced from the Tatoeba repository and then translated into Congolese Swahili.
No. of sentences: 25305

Synthetically produced Swahili-French parallel text corpora

English-paired and monolingual data converted into a Swahili-French parallel corpus using machine translation.
No. of sentences: 928,065

French – Nande parallel text corpora

French sentences are sourced from the Tatoeba repository and then translated into Nande.
No. of sentences: 15000

Colloquial Levantine Arabic parallel corpus

Posts shared on the Khabrona.Info Facebook page. 5052 parallel sentences with English translations and 658 monolingual sentences.

English – Hausa parallel text corpora

English sentences are sourced from the Tatoeba repository and then translated into Hausa.
No. of sentences: 15000

English – Rohingya parallel text corpus

English sentences are sourced from the Tatoeba repository and then translated into Rohingya.
No. of sentences: 5000

English – Swahili parallel text corpus

English sentences are sourced from the Tatoeba repository and then translated into Swahili.
No. of sentences: 5000

English – Kanuri parallel text corpus

English sentences are sourced from the Tatoeba repository and then translated into Kanuri.
No. of sentences: 5000

Plain text corpora

Kanuri books corpus

Shuffled sentences from books collected from four Kanuri authors.

Machine translation models

Congolese Swahili – French OpenNMT checkpoints

Hausa – English OpenNMT checkpoints