- File size: 318.70 MB
- File count: 1
- Created: August 23, 2021
- Last updated: September 7, 2021
Congolese Swahili speech-to-text model
Speech-to-text models for the Congolese dialect of Swahili. Contains acoustic and language models for use with DeepSpeech-based ASR.
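A minimal sketch of decoding one clip with the DeepSpeech Python package follows. It assumes the DeepSpeech 0.9.x API and 16-bit mono WAV input recorded at the model's sample rate. `read_wav_int16`, `load_model` and `transcribe_file` are hypothetical helpers. `Model`, `enableExternalScorer`, `sampleRate` and `stt` are DeepSpeech calls.

```python
import wave


def read_wav_int16(path: str, expected_rate: int):
    """Read a 16-bit mono PCM WAV file into an int16 NumPy array for DeepSpeech."""
    with wave.open(path, "rb") as wav:
        # DeepSpeech expects mono 16-bit samples at the model's own sample rate.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz audio, got {wav.getframerate()} Hz"
            )
        frames = wav.readframes(wav.getnframes())

    import numpy as np  # imported here so the checks above run without NumPy

    return np.frombuffer(frames, dtype=np.int16)


def load_model(model_path: str, scorer_path: str = ""):
    """Load a DeepSpeech acoustic model and optionally attach a scorer file."""
    import deepspeech  # assumes `pip install deepspeech` (0.9.x API)

    model = deepspeech.Model(model_path)
    if scorer_path:
        # The scorer is the external language model used during beam search.
        model.enableExternalScorer(scorer_path)
    return model


def transcribe_file(model, path: str) -> str:
    """Decode one complete utterance from a WAV file and return its transcript."""
    audio = read_wav_int16(path, model.sampleRate())
    return model.stt(audio)
```

After unpacking the download, a call might look like `transcribe_file(load_model("model.pbmm", "swc-general.scorer"), "clip.wav")`. Here `model.pbmm` and `clip.wav` are placeholders, because the acoustic model's file name is not given on this page.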
Total training data: 8.93 hours (mini-kit) + 3.27 hours (TICO-19 test set) = 12.2 hours
Dev set: 0.49 hours (mini-kit)
Test set: 1.71 hours (TICO-19 dev set)
Contains two language models (scorers):
- General-purpose language model (`swc-general.scorer`), trained on a 37.7M-word mixed Swahili text corpus
- Commands language model (`swc-commands.scorer`), trained on 12 commands (the numbers 1 to 10 plus yes and no), which are listed in `vocab-commands.txt`
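The choice between the two scorers can be sketched as below. The scorer and vocabulary file names come from the list above. The helpers `scorer_for`, `load_commands` and `match_command` are hypothetical. The one-command-per-line layout of `vocab-commands.txt` is an assumption. Checking that a command-mode transcript is exactly one known command is an illustrative safeguard and not part of the released model.

```python
from pathlib import Path
from typing import Optional, Set

SCORERS = {
    "general": "swc-general.scorer",    # open-vocabulary Swahili transcription
    "commands": "swc-commands.scorer",  # 12 commands: numbers 1-10, yes, no
}


def scorer_for(task: str) -> str:
    """Return the scorer file name for the 'general' or 'commands' task."""
    if task not in SCORERS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SCORERS)}")
    return SCORERS[task]


def load_commands(path: str = "vocab-commands.txt") -> Set[str]:
    """Read the command vocabulary, assuming one command per non-empty line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def match_command(transcript: str, commands: Set[str]) -> Optional[str]:
    """Return the normalized transcript if it is exactly one known command, else None."""
    text = " ".join(transcript.lower().split())  # collapse whitespace, ignore case
    return text if text in commands else None
```

Exact matching after case and whitespace normalization is deliberately strict. A caller that rejects unmatched output can re-prompt the speaker instead of acting on a misrecognized command.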
Results:
- 81.69% on a subset of the TICO-19 dev set (file list in `swc-tico-test.csv`) using the general-purpose scorer
- 78.92% on the Congolese Swahili audio commands corpus using the commands scorer
Developer: Alp Öktem
Disclaimer: This model has not been tested in production and is provided as-is, without any warranty.