A large-scale multilingual speech dataset covering seven South African languages, built to support automatic speech recognition (ASR) research and inclusive technologies.
Vukuzenzele Newspaper [Website][Data Repo], Wikipedia, African Wordnet, GrainSA, Agricultural Research Council, SADiLaR, Masakhane
Total Hours: 3,016
Total Clips: 483,191
Speakers: 2,335
Languages: 7
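The headline totals imply a few averages that help when sizing storage or planning training runs. The short Python sketch below derives them; the function name `derived_stats` and the dictionary keys are illustrative and not part of the dataset's tooling. The per-language figure assumes an even split across languages, which the summary does not state.

```python
# Headline figures copied from the dataset summary.
TOTAL_HOURS = 3016
TOTAL_CLIPS = 483_191
SPEAKERS = 2335
LANGUAGES = 7


def derived_stats(hours, clips, speakers, languages):
    """Return simple averages implied by the headline totals.

    These are plain ratios of the totals; the real distribution across
    clips, speakers, and languages may be far from uniform.
    """
    return {
        "avg_clip_seconds": hours * 3600 / clips,
        "avg_hours_per_speaker": hours / speakers,
        "avg_clips_per_speaker": clips / speakers,
        # Assumption: hours are split evenly across languages.
        "avg_hours_per_language": hours / languages,
    }


stats = derived_stats(TOTAL_HOURS, TOTAL_CLIPS, SPEAKERS, LANGUAGES)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Taken at face value, the totals work out to about 22.5 seconds per clip and about 1.3 hours of audio per speaker.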
Citation: arXiv:2512.02201 [cs.CL] - Swivuriso: The South African Next Voices Multilingual Speech Dataset