The Tunisian Arabic Corpus, Tunisiya, was begun by Karen McNeil and Miled Faiza in 2010. At that time, there were very few resources available for Tunisian Arabic (or North African Arabic in general), and it was the first publicly available corpus of Tunisian Arabic.
In the beginning, written Tunisian Arabic was rare and difficult to find. The first texts included in the corpus were the collected folktales of Abdelaziz El-Aroui (found at a library in Tunisia), the printed texts of some plays (acquired from various bookstores in the capital), and the screenplay of the 2007 Ramadan miniseries Sayd er-rim (from a friend-of-a-friend involved in the production). In addition, we scraped some online materials, like blogs and forum posts, which were a very new format for written Tunisian Arabic.
Since 2011, it has become significantly easier to find resources in Tunisian Arabic, because the availability of print materials in derja has exploded since the revolution. (This phenomenon was the subject of Karen's dissertation.) There are now entire novels and short story collections written in Tunisian Arabic, and many of them have been added to the corpus.
The corpus has been used by scholars all over the world, and has provided data for books, articles, master's theses, dissertations, and computational linguistics projects. Please drop us a line and let us know what you are using the corpus for. We're always happy to hear what's working for people, as well as any additional features or fixes they would like to see.
This corpus is a labor of love, self-funded and worked on in our spare time (of which we don't have much!), so please accept our apologies for any shortcomings you may find.
Karen McNeil is Director of LLM Practice and Red Teaming at the AI data engineering firm Innodata. She is the creator and lead programmer of Tunisiya (and the one to blame whenever it's broken). She is also an Arabic–English translator: her most recent translated novel (with Miled Faiza) is Amira Ghenim's A Calamity of Noble Houses. Karen and Miled also translated Shukri Mabkhout’s The Italian, as well as poems and short stories for Banipal and World Literature Today, and children's books for Kalimat. She was lead revising editor of the Oxford Arabic Dictionary (2014) and has done numerous Arabic consulting projects.
Karen graduated cum laude from Wellesley College in 2000 with a BA in History (with Honors) and Spanish. She received a Master of Arts in Arabic Language, Literature, and Linguistics from Georgetown University in 2012. In 2023, she completed a Ph.D. in Arabic linguistics at Georgetown University; her dissertation was about the development of Tunisian Arabic as a written language. Articles based on her dissertation research have appeared in the International Journal of the Sociology of Language, the Journal of Arabic Sociolinguistics, and Perspectives on Arabic Linguistics.
Miled Faiza is a Tunisian poet and translator living in Providence, RI. He is the author of Baqāya l-bayt allaḏī daḵalnāhu marratan wāḥida (2004) and Asabaʕ an-naḥḥāt (2019) and translator of the Booker Prize–shortlisted novel Autumn (al-Kharif, 2017), as well as Winter (al-Shitā’, 2019) and Spring (ar-Rabiʕ, 2023), all by Ali Smith. He also translated A Calamity of Noble Houses (2025) and The Italian (2021), both with Karen McNeil. He teaches Arabic at Brown University.