Using the Corpus

There are three ways to search the corpus: Exact, Stem, and RegEx. Please note that Exact and Stem searches must be written in Arabic characters, i.e. عالسلامة, not 3asalaama or Calslamp. RegEx searches, however, must be entered using the transliteration system given below.

Exact

This will search for the word exactly as you type it. For example: typing يمشي will return all instances of يمشي, but not تمشي or مشيت.
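As a rough illustration only, exact matching can be thought of as whole-token equality. The sketch below assumes whitespace tokenization and uses a hypothetical function name, exact_search; the corpus's real tokenizer and search code are not described here.

```python
def exact_search(query, text):
    """Return every token in `text` that is identical to `query`."""
    # Assumption: tokens are separated by whitespace.
    return [token for token in text.split() if token == query]


sample = "يمشي تمشي مشيت يمشي"
hits = exact_search("يمشي", sample)
print(len(hits))  # 2: only the two identical tokens are returned
```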

Stem

This will search for the stem of the word, without any inflections. For example: typing مشي will return يمشي، مشيت, etc. and دار will return الدار، دارها, etc.

RegEx

Searching with regular expressions allows you to perform sophisticated wildcard searches. To perform a regex search, you must enter your search in transliteration, rather than Arabic characters, following the transliteration system given below. Do not use the word boundary character (\b) in your search, however; word boundaries are added to the search string automatically. A good primer on regex syntax can be found at the RegEx page on Wikipedia. Some examples:

  • [sS]Hb  will return صحب or سحب
  • krhb[pht](\w)*  will return كرهبة , كرهبه , كرهبتي , etc.
  • (ma)?\w{3,6}J   will return مشيتش , يحبوش , مافيباليش , and most other negative verbs and pseudo-verbs. (Currently there's no way to search for multi-word phrases like ما عرفش.)
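
The three example patterns can be tried offline against transliterated words. The sketch below uses Python's re module as a stand-in for the corpus's regex engine, which may differ in detail. The function name regex_search is hypothetical, and wrapping the query in \b on both sides is an assumption about how the boundaries are added automatically.

```python
import re


def regex_search(pattern, tokens):
    """Return the transliterated tokens matched by `pattern` as a whole word."""
    # Assumption: the search tool surrounds the query with word boundaries.
    wrapped = re.compile(r"\b(?:" + pattern + r")\b")
    return [token for token in tokens if wrapped.search(token)]


# Transliterations of words mentioned in this guide
tokens = ["SHb", "sHb", "krhbp", "krhbh", "krhbty",
          "mJytJ", "yHbwJ", "mafybalyJ", "ymJy"]

print(regex_search(r"[sS]Hb", tokens))          # ['SHb', 'sHb']
print(regex_search(r"krhb[pht](\w)*", tokens))  # ['krhbp', 'krhbh', 'krhbty']
print(regex_search(r"(ma)?\w{3,6}J", tokens))   # ['mJytJ', 'yHbwJ', 'mafybalyJ']
```

Note that ymJy (يمشي) is not returned by the last pattern, because the pattern requires the word to end in J (ش).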

Transliteration System:

Modified version of the Buckwalter system.
Symbol  Arabic  Name
c       ء       hamza-on-the-line
A       آ       madda
e       أ       hamza-on-'alif
W       ؤ       hamza-on-waaw
I       إ       hamza-under-'alif
i       ئ       hamza-on-yaa'
a       ا       bare 'alif
b       ب       baa'
p       ة       taa' marbuuTa
t       ت       taa'
v       ث       thaa'
j       ج       jiim
H       ح       Haa'
x       خ       khaa'
d       د       daal
V       ذ       dhaal
r       ر       raa'
z       ز       zaay
s       س       siin
J       ش       shiin
S       ص       Saad
D       ض       Daad
T       ط       Taa'
Z       ظ       Zaa' (DHaa')
C       ع       cayn
G       غ       ghayn
f       ف       faa'
q       ق       qaaf
g       ڤ       gaaf
k       ك       kaaf
l       ل       laam
m       م       miim
n       ن       nuun
h       ه       haa'
w       و       waaw
E       ى       'alif maqSuura
y       ي       yaa'
_       ّ       shaddah
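
To prepare a RegEx query, it can help to convert an Arabic word into this transliteration programmatically. The following is a minimal sketch of the table above as a lookup. The names ARABIC_TO_LATIN, transliterate, and to_arabic are hypothetical, and passing unlisted characters through unchanged is an assumption, not documented behaviour of the corpus.

```python
# Arabic character -> transliteration symbol, taken from the table above
ARABIC_TO_LATIN = {
    "ء": "c", "آ": "A", "أ": "e", "ؤ": "W", "إ": "I", "ئ": "i",
    "ا": "a", "ب": "b", "ة": "p", "ت": "t", "ث": "v", "ج": "j",
    "ح": "H", "خ": "x", "د": "d", "ذ": "V", "ر": "r", "ز": "z",
    "س": "s", "ش": "J", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z",
    "ع": "C", "غ": "G", "ف": "f", "ق": "q", "ڤ": "g", "ك": "k",
    "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ى": "E",
    "ي": "y",
    "\u0651": "_",  # shaddah (a combining mark, written as an escape)
}

# Reverse lookup: transliteration symbol -> Arabic character
LATIN_TO_ARABIC = {latin: arabic for arabic, latin in ARABIC_TO_LATIN.items()}


def transliterate(text):
    """Convert Arabic script to the transliteration used for RegEx searches."""
    # Assumption: characters missing from the table are left unchanged.
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in text)


def to_arabic(text):
    """Convert a transliterated word back to Arabic script."""
    return "".join(LATIN_TO_ARABIC.get(ch, ch) for ch in text)


print(transliterate("عالسلامة"))  # Calslamp
print(to_arabic("ymJy"))          # يمشي
```

The reverse function is only safe for plain transliterated words, not for regex patterns, since characters such as w in \w would also be converted.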