Using the Corpus
There are three ways to search the corpus: Exact, Stem, and RegEx. Please note that Exact and Stem searches must be written in Arabic characters, i.e. عالسلامة, not 3asalaama or Calslamp. RegEx searches, however, must be entered using the transliteration system given below.
Exact
This will search for the word exactly as you type it. For example: typing يمشي will return all instances of يمشي, but not تمشي or مشيت.
Stem
This will search for the stem of the word, ignoring any inflections. For example: typing مشي will return يمشي، مشيت, etc., and typing دار will return الدار، دارها, etc.
RegEx
Searching with regular expressions allows you to perform sophisticated wildcard searches. To perform a regex search, you must enter your search in transliteration rather than Arabic characters, following the transliteration system given below. Do not use the word boundary character (\b) in your search, however, because word boundaries are added to the search string automatically. A good primer on regex syntax can be found at the RegEx page on Wikipedia. Some examples:
- [sS]Hb will return صحب or سحب
- krhb[pht](\w)* will return كرهبة , كرهبه , كرهبتي , etc.
- (ma)?\w{3,6}J will return مشيتش , يحبوش , مافيباليش , and most other negative verbs and pseudo-verbs. (Currently there's no way to search for multi-word phrases like ما عرفش.)
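The three patterns above can be tried offline with a minimal sketch like the following. It uses Python's `re` module as a stand-in (the corpus's own regex engine is not documented here, so behaviour may differ in details), and `regex_search` and the sample string are hypothetical names and data made up for illustration. The only behaviour taken from this page is that word boundaries are wrapped around the pattern automatically.

```python
import re

def regex_search(pattern, transliterated_text):
    """Return the whole words in transliterated_text that match pattern.

    Hypothetical helper: it mimics the corpus by adding \\b on both sides,
    which is why the pattern itself must not contain \\b.
    """
    wrapped = re.compile(r"\b(?:" + pattern + r")\b")
    return [m.group(0) for m in wrapped.finditer(transliterated_text)]

# Made-up sample, already written in the transliteration system.
sample = "sHb SHb krhbp krhbty yHbwJ mJytJ mafybalyJ ktab"

print(regex_search(r"[sS]Hb", sample))          # ['sHb', 'SHb']
print(regex_search(r"krhb[pht](\w)*", sample))  # ['krhbp', 'krhbty']
print(regex_search(r"(ma)?\w{3,6}J", sample))   # ['yHbwJ', 'mJytJ', 'mafybalyJ']
```

Because the boundaries are added around the whole pattern, a search only matches complete words, never a fragment inside a longer word.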
Transliteration System:
Modified version of the Buckwalter system.
c | ء | hamza-on-the-line |
A | آ | madda |
e | أ | hamza-on-'alif |
W | ؤ | hamza-on-waaw |
I | إ | hamza-under-'alif |
i | ئ | hamza-on-yaa' |
a | ا | bare 'alif |
b | ب | baa' |
p | ة | taa' marbuuTa |
t | ت | taa' |
v | ث | thaa' |
j | ج | jiim |
H | ح | Haa' |
x | خ | khaa' |
d | د | daal |
V | ذ | dhaal |
r | ر | raa' |
z | ز | zaay |
s | س | siin |
J | ش | shiin |
S | ص | Saad |
D | ض | Daad |
T | ط | Taa' |
Z | ظ | Zaa' (DHaa') |
C | ع | cayn |
G | غ | ghayn |
f | ف | faa' |
q | ق | qaaf |
g | ڤ | gaaf |
k | ك | kaaf |
l | ل | laam |
m | م | miim |
n | ن | nuun |
h | ه | haa' |
w | و | waaw |
E | ى | 'alif maqSuura |
y | ي | yaa' |
_ | ّ | shaddah |
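For drafting RegEx searches, the table above can be expressed as a simple character-for-character lookup. This is only a sketch: the names `TRANSLIT_TO_ARABIC`, `to_translit`, and `to_arabic` are hypothetical, and it assumes the mapping is strictly one character to one character, as the table suggests. Characters not in the table (spaces, punctuation, short vowels) are passed through unchanged.

```python
# Transliteration table copied from the list above (Latin -> Arabic).
TRANSLIT_TO_ARABIC = {
    "c": "ء", "A": "آ", "e": "أ", "W": "ؤ", "I": "إ", "i": "ئ",
    "a": "ا", "b": "ب", "p": "ة", "t": "ت", "v": "ث", "j": "ج",
    "H": "ح", "x": "خ", "d": "د", "V": "ذ", "r": "ر", "z": "ز",
    "s": "س", "J": "ش", "S": "ص", "D": "ض", "T": "ط", "Z": "ظ",
    "C": "ع", "G": "غ", "f": "ف", "q": "ق", "g": "ڤ", "k": "ك",
    "l": "ل", "m": "م", "n": "ن", "h": "ه", "w": "و", "E": "ى",
    "y": "ي",
    "_": "\u0651",  # shaddah (combining mark, written as an escape for clarity)
}

# Reverse lookup (Arabic -> Latin), built from the same table.
ARABIC_TO_TRANSLIT = {arabic: latin for latin, arabic in TRANSLIT_TO_ARABIC.items()}

def to_translit(arabic_text):
    """Convert Arabic script to the transliteration used for RegEx searches."""
    return "".join(ARABIC_TO_TRANSLIT.get(ch, ch) for ch in arabic_text)

def to_arabic(translit_text):
    """Convert a plain transliterated word (no regex syntax) back to Arabic script."""
    return "".join(TRANSLIT_TO_ARABIC.get(ch, ch) for ch in translit_text)

print(to_translit("عالسلامة"))  # Calslamp
print(to_arabic("krhbp"))       # كرهبة
```

Note that `to_arabic` is only meaningful for plain words: regex metacharacters such as `\w` contain letters that also appear in the table, so a full pattern should not be passed through it.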