There are three ways to search the corpus: Exact, Lemma, and RegEx.
Exact
This will search for the word exactly as you type it. For example: typing يمشي will return all instances of يمشي, but not تمشي or مشيت.
Lemma
This will search for inflected forms of the word. For example: typing مشي will return يمشي, مشيت, etc., and typing دار will return الدار, دارها, والدار, etc.
- Enter the basic form of the word, without any conjugations or inflections.
- Note that the software that performs the lemmatization is not very accurate, so you may want to try searching several different ways.
- Also, the lemmatization does not take into account word form changes. For example:
  - To find all instances of the verb مشى, you will need to search for both مشي and مشى (as well as spelling variants like مشا).
  - To find all instances of كرهبة, you will need to search for كرهبة and كرهبت (to find words like كرهبتو).
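The two examples above can be turned into a small checklist of spellings to try. The sketch below is only an illustration: `lemma_variants` is a hypothetical helper, and its two rules (swapping a final ي, ى, or ا, and replacing a final ة with ت) are generalized from the examples here, not taken from the corpus software.

```python
# Hypothetical helper: the swap rules below are generalized from the two
# examples in the text and are an assumption, not the corpus's own logic.
FINAL_WEAK = ("ي", "ى", "ا")  # yaa', 'alif maqSuura, bare 'alif


def lemma_variants(word: str) -> list[str]:
    """Return spellings worth trying in a Lemma search, original first."""
    variants = [word]
    stem, last = word[:-1], word[-1:]
    if last in FINAL_WEAK:
        # Try the other two ways of writing the final weak letter.
        variants += [stem + c for c in FINAL_WEAK if c != last]
    elif last == "ة":
        # taa' marbuuTa is written as taa' before a suffix.
        variants.append(stem + "ت")
    return variants


for query in ("مشى", "كرهبة"):
    print(query, "->", lemma_variants(query))
```

Each returned spelling would then be entered as its own Lemma search.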
RegEx
Searching with regular expressions allows you to perform sophisticated wildcard searches.
- To perform a regex search, it's best to enter your search in transliteration rather than in Arabic characters, following the transliteration system given below. While you can enter searches in Arabic script (like في.*), this gets difficult for more complex searches because the regex operators are written left-to-right and the Arabic script is written right-to-left.
- Do not use the word boundary character (`\b`) in your search; word boundaries are automatically added to the search string.
- A good primer on regex syntax can be found at the RegEx page on Wikipedia. Some examples:
  - `[sS]Hb` will return صحب or سحب
  - `krhb[pht](\w)*` will return كرهبة, كرهب, كرهبتي, etc.
  - `(ma)?\w{3,6}J` will return مشيتش, يحبوش, مافيباليش, and most other negative verbs and pseudo-verbs. (Currently there's no way to search for word phrases like ما عرفش.)
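To try patterns like these before running them against the corpus, they can be tested locally. The sketch below uses Python's standard `re` module, which may differ in detail from the corpus's own regex engine. The `corpus_style_search` function and the `SAMPLE` word list are hypothetical; the function imitates the automatic word-boundary wrapping described above.

```python
import re


def corpus_style_search(pattern: str, text: str) -> list[str]:
    """Wrap the pattern in word boundaries (assumed to mirror the corpus
    behavior) and return every whole-word match found in the text."""
    wrapped = re.compile(r"\b(?:" + pattern + r")\b")
    return [m.group(0) for m in wrapped.finditer(text)]


# Hypothetical mini-corpus of transliterated words, separated by spaces.
SAMPLE = "SHb sHb SHbt krhbp krhb krhbty mJytJ yHbwJ mafybalyJ"

for pattern in (r"[sS]Hb", r"krhb[pht](\w)*", r"(ma)?\w{3,6}J"):
    print(pattern, "->", corpus_style_search(pattern, SAMPLE))
```

In this sketch `SHbt` is not returned for `[sS]Hb`, because the added word boundaries require the whole word to match. Likewise, bare `krhb` is not returned for `krhb[pht](\w)*`, since `[pht]` requires one of those three letters after the stem.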
Transliteration System
Transliterated searches use a modified version of the Buckwalter system:
Character | Arabic Letter | Description |
---|---|---|
c | ء | hamza-on-the-line |
A | آ | madda |
e | أ | hamza-on-'alif |
W | ؤ | hamza-on-waaw |
I | إ | hamza-under-'alif |
i | ئ | hamza-on-yaa' |
a | ا | bare 'alif |
b | ب | baa' |
p | ة | taa' marbuuTa |
t | ت | taa' |
v | ث | thaa' |
j | ج | jiim |
H | ح | Haa' |
x | خ | khaa' |
d | د | daal |
V | ذ | dhaal |
r | ر | raa' |
z | ز | zaay |
s | س | siin |
J | ش | shiin |
S | ص | SaaD |
D | ض | DaaD |
T | ط | Taa' |
Z | ظ | Zaa' (DHaa') |
C | ع | cayn |
G | غ | ghayn |
f | ف | faa' |
q | ق | qaaf |
g | ڤ | gaaf |
k | ك | kaaf |
l | ل | laam |
m | م | miim |
n | ن | nuun |
h | ه | haa' |
w | و | waaw |
E | ى | 'alif maqSuura |
y | ي | yaa' |
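
For longer words it can be easier to convert between scripts mechanically than to look up each letter. The sketch below copies the table above into a Python dictionary. The names `TRANSLIT_TO_ARABIC`, `to_translit`, and `to_arabic` are illustrative and not part of the corpus tool.

```python
# Mapping copied from the table above: transliteration character -> Arabic letter.
TRANSLIT_TO_ARABIC = {
    "c": "ء", "A": "آ", "e": "أ", "W": "ؤ", "I": "إ", "i": "ئ",
    "a": "ا", "b": "ب", "p": "ة", "t": "ت", "v": "ث", "j": "ج",
    "H": "ح", "x": "خ", "d": "د", "V": "ذ", "r": "ر", "z": "ز",
    "s": "س", "J": "ش", "S": "ص", "D": "ض", "T": "ط", "Z": "ظ",
    "C": "ع", "G": "غ", "f": "ف", "q": "ق", "g": "ڤ", "k": "ك",
    "l": "ل", "m": "م", "n": "ن", "h": "ه", "w": "و", "E": "ى",
    "y": "ي",
}
# Reverse lookup: Arabic letter -> transliteration character.
ARABIC_TO_TRANSLIT = {ar: tr for tr, ar in TRANSLIT_TO_ARABIC.items()}


def to_translit(text: str) -> str:
    """Transliterate Arabic letters; any other character passes through unchanged."""
    return "".join(ARABIC_TO_TRANSLIT.get(ch, ch) for ch in text)


def to_arabic(text: str) -> str:
    """Convert a plain transliterated word (not a regex) back to Arabic script."""
    return "".join(TRANSLIT_TO_ARABIC.get(ch, ch) for ch in text)


print(to_translit("كرهبة"))
print(to_arabic("SHb"))
```

Note that `to_arabic` is only meant for plain words: applied to a regex, it would also convert letters that belong to operators such as `\w`.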