Stemmer¶
Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form.
For now stemming module of PyDic allow to index base words from a single dictionary and make an inflection of any word list, building in this way a new dictionaries from new, possibly unknown words.
Warning
Stemmer is currently under heavy development.
Syntax¶
pydic_stemmer.py --help
usage: pydic_stemmer.py [-h] [-d DELIMITER] -f DICTIONARY_FILE [-t OUTPUT]
[-b] [-v]
[FILE]
Makes inflection of a flat text file with words.
positional arguments:
FILE filename to process
optional arguments:
-h, --help show this help message and exit
-d DELIMITER, --delimiter DELIMITER
-f DICTIONARY_FILE, --dictionary-file DICTIONARY_FILE
path to file with text dictionary
-t OUTPUT, --output OUTPUT
output file name
-b, --base-forms only base forms
-v, --verbose debug verbose mode
Input data format¶
A list of base forms of unknown words, eg.
$ cat new.txt
supermegapojazd
oktokrążownik
Inflecting words¶
$ pydic_stemmer.py -f sjp.pydic new.txt
supermegapojazd,supermegapojazdach,supermegapojazdami,supermegapojazdem,supermegapojazdom,supermegapojazdowi,supermegapojazdów,supermegapojazdu,supermegapojazdy,supermegapojeździe
oktokrążownik,oktokrążownika,oktokrążownikach,oktokrążownikami,oktokrążowniki,oktokrążownikiem,oktokrążownikom,oktokrążownikowi,oktokrążowników,oktokrążowniku