PyDicManager - Multiple dictionary API

PyDicManager is a class that provides a single, clean way to manage multiple dictionaries at the same time. The constructor requires the paths of all dictionaries that will be used, but after initialisation dictionaries are referred to by their names rather than by their paths. In fact, a dictionary's location (path) matters only at the loading stage. You can therefore move your dictionaries to a different place in the filesystem, as long as you point to the correct path when loading them. Provided the internal name of a dictionary is unchanged (it is stored in the file .pydoc, not in the name of the directory), no references to word identifiers will be broken.

class pydic.PyDicManager(*args)

Provides a single point of access to multiple PyDic instances.

As arguments, simply pass the paths of the dictionaries you want to load.

Word identification namespace

Because PyDic itself provides a word identification namespace, handling multiple dictionaries at once poses no real problem.

The identifier format is: <pydic integer identifier>@<pydic name>

E.g. 132@first_dictionary
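
The identifier format can be taken apart with plain string handling. The following is a minimal sketch and not part of the PyDic API: `split_full_id` is a hypothetical helper name, and the code only assumes that the integer part precedes the first `@`.

```python
def split_full_id(full_id):
    """Split '<integer id>@<dictionary name>' into (int, str).

    Hypothetical helper for illustration, not part of PyDic.
    """
    # partition() splits on the first '@' only, so the integer part
    # is always everything before it.
    number, sep, name = str(full_id).partition('@')
    if not sep or not number.isdigit() or not name:
        raise ValueError('not a full PyDic id: %r' % (full_id,))
    return int(number), name


print(split_full_id('132@first_dictionary'))
```

Splitting on the first `@` keeps the helper tolerant of dictionary names that themselves contain an `@`, should such names be allowed.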

Warning

Never load two dictionaries with the same name: dictionary names must be unique.

id method

PyDicManager.id(word)

Returns all known ids for a word from every dictionary.

Parameters: word (unicode) – a word form
Returns: list of str full ids

Example:

>>> dic.id(u'zamkowi')
[PyDicId('123643@gen12'), PyDicId('123644@gen12'), PyDicId('123802@gen12')]

>>> dic.id(u'zamek')
[PyDicId('123643@gen12'), PyDicId('123644@gen12')]

Warning

Querying is case-insensitive.

id_forms method

PyDicManager.id_forms(pydic_id)

Returns the forms vector for a full id.

Parameters: pydic_id (str) – full word id
Returns: list of unicode forms, or an empty list

Example:

>>> dic.id_forms('123643@gen12')
[u'zamek',
 u'zamka',
 u'zamkowi',
 u'zamek',
 u'zamkiem',
 u'zamku',
 u'zamku',
 u'zamki',
 u'zamk\xf3w',
 u'zamkom',
 u'zamki',
 u'zamkami',
 u'zamkach',
 u'zamki']

word_forms method

PyDicManager.word_forms(word)

For a word form, returns a list of unique form vectors.

Parameters: word (unicode) – a word form
Returns: unique list of vector forms as tuple

Example:

>>> dic.word_forms(u'zamek')
[[u'zamek',
  u'zamka',
  u'zamkowi',
  u'zamek',
  u'zamkiem',
  u'zamku',
  u'zamku',
  u'zamki',
  u'zamk\xf3w',
  u'zamkom',
  u'zamki',
  u'zamkami',
  u'zamkach',
  u'zamki'],
 [u'zamek',
  u'zamek',
  u'zamku',
  u'zamkowi',
  u'zamek',
  u'zamkiem',
  u'zamku',
  u'zamku',
  u'zamki',
  u'zamk\xf3w',
  u'zamkom',
  u'zamki',
  u'zamkami',
  u'zamkach',
  u'zamki']]

Warning

Querying is case-insensitive.

Note

It is not possible to tell which inflectional vector comes from which dictionary, because the returned list is flat. If you need this information, query by identifier instead. This method assumes that you want to be dictionary-agnostic when querying by word form rather than by id.
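
Querying by identifier to recover the source dictionary could look like the sketch below. It uses only the `id` and `id_forms` methods documented on this page; `forms_by_dictionary` is a hypothetical helper name, and the code assumes that calling `str()` on a returned id yields the `<integer>@<name>` form.

```python
from collections import defaultdict


def forms_by_dictionary(dic, word):
    """Map dictionary name -> list of form vectors for `word`.

    `dic` is expected to behave like a PyDicManager: `id(word)` returns
    full ids and `id_forms(full_id)` returns the forms vector.
    Assumption: str() of a returned id gives '<integer>@<name>'.
    """
    grouped = defaultdict(list)
    for pydic_id in dic.id(word):
        full_id = str(pydic_id)
        # Everything after the first '@' is the dictionary name.
        name = full_id.partition('@')[2]
        grouped[name].append(dic.id_forms(full_id))
    return dict(grouped)
```

With a loaded manager, `forms_by_dictionary(dic, u'zamek')` would then return one entry per dictionary that knows the word.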

Warning

As you can see, more than one inflectional vector can match a given word. Therefore this function always returns a list of lists. PyDicManager merges the vectors from all loaded dictionaries and removes duplicates.
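
How PyDicManager performs this merge internally is not shown here. As an illustration only, an order-preserving merge of vectors from several sources might be sketched as follows; `merge_unique_vectors` is a hypothetical name, and the sample vectors are shortened for readability.

```python
def merge_unique_vectors(*vector_lists):
    """Merge lists of form vectors, dropping repeated vectors.

    Order of first appearance is kept. Vectors are compared as whole
    sequences, so two vectors that differ in a single form both survive.
    """
    seen = set()
    merged = []
    for vectors in vector_lists:
        for vector in vectors:
            key = tuple(vector)  # lists are unhashable, tuples are not
            if key not in seen:
                seen.add(key)
                merged.append(list(vector))
    return merged


first = [[u'zamek', u'zamka'], [u'zamek', u'zamku']]
second = [[u'zamek', u'zamka']]  # same vector as the first one in `first`
print(len(merge_unique_vectors(first, second)))  # prints 2
```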

id_base method

PyDicManager.id_base(pydic_id)

Returns the base form for a full id.

Parameters: pydic_id (str) – full word id
Returns: word base form as unicode, or None

Example:

>>> dic.id_base('123643@gen12')
u'zamek'

word_base method

PyDicManager.word_base(word)

Returns the unique base forms for a given word.

Parameters: word (unicode) – a word form
Returns: unique list of forms as unicode

Example:

>>> dic.word_base(u'zamkowi')
[u'zamek', u'zamkowy']

Warning

Querying is case-insensitive.

Warning

As you can see, more than one base form can match a given word. Therefore this function always returns a list.

Note

Elements of that list are unique.