api.lexicala.com

Documentation

The Lexicala Web API is a RESTful API that provides high-quality lexical data from K Dictionaries, drawn from lexicographic resources for 50 languages, including monolingual cores as well as numerous bilingual pairs and multilingual combinations. All endpoints return data in JSON format, except for two endpoints that return data in JSON-LD (RDF) format.

GETTING STARTED

To use the API, you need to register through our RapidAPI page. Upon registration, you will receive an X-RapidAPI-Key, which is required to access the API endpoints.


The base URL for the API is https://lexicala1.p.rapidapi.com. You can find code snippets and test various requests on our RapidAPI page.
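The following is a minimal Python sketch of an authenticated request that uses only the standard library. It assumes RapidAPI's usual convention of sending the key in an X-RapidAPI-Key header together with an X-RapidAPI-Host header. The helper names build_request and get_json are illustrative and are not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://lexicala1.p.rapidapi.com"
# Assumption: RapidAPI expects the host name in a header alongside the key.
RAPIDAPI_HOST = "lexicala1.p.rapidapi.com"


def build_request(path: str, params: dict, api_key: str) -> urllib.request.Request:
    """Build a GET request for an endpoint such as /search or /languages."""
    query = urllib.parse.urlencode(params)
    url = f"{BASE_URL}{path}" + (f"?{query}" if query else "")
    headers = {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": RAPIDAPI_HOST}
    return urllib.request.Request(url, headers=headers, method="GET")


def get_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON (or JSON-LD) response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_request("/languages", {}, api_key="YOUR_X_RAPIDAPI_KEY")
    print(req.full_url)  # https://lexicala1.p.rapidapi.com/languages
```

Calling get_json(build_request("/languages", {}, key)) with a real key would perform the network call. Building the request separately keeps the URL and header logic testable offline.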


LANGUAGES AND DATA

  • Information about the languages available through the API, including the full names corresponding to the language codes and the languages available in each resource, can be obtained with GET /languages.*


By default, results come from K Dictionaries’ Global series. Data are also available from the Password series, the Password-associated MultiGloss series, and Random House Webster’s College Dictionary.


 

  • Global includes 25 monolingual cores (see the list below), many of which have been supplemented with translation equivalents, producing diverse bilingual pairs and multilingual combinations.
  • Password consists of an English learner’s dictionary core, translated into 46 languages.
  • MultiGloss is based on Password bilingual glossaries to English, which are automatically expanded to 44 more languages.
  • Random House Webster’s College Dictionary is a legacy comprehensive dictionary of the English language.

More information about the different resources is available on the Lexicala website.


 

The following is a list of all the source languages (monolingual cores), and the target languages available for each one.

| Source language | Target languages in Global | Target languages in Password | Target languages in MultiGloss* |
| --- | --- | --- | --- |
| Arabic (ar) | de | | English + 44 languages |
| Catalan (ca) | | | English + 44 languages |
| Chinese Simplified (zh) | en, fr, ja | | English + 44 languages |
| Chinese Traditional (tw) | | | |
| Croatian (hr) | | | English + 44 languages |
| Czech (cs) | | | English + 44 languages |
| Danish (da) | de, en, es, fr, ko | | English + 44 languages |
| Dutch (nl) | de, en, es, fr | | English + 44 languages |
| English (en) | br, da, es, fr, ja, no, sv | af, ar, az, bg, br, ca, cs, da, de, el, es, et, fa, fi, fr, fy, he, hi, hr, hu, id, is, it, ja, ko, lt, lv, ml, nl, no, pl, prs, ps, pt, ro, ru, sk, sl, sr, sv, th, tr, tw, uk, ur, vi, zh | |
| Estonian (et) | | | English + 44 languages |
| Finnish (fi) | | | English + 44 languages |
| French (fr) | ar, br, da, de, el, en, es, he, it, ja, nl, no, pl, pt, ru, sv, tr | | English + 44 languages |
| Frisian (fy) | | | English + 44 languages |
| German (de) | ar, br, da, en, ja, nl, no, sv, tr | | English + 44 languages |
| Greek (el) | fr | | English + 44 languages |
| Hebrew (he) | en, fr, ko | | English + 44 languages |
| Hindi (hi) | | | |
| Hungarian (hu) | | | English + 44 languages |
| Indonesian (id) | | | English + 44 languages |
| Italian (it) | br, en, ja, no | | English + 44 languages |
| Japanese (ja) | de, en, es, fr, zh | | English + 44 languages |
| Korean (ko) | ja | | English + 44 languages |
| Latin (la) | fr | | |
| Latvian (lv) | | | English + 44 languages |
| Malay (ml) | | | English + 44 languages |
| Norwegian (no) | de, en, es, fr, it, ko, pl | | English + 44 languages |
| Polish (pl) | en, fr, no | | English + 44 languages |
| Portuguese Brazil (br) | de, en, es, fr, it | | English + 44 languages |
| Portuguese Portugal (pt) | fr | | English + 44 languages |
| Russian (ru) | fr, ja | | English + 44 languages |
| Slovenian (sl) | | | English + 44 languages |
| Spanish (es) | br, da, en, ja, nl, no, sv | | English + 44 languages |
| Swedish (sv) | de, en, es, fr | | English + 44 languages |
| Thai (th) | | | English + 44 languages |
| Turkish (tr) | de, fr | | English + 44 languages |
| Ukrainian (uk) | | | English + 44 languages |

* MultiGloss 44 Languages: af, ar, az, bg, br, ca, da, de, el, es, et, fa, fi, fr, fy, he, hi, hr, hu, id, is, it, ja, ko, lt, lv, ms, nl, no, pl, pt, ro, ru, sk, sl, sr, sv, th, tr, tw, uk, ur, vi, zh

See below for how to specify which resource to search when querying the API for a specific language.


GET /search

Search for entries with GET /search. A basic API search result consists of a JSON object containing partial lexical information on entries that match the search criteria.


To obtain more in-depth information for each entry, see GET /entries below.


The entries are returned as objects within the results array, and contain the following fields:

  • the unique entry ID
  • the source language code
  • the headword text
  • the part of speech
  • the different senses with their unique sense ID and definition
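The sketch below flattens a /search response into (entry ID, headword, part of speech, number of senses) tuples. It assumes the response is a dict with a results list whose items carry id, headword (with text and pos) and senses keys, mirroring the fields listed above. Because headword can be an object or an array of objects (see STRUCTURE), both shapes are accepted. The function name summarize_results is hypothetical.

```python
from typing import Any


def summarize_results(response: dict[str, Any]) -> list[tuple[str, str, str, int]]:
    """Return (entry_id, headword_text, pos, number_of_senses) for each result."""
    rows = []
    for entry in response.get("results", []):
        headword = entry.get("headword", {})
        # Assumption: when several headword objects are present, use the first.
        if isinstance(headword, list):
            headword = headword[0] if headword else {}
        rows.append(
            (
                entry.get("id", ""),
                headword.get("text", ""),
                headword.get("pos", ""),
                len(entry.get("senses", [])),
            )
        )
    return rows
```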

Basic search parameters include:

  • source (= Global, Password, MultiGloss, Random House) – Specify which resource to look in.
    The default value is Global (the Global series).
  • language (= ar, br, de, en, es, ja …) – Specify which source language to look in.
  • text – Specify a headword.

For example, a query with language = es and text = azul (see the Lexicala API page at RapidAPI*) returns all the entries in the Spanish resource of the Global series with the headword “azul”.


You can also look for headwords that match specific syntactic criteria:

  • pos (= noun, verb …) – Specify part of speech.
  • number (= singular, plural …) – Specify grammatical number.
  • gender (= masculine, feminine …) – Specify grammatical gender.
  • subcategorization – Specify grammatical subcategorization.
  • monosemous (boolean) – Find single sense entries only.
  • polysemous (boolean) – Find multiple sense entries only.
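A small sketch that assembles the query parameters for GET /search and converts Python booleans into the lowercase strings true/false for the query string. The lowercase encoding and the decision to reject requests that set both monosemous and polysemous are choices made in this sketch, not documented API behavior. search_params is a hypothetical helper.

```python
from typing import Optional, Union

ParamValue = Union[str, bool]


def search_params(text: str, language: str, source: Optional[str] = None,
                  **filters: ParamValue) -> dict[str, str]:
    """Build a parameter dict for GET /search.

    source is omitted when None, so the API default (Global) applies.
    Extra keyword arguments are filters such as pos, number, gender,
    subcategorization, monosemous or polysemous.
    """
    if filters.get("monosemous") is True and filters.get("polysemous") is True:
        raise ValueError("monosemous and polysemous are mutually exclusive")
    params = {"text": text, "language": language}
    if source is not None:
        params["source"] = source
    for name, value in filters.items():
        # Booleans become "true"/"false"; strings pass through unchanged.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return params
```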

The API also offers two options for handling inflected forms and word stems:

  • morph (boolean) – Search for the text in both headwords and inflected forms, including our supplemental morphological lists. This relies on existing human-curated data and semi-automatically generated morphological lists.
  • analyzed (boolean) – Apply a stemmer that reduces words to their stem and disregards diacritics and case (uppercase/lowercase).

The Morph Parameter

Setting morph = true searches all the inflected forms (as well as headwords) contained in both the dictionary data and the external morphological lists.*


Searching “houses” will return the entry “house” (noun), even though the word “houses” is not an entry in the English resource (it’s the plural form of “house”).


The Analyzed Parameter

Setting analyzed = true looks for inflected forms by applying the stemmer.*


This query returns the entries “working” (adjective), “work” (verb), “work” (noun), “hard-working” (adjective), “working class” (noun), “work on” (verb), and any other entry whose headword contains the stem “work”.


The stemmer also disregards diacritics and vocalization (e.g. in Arabic and Hebrew) and ignores case (uppercase/lowercase).
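The following Python sketch illustrates only the diacritic and case folding described above. It does not reproduce Lexicala's stemmer, whose algorithm is not documented here. The sketch relies on Unicode decomposition: each character is split into a base letter plus combining marks, the marks are dropped (this covers Latin accents as well as Arabic and Hebrew vowel points), and the result is case-folded.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks (accents, vowel points) and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" = nonspacing combining mark, e.g. U+0301 or Hebrew niqqud.
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", bare).casefold()
```

For example, fold("Café") gives "cafe", and a fully vocalized Arabic or Hebrew word loses its vowel points. A real stemmer would also strip suffixes (reducing “working” to “work”). That step is language-specific and is not shown here.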


Antonym and Synonym Search

  • antonyms (boolean) – Search for the text as an antonym.
  • synonyms (boolean) – Search for the text as a synonym.

You can search for the text among the antonyms of headwords, among their synonyms, or both simultaneously.


GET /search-entries

Identical to /search but returns full entries rather than abridged versions.


GET /entries, GET /senses – searching by entry (or sense) ID

When searching by parameters (as explained above), each entry result contains a unique entry ID, and each sense of the entry has its own unique sense ID.


Using these IDs, you can retrieve more data about a single entry (or sense), such as syntactic and semantic information, multiword expressions, usage examples and translations. The entry collection groups together all entries from the different resources (Global, Password, MultiGloss, Random House).


The resulting JSON object contains the fields id, source, language, version, related entries, headword, and senses, as follows:

  • id (string) – the unique dictionary entry ID
  • source (string) – the K Dictionaries resource from which the entry is taken (Global, Password, MultiGloss, Random House)
  • language (string) – a two-character language code (for a list of all language codes, query GET /languages)
  • version (number) – the version of the dictionary the entry is taken from
  • related entries (array of strings) – an array containing the IDs of the related entries
  • headword (object/ array of objects) – contains extensive syntactic and phonetic information on the headword
  • senses (array of objects) – contains an elaborate disambiguation of the headword into senses, including syntactic, phonetic and semantic information
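The sketch below reads the top-level fields of a full entry returned by GET /entries. Because headword may be a single object or an array of objects, it normalizes both shapes to a list before collecting the headword texts. It assumes the field names listed above and that every headword and sense object is a dict. entry_overview is a hypothetical helper.

```python
from typing import Any


def entry_overview(entry: dict[str, Any]) -> dict[str, Any]:
    """Collect the identifying fields, headword texts and sense IDs of an entry."""
    headword = entry.get("headword", [])
    headwords = headword if isinstance(headword, list) else [headword]
    return {
        "id": entry.get("id"),
        "source": entry.get("source"),
        "language": entry.get("language"),
        "headwords": [h.get("text") for h in headwords],
        "sense_ids": [s.get("id") for s in entry.get("senses", [])],
    }
```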

Some examples:

This query returns the complete entry “bank” * in the English core of the Global series.

This query returns the complete entry “comunemente” * in the Italian core of the MultiGloss series.
This query returns the complete entry “chair” * in the Password series.
This query returns the complete entry “smile” * in Random House Webster’s College Dictionary.
You can also search for a specific sense * by its unique sense ID. This query returns the second sense of the polysemous entry “bank”.
The JSON result for this type of query includes: id (sense id), source, language and entry (entry id).
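This sketch assumes the IDs are passed as path segments (GET /entries/{entry_id} and GET /senses/{sense_id}). Please verify that path layout on the RapidAPI page. Under that assumption, the code builds both URLs and uses the entry field of a sense result to locate the parent entry. The helper names are hypothetical.

```python
import urllib.parse

BASE_URL = "https://lexicala1.p.rapidapi.com"


def entry_url(entry_id: str) -> str:
    """URL for a full entry; the ID is percent-encoded as one path segment."""
    return f"{BASE_URL}/entries/{urllib.parse.quote(entry_id, safe='')}"


def sense_url(sense_id: str) -> str:
    """URL for a single sense."""
    return f"{BASE_URL}/senses/{urllib.parse.quote(sense_id, safe='')}"


def parent_entry_url(sense_result: dict) -> str:
    """A sense result carries its entry ID in the 'entry' field."""
    return entry_url(sense_result["entry"])
```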


GET /search-rdf

Similar to /search-entries, but accepts fewer parameters and returns results in RDF (JSON-LD) form. The analyzed and morph parameters are supported. Only the Global series data are available for this call.


Example: This query returns all the entries that have “dog”* as their headword. The results are in RDF form.


GET /rdf

Identical to /entries, but returns results in RDF (JSON-LD) form. Only the Global series data are available for this call.


 Example: This query returns the complete entry “great”* in the English resource of the Global series.


GET /search-definitions

Performs a free-text search in definitions, enabling contextually relevant results. Supported languages are: ar, br, cs, da, de, el, en, es, fr, he, hi, it, ja, ko, nl, no, pl, pt, ru, sv, th, tr.


Parameters

  • text: The text to search for in definitions.
  • lang (optional): Filters results to match entries in the specified language. The search text itself can be in any language.
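A sketch that builds the parameters for GET /search-definitions and checks lang against the supported languages listed above. Client-side rejection of empty text and unlisted codes is a choice made in this sketch, not documented API behavior. definition_search_params is a hypothetical helper.

```python
from typing import Optional

# Languages supported by GET /search-definitions, as listed above.
SUPPORTED = {
    "ar", "br", "cs", "da", "de", "el", "en", "es", "fr", "he", "hi",
    "it", "ja", "ko", "nl", "no", "pl", "pt", "ru", "sv", "th", "tr",
}


def definition_search_params(text: str, lang: Optional[str] = None) -> dict[str, str]:
    """Build parameters for GET /search-definitions.

    lang only filters the language of the returned entries; the search
    text itself may be in any language.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    params = {"text": text}
    if lang is not None:
        if lang not in SUPPORTED:
            raise ValueError(f"unsupported language: {lang}")
        params["lang"] = lang
    return params
```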

Examples

  • Searching for “green fruit” can return results like “apple” (one of its definitions is “a round green or red fruit”) and “avocado” (one of its definitions is “an oval dark green fruit with a large stone”).
  •  Searching for “thing to sleep on” can return results like “pillow” (one of its definitions is “a soft object that you put your head on when you sleep”) and “bed” (one of its definitions is “a piece of furniture for sleeping”).

The results are returned as objects within the results array, and contain the following fields:

  • the unique entry ID
  • the source language code
  • the headword text
  • the part of speech
  • the unique sense ID
  • the sense definition

STRUCTURE

The following is a detailed schema of the elements that make up a complete entry JSON object, grouped by type. Note that some elements can be of more than one type.
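Many fields can be either a single value or an array. For example, register appears under both Strings and Arrays below. Client code is simpler if it normalizes such fields first. A minimal Python sketch (as_list is a hypothetical name):

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalize a field that may be absent, a single value, or an array."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```

Usage: `for register in as_list(sense.get("register")): ...` handles a missing field, a single string, and an array of strings with the same loop.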


Headword Object
Strings: text, pos, subcategorization, gender, case, register, number, geographical_usage, mood, tense
Numbers: homograph_number
Arrays: tense, mood, geographical_usage, register, case, subcategorization (arrays of strings), inflections (array of objects)
Objects: alternative_scripts, pronunciation


Sense Object (within the Senses array)
Strings: id, definition, semantic_category, register, range_of_application, subcategorization, geographical_usage, semantic_subcategory, sentiment, see, see_also
Arrays: semantic_category, register, sentiment, geographical_usage, range_of_application, subcategorization, synonyms, antonyms, semantic_subcategory, see_also (arrays of strings), examples, compositional_phrases, inflections, senses (array of objects)
Objects: translations
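A sense can contain its own senses array, so collecting every definition requires a recursive walk. This sketch yields (depth, id, definition) for each sense in document order. It assumes that nested senses use the same sense object layout. iter_senses is a hypothetical name.

```python
from typing import Any, Iterator


def iter_senses(senses: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[int, str, str]]:
    """Depth-first walk over senses and their nested senses."""
    for sense in senses:
        yield depth, sense.get("id", ""), sense.get("definition", "")
        yield from iter_senses(sense.get("senses", []), depth + 1)
```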


MultiGloss senses have different fields:
Strings: id, en_headword, en_pos
Arrays: en_examples, translations


Compositional Phrases Object (within the Compositional Phrases array)
Strings: text, definition, sentiment, register, semantic_category, semantic_subcategory, range_of_application, aspect, pos, geographical_usage
Arrays: synonyms, antonyms, senses, sentiment, register, semantic_category, semantic_subcategory, range_of_application, geographical_usage (arrays of strings), examples (array of objects)
Objects: alternative_scripts, translations


Examples Object (within the Examples array)
Strings: text
Objects: alternative_scripts, translations


Translations Object

Each field name is a two-letter language code; its value is an object (or an array of objects when there is more than one translation) with the following fields:

Strings: text, range_of_application, collocate, register, semantic_category, semantic_subcategory, sentiment, gender, number, geographical_usage, pos
Arrays: range_of_application, collocate, register, semantic_category, semantic_subcategory, sentiment, geographical_usage (arrays of strings), inflections, pronunciation (array of objects)
Objects: alternative_scripts, pronunciation
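Because the translations object is keyed by language code and each value is either one translation object or an array of them, reading it takes a small normalization step. The following sketch returns the translation texts for one target language. It assumes the layout described above. translation_texts is a hypothetical helper, and the sample data in the test is illustrative only.

```python
from typing import Any


def translation_texts(translations: dict[str, Any], lang: str) -> list[str]:
    """Return the text of every translation into the given language code."""
    value = translations.get(lang)
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["text"] for item in items if "text" in item]
```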


Inflections Object (within the Inflections array)
Strings: text, geographical_usage, case, number, gender, register, tense, aspect, subcategorization, mood
Arrays: geographical_usage, case, register, tense, subcategorization, mood (arrays of strings), pronunciation (array of objects)
Objects: alternative_scripts, pronunciation


Pronunciation Object
fields: value (string) – the pronunciation text, geographical_usage (string/array of strings)
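A pronunciation pairs a value with an optional geographical_usage, which may be a string or an array of strings. Pronunciation itself may arrive as one object or as an array, as in the Translations and Inflections objects. This sketch renders each pronunciation as text with its regions in parentheses. format_pronunciation is hypothetical.

```python
from typing import Any


def format_pronunciation(pronunciation: Any) -> list[str]:
    """Render one pronunciation object, or an array of them, as strings."""
    items = pronunciation if isinstance(pronunciation, list) else [pronunciation]
    rendered = []
    for item in items:
        regions = item.get("geographical_usage", [])
        if isinstance(regions, str):
            regions = [regions]
        suffix = f" ({', '.join(regions)})" if regions else ""
        rendered.append(item["value"] + suffix)
    return rendered
```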


Alternative Scripts Object
fields: the name of each alternative script, with a string value containing the text in that script

RDF STRUCTURE

The API returns the Global series data in RDF format for users who need a structured, semantic representation of lexicographic entries. RDF (Resource Description Framework) allows this data to be integrated into linked data systems, making it suitable for applications that require a more formal representation of language resources.


This structure is a schematic overview of the RDF elements used in lexicographic entries. For a more detailed explanation, refer to the OntoLex-Lexicog documentation.


lexicographicEntryIn Object

Strings: @id, @type (always lexicog:LexicographicResource), language (the language of the lexicographic resource)


describes Object

Strings: @id, @type (always ontolex:LexicalEntry)
Arrays: form, senses, translations
Objects: entryIn, pos


Form Object
Strings: @id, @type (always ontolex:Form), gender, number
Objects: text (dictionary with language code keys), pronunciation (dictionary with language and phonetic script keys)


Sense Object (within the senses array)
Strings: @id, @type (always ontolex:LexicalSense)
Arrays: examples, compositionalPhrases, translations, reversedRelates
Objects: lexicalizedSense, ofLexicographicComponent, homograph_entry, SenseToEntry


Example Object (within the examples array)

Strings: @id, @type (always lexicog:UsageExample)
Objects: value (dictionary with language keys for multilingual examples)


Translation Object (within the translations array)

Strings: @id, @type (always vartrans:Translation)
Objects: target, source, tranSet


Compositional Phrases Object (within compositionalPhrases)

Strings: @id, @type (always ontolex:LexicalSense)
Arrays: examples, translations
Objects: SenseToEntry, lexicalizedSense


SenseToEntry Object

Strings: @id, @type (always ontolex:LexicalEntry)
Objects: form, entryIn


entryIn Object

Strings: @id, @type (always lime:Lexicon), limeLanguage


nestedIn Object

Strings: @id, @type (always lexicog:LexicographicResource)
This field is used when an entry is related to another entry. It indicates that the current entry is nested within a broader context or group of entries. For example, the verb abandon might be related to the adjective abandoned. This structure helps in organizing related entries that share semantic or linguistic connections.


ofLexicographicComponent Object

Strings: @id, @type (always lexicog:LexicographicComponent)
Objects: subComponent
This field is used to represent senses that belong specifically to compositional phrases. It allows for the detailed description of a phrase’s meaning.


subComponent Object

Strings: @id, @type (always lexicog:LexicographicComponent)
Objects: describes


homograph_entry Object

Strings: @id, @type (always ontolex:LexicalEntry)
Objects: entryIn


tranSet Object

Strings: @id, @type (always vartrans:translationSet)


lexicalizedSense Object

Strings: @id, @type (always ontolex:LexicalConcept)
Objects: definition, source


reversedRelate Object

Strings: @id, @type (always vartrans:SenseRelation), category (can be lexinfo:synonym or lexinfo:antonym)
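In RDF results, synonym and antonym links appear in a sense's reversedRelates array, and each item is a vartrans:SenseRelation with a category. The sketch below groups the relation @id values by category. It assumes the JSON-LD keys appear exactly as listed above, in compact form, and that category is a plain string. group_relations is hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_relations(sense: dict[str, Any]) -> dict[str, list[str]]:
    """Map each relation category (e.g. lexinfo:synonym) to relation @ids."""
    groups: dict[str, list[str]] = defaultdict(list)
    for relation in sense.get("reversedRelates", []):
        if relation.get("@type") == "vartrans:SenseRelation":
            groups[relation.get("category", "unknown")].append(relation["@id"])
    return dict(groups)
```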

WORD FREQUENCY

Most entries in the dataset include a frequency attribute, which indicates how often the word occurs in a corpus of the entry's language. These frequency values are derived from SketchEngine.

Below is a list of the corpora used by SketchEngine to calculate the frequency values for our dataset.


ar corpus name: Arabic Web 2018 (arTenTen18).

  • Corpus info: number of tokens 5,341,978,851, number of words 4,637,956,234 

br corpus name: Brazilian Portuguese corpus (Corpus Brasileiro).

  • Corpus info: number of tokens 1,133,416,757, number of words 871,117,178

cs corpus name: Czech Web 2017 (csTenTen17).

  • Corpus info: number of tokens 12,586,415,546, number of words 10,502,222,474

da corpus name: Danish Web 2020 (daTenTen20).

  • Corpus info: number of tokens 4,127,362,161, number of words 3,480,275,804 

de corpus name: German Web 2020 (deTenTen20).

  • Corpus info: number of tokens 20,999,598,683, number of words 17,512,733,172

el corpus name: Greek Web 2019 (elTenTen19).

  • Corpus info: number of tokens 2,782,299,354, number of words 2,342,091,029 

en corpus name: English Web 2020 (enTenTen20).

  • Corpus info: number of tokens 43,125,207,462, number of words 36,561,273,153

es corpus name: Spanish Web 2018 (esTenTen18).

  • Corpus info: number of tokens 19,593,089,777, number of words 16,953,735,742 

fr corpus name: French Web 2020 (frTenTen20).

  • Corpus info: number of tokens 17,805,103,451, number of words 15,115,914,647 

he corpus name: Hebrew Web 2021 (heTenTen21).

  • Corpus info: number of tokens 3,183,067,122, number of words 2,775,686,699

hi corpus name: Hindi Web 2017 (hiTenTen17).

  • Corpus info: number of tokens 1,375,847,600, number of words 1,228,379,747

it corpus name: Italian Web 2020 (itTenTen20).

  • Corpus info: number of tokens 14,514,566,714, number of words 12,451,734,885

ja corpus name: Japanese Web 2011 (jaTenTen11).

  • Corpus info: number of tokens 10,321,875,664, number of words 8,432,294,787

ko corpus name: Korean Web 2018 (koTenTen18).

  • Corpus info: number of tokens 2,054,520,141, number of words 1,668,851,720 

nl corpus name: Dutch Web 2020 (nlTenTen20).

  • Corpus info: number of tokens 6,836,979,371, number of words 5,890,009,964

no corpus name: Norwegian Web 2017 (noTenTen17, Bokmål).

  • Corpus info: number of tokens 2,787,260,248, number of words 2,461,704,417

pl corpus name: Polish Web 2012 (plTenTen12, RFTagger).

  • Corpus info: number of tokens 9,387,142,186, number of words 7,715,835,214

pt corpus name: Portuguese Web 2018 (ptTenTen18). 

  • Corpus info: number of tokens 8,731,838,327, number of words 7,407,393,731

ru corpus name: Russian Web 2011 (ruTenTen11).

  • Corpus info: number of tokens 18,280,486,876, number of words 14,553,856,113

sv corpus name: Swedish Web 2014 (svTenTen14).

  • Corpus info: number of tokens 3,889,895,434, number of words 3,401,035,817 

th corpus name: Thai Web 2018 (thTenTen18).

  • Corpus info: number of tokens 695,928,167, number of words 640,530,227 

tr corpus name: Turkish Web 2012 (trTenTen12).

  • Corpus info: number of tokens 4,124,133,118, number of words 3,388,418,900

tw corpus name: Chinese Web 2017 (zhTenTen17).

  • Traditional corpus info: number of tokens 2,977,351,219, number of words 2,400,405,372 

zh corpus name: Chinese Web 2017 (zhTenTen17).

  • Simplified corpus info: number of tokens 16,593,146,196, number of words 13,531,331,169

* To use the linked example queries, you need your X-RapidAPI-Key.