The Lexicala Web API is a RESTful API that provides lexical data from K Dictionaries’ lexicographical resources, covering 50 languages and including monolingual cores as well as numerous bilingual pairs and multilingual combinations. The API returns data as JSON documents.
The API endpoint is located at https://lexicala1.p.rapidapi.com.
You can test that the API is up with GET /test:
To access the API, you need to authenticate your user.
Lexicala API uses Basic Authentication – your credentials are the username and password you used for registration.
To obtain access, pass those as a header, or as parameters in an HTTP client.
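As a minimal sketch of the header-based option, the snippet below builds a Basic Authentication header and attaches it to a GET /test request without sending it. The helper names (`basic_auth_header`, `build_request`) and the sample credentials are illustrative assumptions, not part of the API.

```python
import base64
from urllib.request import Request

BASE_URL = "https://lexicala1.p.rapidapi.com"  # endpoint given in the text


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Encode 'username:password' in Base64, as HTTP Basic Authentication requires."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def build_request(path: str, username: str, password: str) -> Request:
    """Prepare (but do not send) an authenticated GET request."""
    return Request(f"{BASE_URL}{path}",
                   headers=basic_auth_header(username, password),
                   method="GET")


# Placeholder credentials; substitute the ones used at registration.
req = build_request("/test", "Aladdin", "open sesame")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual call.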
ACCOUNT AND ACCESS
You can view your user account settings with GET /users/me:
This includes personal details such as your name and the email address you provided at registration, your request cap, and the number of requests used in the last 24 hours.
Access and Caps
The API results are paginated – each result JSON contains the fields n_results, page_number, results_per_page, n_pages and available_n_pages, with corresponding number values.
The maximum number of results per search query (n_results) is limited to 10,000, with up to 30 entries per page.
The following parameters are used for navigating between pages, and modifying the number of results per page:
- page (number) – specify the page number out of available_n_pages, in order to navigate between pages.
The default value is 1.
- page-length (number) – specify how many results appear per page.
The default value is 10, and the maximum value is 30.
- sample (number) – specify the number of randomly-sampled results to return.
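The pagination rules above can be sketched as a small helper that validates the page and page-length parameters and works out which pages remain to be fetched. The function names and the sample result are hypothetical; only the field and parameter names come from the text.

```python
from urllib.parse import urlencode

DEFAULT_PAGE_LENGTH = 10  # default stated in the text
MAX_PAGE_LENGTH = 30      # maximum stated in the text


def pagination_params(page: int = 1, page_length: int = DEFAULT_PAGE_LENGTH) -> dict[str, int]:
    """Return validated 'page' and 'page-length' query parameters."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if not 1 <= page_length <= MAX_PAGE_LENGTH:
        raise ValueError(f"page-length must be between 1 and {MAX_PAGE_LENGTH}")
    return {"page": page, "page-length": page_length}


def remaining_pages(result: dict) -> range:
    """Page numbers still to fetch after the page held in `result`."""
    return range(result["page_number"] + 1, result["available_n_pages"] + 1)


# Made-up pagination fields, shaped like the ones listed in the text.
sample_result = {"n_results": 95, "page_number": 1, "results_per_page": 30,
                 "n_pages": 4, "available_n_pages": 4}

print(urlencode(pagination_params(page=2, page_length=30)))  # page=2&page-length=30
print(list(remaining_pages(sample_result)))                  # [2, 3, 4]
```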
LANGUAGES AND DATA
Information about languages available through the API can be obtained with GET /languages, including the full names corresponding to language codes, and the languages available in the various resources:
By default, results are from KD’s Global series. Data from the Password series and from Random House Webster’s College Dictionary are also available. The Global series includes 24 monolingual cores (see list below), to which translation equivalents are added, producing multilingual versions. The Password series consists of an English core, translated into 46 languages. The Random House Webster’s College Dictionary is an extensive monolingual English dictionary. You can find more information about the different resources on our website.
The following is a list of all available source languages (monolingual core), and the available target languages for each resource.
| Source language | Target languages in Global | Target languages in Password |
|---|---|---|
| simplified Chinese (zh) | en, fr, ja | |
| traditional Chinese (tw) | | |
| Danish (dk) | de, en, es, fr, ko | |
| Dutch (nl) | de, en, es, fr | |
| English (en) | br, dk, es, fr, ja, no, sv | af, ar, az, bg, br, ca, cs, de, dk, el, es, et, fa, fi, fr, fy, he, hi, hr, hu, is, it, ja, ko, lt, lv, ml, nl, no, pl, prs, ps, pt, ro, ru, sk, sl, sr, sv, th, tr, tw, uk, ur, vi, zh |
| French (fr) | ar, br, de, dk, el, en, es, he, it, ja, nl, no, pl, pt, ru, sv, tr | |
| German (de) | ar, br, dk, en, ja, nl, no, sv, tr | |
| Italian (it) | br, en, ja, no | |
| Japanese (ja) | de, en, es, fr, zh | |
| Norwegian (no) | de, en, es, fr, it, pl | |
| Polish (pl) | en, fr, no | |
| Brazilian Portuguese (br) | de, en, es, fr, it | |
| Russian (ru) | fr, ja | |
| Spanish (es) | br, dk, en, ja, nl, no, sv | |
| Swedish (sv) | de, en, es, fr | |
| Turkish (tr) | de, fr | |
See below how to specify which resource to look in, when querying the API for a specific language.
Search for entries with GET /search. A basic API search result consists of a JSON object containing partial lexical information on entries that match the search criteria. To obtain further, more in-depth information for each entry, see GET /entries below.
The entries are returned as objects within the results array, and contain the following fields:
- the unique entry ID
- the source language code
- the headword text
- part of speech
- the different senses with their unique sense ID and definition
Basic search parameters include:
- source ( = global, password, random) – specify which resource to look in.
The default value is global (the Global series).
- language ( = en, fr, es, de, pl, …) – specify which source language to look in.
- text – specify a headword.
This query returns all entries in the Spanish core of the Global series with the headword “azul”.
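A query of the kind just described could be assembled as in the sketch below, which combines the source, language and text parameters into a GET /search URL. The `search_url` helper is a hypothetical name, and the exact query string is reconstructed from the parameter list rather than taken from the original example.

```python
from urllib.parse import urlencode

BASE_URL = "https://lexicala1.p.rapidapi.com"  # endpoint given in the text


def search_url(**params: str) -> str:
    """Build a GET /search URL from keyword parameters, URL-encoding the values."""
    return f"{BASE_URL}/search?{urlencode(params)}"


# Spanish core of the Global series, headword "azul".
url = search_url(source="global", language="es", text="azul")
print(url)
# https://lexicala1.p.rapidapi.com/search?source=global&language=es&text=azul
```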
It is possible to look for headwords with specific syntactic criteria:
- pos ( = noun, verb, …) – specify part of speech.
- number ( = singular, plural, …) – specify grammatical number.
- gender ( = masculine, feminine, …) – specify grammatical gender.
- subcategorization ( = masculine, feminine, …) – specify subcategorization.
- monosemous (boolean) – find single sense entries only.
- polysemous (boolean) – find multiple sense entries only.
This query returns all entries in the Polish dictionary of the Global series that are plural nouns.
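The syntactic criteria can be combined in one request. The sketch below, under the assumption that they are passed as ordinary query parameters on GET /search, rejects criterion names that are not in the list above before building the URL; `syntactic_search_url` is an illustrative name.

```python
from urllib.parse import urlencode

BASE_URL = "https://lexicala1.p.rapidapi.com"  # endpoint given in the text
SYNTACTIC_PARAMS = {"pos", "number", "gender", "subcategorization"}


def syntactic_search_url(language: str, source: str = "global", **criteria: str) -> str:
    """Build a /search URL restricted by the syntactic criteria listed in the text."""
    unknown = set(criteria) - SYNTACTIC_PARAMS
    if unknown:
        raise ValueError(f"unsupported criteria: {sorted(unknown)}")
    params = {"source": source, "language": language, **criteria}
    return f"{BASE_URL}/search?{urlencode(params)}"


# Plural nouns in the Polish dictionary of the Global series.
url = syntactic_search_url("pl", pos="noun", number="plural")
print(url)
# https://lexicala1.p.rapidapi.com/search?source=global&language=pl&pos=noun&number=plural
```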
The API also includes two functionalities pertaining to inflected forms and word stems:
- morph (boolean) – searches for the text in both headwords and inflections, including in our supplemental morphological lists. This is based on existing human-curated data and semi-automatic morphological lists.
- analyzed (boolean) – a stemmer algorithm that strips words to their stem, and disregards diacritics and case (uppercase/lowercase).
The morph parameter
Setting morph = true looks for all inflected forms (as well as headwords) contained both in KD data and in the external morphological lists.
Searching “houses” will return the entry “house” (noun) even though the word “houses” is not an entry in the English dictionary (it is a plural inflection of “house”).
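One practical detail when sending boolean flags such as morph from Python: `urlencode` renders `True` as `True`, while the text writes the value in lowercase. The sketch below converts booleans first. The `encode_flags` helper is hypothetical, and language=en is an assumption based on the English example.

```python
from urllib.parse import urlencode


def encode_flags(params: dict) -> dict:
    """Render Python booleans as the lowercase true/false form used in the text."""
    return {key: (str(value).lower() if isinstance(value, bool) else value)
            for key, value in params.items()}


query = urlencode(encode_flags({"language": "en", "text": "houses", "morph": True}))
print(query)  # language=en&text=houses&morph=true
```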
The analyzed parameter
Setting analyzed = true looks for inflected forms by applying the stemmer.
This query returns the entries “working” (adjective), “work” (verb), “work” (noun), “hard-working” (adjective), “working class” (noun), “work on” (verb) and any other entry with the stem “work” in its headword.
The stemmer also disregards diacritics and vocalization (for example in Hebrew and Arabic) and removes case-sensitivity (uppercase/lowercase).
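To illustrate what disregarding diacritics, vocalization and case means in practice, the sketch below folds text locally with Python's standard `unicodedata` module. It is only an approximation for illustration: it does not perform stemming and is not the algorithm the API itself uses.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks (diacritics, vowel points) and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(fold("Café"))  # cafe

# Hebrew "shalom" with and without vowel points folds to the same string.
pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
unpointed = "\u05e9\u05dc\u05d5\u05dd"
print(fold(pointed) == fold(unpointed))  # True
```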
Identical to /search but returns full entries rather than abridged versions.
GET /entries, GET /senses – searching by entry (or sense) ID
When searching by parameters (as shown previously), each entry result contains a unique entry ID, and each sense of an entry has its own unique sense ID. Using these IDs, it is possible to obtain more data – various syntactic and semantic information, compositional phrases, usage examples, translations and more – of a single entry (or sense). The entries collection groups together all entries from all different resources (Global, Password, Random House).
The result JSON object contains the fields id, source, language, version, related_entries, headword and senses. A brief explanation of each field follows.
- id (string) – the unique dictionary entry ID
- source (string) – the K Dictionaries resource from which the entry is taken (Global, Password, Random House)
- language (string) – a two-character string that is the language code (for a list of all language codes, query GET /languages)
- version (number) – the version of the dictionary the entry is taken from
- related_entries (array of strings) – an array containing the IDs of the related entries
- headword (object/array of objects) – contains extensive syntactic and phonetic information of the headword
- senses (array of objects) – contains an elaborate disambiguation of the headword into senses, including syntactic, phonetic and semantic information
The query above returns the complete entry “bank” in the Spanish core of the Global series.
The query above returns the complete entry “chair” in the Password series.
The query above returns the complete entry “smile” in the Random House Webster’s College Dictionary.
You can also search for a specific sense by its unique sense ID:
The query above returns the second sense of the polysemous entry “bank”. The JSON result for this type of query includes: id (sense id), source, language and entry (entry id).
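Lookups by ID could be built as in the sketch below. It assumes the ID is appended to the collection name as a path segment (/entries/{id}, /senses/{id}), which the text does not spell out. The IDs shown are placeholders, not real identifiers.

```python
from urllib.parse import quote

BASE_URL = "https://lexicala1.p.rapidapi.com"  # endpoint given in the text
COLLECTIONS = ("entries", "senses")


def by_id_url(collection: str, item_id: str) -> str:
    """Build a lookup URL for one entry or sense (the path layout is an assumption)."""
    if collection not in COLLECTIONS:
        raise ValueError(f"collection must be one of {COLLECTIONS}")
    return f"{BASE_URL}/{collection}/{quote(item_id, safe='')}"


print(by_id_url("entries", "PLACEHOLDER_ENTRY_ID"))
print(by_id_url("senses", "PLACEHOLDER_SENSE_ID"))
```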
Following is a detailed schema of the different elements constituting a complete entry JSON object, divided by type. Note that some elements can be of more than one type.
Strings: text, pos, subcategorization, gender, case, register, number, geographical_usage, mood, tense
Arrays: tense, mood, geographical_usage, register, case, subcategorization (arrays of strings), inflections (array of objects)
Objects: alternative_scripts, pronunciation
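Because some elements may arrive either as a single value or as an array, client code usually benefits from normalizing them before use. The following sketch wraps lone values in a list, both for the headword itself (object or array of objects) and for a field such as geographical_usage (string or array of strings). The sample entry and its values are invented for illustration.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Wrap a lone value in a list, pass lists through, and map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def headword_summary(entry: dict) -> list[dict]:
    """Return one normalized record per headword of an entry."""
    return [
        {
            "text": headword.get("text"),
            "geographical_usage": as_list(headword.get("geographical_usage")),
        }
        for headword in as_list(entry.get("headword"))
    ]


# Invented entry: one headword has a string value, the other an array.
entry = {
    "id": "PLACEHOLDER_ID",
    "headword": [
        {"text": "colour", "geographical_usage": "British"},
        {"text": "color", "geographical_usage": ["American", "Canadian"]},
    ],
}

for record in headword_summary(entry):
    print(record["text"], record["geographical_usage"])
```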
Sense Object (within the Senses array)
Strings: id, definition, semantic_category, register, range_of_application, subcategorization, geographical_usage, semantic_subcategory, sentiment, see, see_also
Arrays: semantic_category, register, sentiment, geographical_usage, range_of_application, subcategorization, synonyms, antonyms, semantic_subcategory, see_also (arrays of strings), examples, compositional_phrases, inflections, senses (array of objects)
Compositional Phrases Object (within the Compositional Phrases array)
Strings: text, definition, sentiment, register, semantic_category, semantic_subcategory, range_of_application, aspect, pos, geographical_usage
Arrays: synonyms, antonyms, senses, sentiment, register, semantic_category, semantic_subcategory, range_of_application, geographical_usage (arrays of strings), examples (array of objects)
Objects: alternative_scripts, translations
Examples Object (within the Examples array)
Objects: alternative_scripts, translations
field = language code (2 letters) – value is an object (or an array of objects for more than one translation) with the following fields:
Strings: text, range_of_application, collocate, register, semantic_category, semantic_subcategory, sentiment, gender, number, geographical_usage, pos
Arrays: range_of_application, collocate, register, semantic_category, semantic_subcategory, sentiment, geographical_usage (arrays of strings), inflections, pronunciation (array of objects)
Objects: alternative_scripts, pronunciation
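Since each language code maps to either one translation object or an array of them, a consumer may want to flatten the structure into simple pairs. The sketch below does so for the text field. The function name and the sample data are hypothetical.

```python
def flatten_translations(translations: dict) -> list[tuple[str, str]]:
    """Turn {language_code: object or [objects]} into (language_code, text) pairs."""
    pairs: list[tuple[str, str]] = []
    for language_code, value in translations.items():
        items = value if isinstance(value, list) else [value]
        pairs.extend((language_code, item["text"]) for item in items)
    return pairs


# Invented sample: one language with a single translation, one with two.
sample = {
    "fr": {"text": "banque"},
    "es": [{"text": "banco"}, {"text": "orilla"}],
}

print(flatten_translations(sample))
# [('fr', 'banque'), ('es', 'banco'), ('es', 'orilla')]
```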
Inflections Object (within the Inflections array)
Strings: text, geographical_usage, case, number, gender, register, tense, aspect, subcategorization, mood
Arrays: geographical_usage, case, register, tense, subcategorization, mood (arrays of strings), pronunciation (array of objects)
Objects: alternative_scripts, pronunciation
fields: value (string) – the pronunciation text, geographical_usage (string/array of strings)
Alternative Scripts Object
field: the name of the alternative script with a string value containing the text
ar corpus name: Arabic Web 2018 (arTenTen18), corpus info https://www.sketchengine.eu/artenten-arabic-corpus, number of tokens: 5,341,978,851, number of words 4,637,956,234
br corpus name: Brazilian Portuguese corpus (Corpus Brasileiro), corpus info https://www.sketchengine.co.uk/corpus-brasileiro, number of tokens: 1,133,416,757, number of words 871,117,178
cs corpus name: Czech Web 2017 (csTenTen17), corpus info http://www.sketchengine.co.uk/cstenten-czech-corpus, number of tokens: 12,586,415,546, number of words 10,502,222,474
de corpus name: German Web 2020 (deTenTen20), corpus info http://www.sketchengine.co.uk/detenten-german-corpus, number of tokens: 20,999,598,683, number of words 17,512,733,172
dk corpus name: Danish Web 2020 (daTenTen20), corpus info http://www.sketchengine.co.uk/datenten-danish-corpus, number of tokens: 4,127,362,161, number of words 3,480,275,804
el corpus name: Greek Web 2019 (elTenTen19), corpus info http://www.sketchengine.co.uk/eltenten-greek-corpus, number of tokens: 2,782,299,354, number of words 2,342,091,029
en corpus name: English Web 2020 (enTenTen20), corpus info https://www.sketchengine.eu/ententen-english-corpus, number of tokens: 43,125,207,462, number of words 36,561,273,153
es corpus name: Spanish Web 2018 (esTenTen18), corpus info http://www.sketchengine.co.uk/estenten-spanish-corpus, number of tokens: 19,593,089,777, number of words 16,953,735,742
fr corpus name: French Web 2020 (frTenTen20), corpus info http://www.sketchengine.co.uk/frtenten-french-corpus, number of tokens: 17,805,103,451, number of words 15,115,914,647
he corpus name: Hebrew Web 2021 (heTenTen21), corpus info https://www.sketchengine.co.uk/hetenten-hebrew-corpus, number of tokens: 3,183,067,122, number of words 2,775,686,699
hi corpus name: Hindi Web 2017 (hiTenTen17), corpus info https://www.sketchengine.co.uk/hitenten-hindi-corpus, number of tokens: 1,375,847,600, number of words 1,228,379,747
it corpus name: Italian Web 2020 (itTenTen20), corpus info https://www.sketchengine.eu/ittenten-italian-corpus, number of tokens: 14,514,566,714, number of words 12,451,734,885
ja corpus name: Japanese Web 2011 (jaTenTen11), corpus info http://www.sketchengine.co.uk/jptenten-japanese-corpus, number of tokens: 10,321,875,664, number of words 8,432,294,787
ko corpus name: Korean Web 2018 (koTenTen18), corpus info http://www.sketchengine.co.uk/kotenten-korean-corpus, number of tokens: 2,054,520,141, number of words 1,668,851,720
nl corpus name: Dutch Web 2020 (nlTenTen20), corpus info https://www.sketchengine.eu/nltenten-dutch-corpus, number of tokens: 6,836,979,371, number of words 5,890,009,964
no corpus name: Norwegian Web 2017 (noTenTen17, Bokmål), corpus info https://www.sketchengine.eu/notenten-norwegian-corpus, number of tokens: 2,787,260,248, number of words 2,461,704,417
pl corpus name: Polish Web 2012 (plTenTen12, RFTagger), corpus info http://www.sketchengine.co.uk/pltenten-polish-corpus, number of tokens: 9,387,142,186, number of words 7,715,835,214
pt corpus name: Portuguese Web 2018 (ptTenTen18), corpus info http://www.sketchengine.eu/pttenten-portuguese-corpus, number of tokens: 8,731,838,327, number of words 7,407,393,731
ru corpus name: Russian Web 2011 (ruTenTen11), corpus info http://www.sketchengine.co.uk/rutenten-russian-corpus, number of tokens: 18,280,486,876, number of words 14,553,856,113
sv corpus name: Swedish Web 2014 (svTenTen14), corpus info http://www.sketchengine.co.uk/svtenten-swedish-corpus, number of tokens: 3,889,895,434, number of words 3,401,035,817
th corpus name: Thai Web 2018 (thTenTen18), corpus info http://www.sketchengine.co.uk/thtenten-thai-corpus, number of tokens: 695,928,167, number of words 640,530,227
tr corpus name: Turkish Web 2012 (trTenTen12), corpus info https://www.sketchengine.co.uk/trtenten-turkish-corpus, number of tokens: 4,124,133,118, number of words 3,388,418,900
tw corpus name: Chinese Web 2017 (zhTenTen17), Traditional, corpus info http://www.sketchengine.co.uk/zhtenten-chinese-corpus, number of tokens: 2,977,351,219, number of words 2,400,405,372
zh corpus name: Chinese Web 2017 (zhTenTen17), Simplified, corpus info http://www.sketchengine.co.uk/zhtenten-chinese-corpus, number of tokens: 16,593,146,196, number of words 13,531,331,169