Thesaurus (lexicography)

This article is about lexicographical works called "thesaurus" that serve as semantic or conceptual word-finders. There are approximately four classes of textual works carrying the word "thesaurus" in their title or name:
 * 1) Works of word lists organized like Roget's Thesaurus, that is, organized by the concept or idea expressed. Let us call them Roget-style thesauri.
 * 2) Alphabetically organized dictionaries of near synonyms or other terms similar in meaning, carrying the word "thesaurus" in their name. Let us call them dictionary-form thesauri.
 * 3) Non-alphabetically organized works relating words by their meaning, organized not on the model of Roget's Thesaurus.
 * 4) Thesauri in information retrieval, which are controlled vocabularies with explicitly marked relationships of broader term, narrower term and related term, serving vocabulary control and indexing.

This article focuses on Roget-style thesauri. It covers dictionary-form thesauri and non-alphabetical non-Roget-style thesauri to some extent. The fourth class is covered in the article Thesaurus (information retrieval).

It is open to debate whether lexicographical thesauri by name, which include the classes mentioned but not information retrieval thesauri, form a genuine class or whether they are rather an artificial union of genuine classes. They all belong to the superclass of semantic or conceptual word-finders.

To support learning from exemplars, the sections below feature links to specific thesauri and example entries from them.

Disambiguation
Both Roget-style thesauri and dictionary-form thesauri (works of alphabetically organized word lists) find words by their meaning, serving a purpose for which a traditional alphabetically organized definition dictionary is poorly equipped. But they approach the purpose very differently.

The work defining the Roget-style thesaurus category is Roget's Thesaurus by Peter Mark Roget, whose 1852 first edition is now in the public domain, as is its 1911 edition. In Roget's Thesaurus, words are grouped into semantic categories or buckets, which are hierarchically organized into larger categories. In contrast to synonym dictionaries, Roget's Thesaurus aims to reveal broader semantic relationships; for instance, as part of the Animal entry, we find "Mammal, quadruped, bird, reptile, fish, mollusk, worm, insect, zoophyte, animalcule, &c., menagery, fossil remains"; as part of the Plain entry, we find "Meadow, mead, haugh, pasturage, park, field, lawn, terrace, esplanade, sward, turf, sod, heather; lea, grounds, pleasure grounds." In this narrow sense, thesaurus is nearly synonymous with Roget's Thesaurus in any of its expanded editions and translations, up to the 2019 edition of Roget's International Thesaurus and the 2019 edition of Roget's Thesaurus of English Words and Phrases.

The value of separating Roget-style thesauri as a category of their own is recognized by Gary Provost in his book 100 Ways to Improve Your Writing: "You can find thesauruses in paperback and hardcover, and Roget's is not the only one. I do not recommend the ones that are arranged solely in dictionary form. They are easier to use but only about twelve percent as useful".

To add to the confusion, there is a work called "The New Roget's Thesaurus in Dictionary Form", which, despite having the name "Roget" in its title, is not a Roget-style thesaurus but rather an A-Z thesaurus; the title part "in dictionary form" points to the difference. A similar confusion arises from the dictionary-form Roget's II: The New Thesaurus.

Some semantic or conceptual word-finders are neither Roget-style nor in dictionary form (organized alphabetically). They are described in the section Conceptual non-Roget-style thesauri and include, above all, the Historical Thesaurus by the University of Glasgow, also found in the Oxford English Dictionary, which features a hierarchical conceptual organization.

Unlike lexicographical thesauri, thesauri for information retrieval aim to prescribe a preferred term for a concept and they link concepts by the explicitly marked relationships of broader term, narrower term and related term. Moreover, thesauri for information retrieval usually only cover nouns and noun phrases, unlike lexicographical thesauri, which also cover adjectives, verbs and possibly adverbs. Lexicographical thesauri are word-finders and writer's tools.
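The contrast can be made concrete with a minimal sketch of an information retrieval thesaurus record. The class name Concept, the field names and the sample terms are all hypothetical choices made for illustration; they are not taken from any standard or from any actual thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept in a toy information retrieval thesaurus (hypothetical layout)."""
    preferred: str                                      # the prescribed term
    use_for: list[str] = field(default_factory=list)    # non-preferred synonyms
    broader: list[str] = field(default_factory=list)    # explicitly marked broader terms
    narrower: list[str] = field(default_factory=list)   # explicitly marked narrower terms
    related: list[str] = field(default_factory=list)    # explicitly marked related terms


def build_lookup(concepts):
    """Map every term, preferred or not, to the preferred term of its concept."""
    lookup = {}
    for concept in concepts:
        lookup[concept.preferred] = concept.preferred
        for term in concept.use_for:
            lookup[term] = concept.preferred
    return lookup


# Invented sample data: nouns only, with typed relationships.
CONCEPTS = [
    Concept("horses", use_for=["steeds"], broader=["mammals"],
            narrower=["ponies"], related=["horse riding"]),
    Concept("ponies", broader=["horses"]),
]

LOOKUP = build_lookup(CONCEPTS)
print(LOOKUP["steeds"])  # the indexer is steered to the preferred term
```

A lexicographical thesaurus has no counterpart to the lookup step: it offers the writer a choice of words and does not prescribe one of them.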

Getting an impression of Roget's thesauri
One of the best ways of learning about Roget's thesauri is to have a look at them. The following editions are currently publicly available online:
 * The 1911 T. Y. Crowell edition: Internet Archive (Wikidata)
 * The 1911 T. Y. Crowell edition modified by MICRA corporation (Wikidata)
 * In Project Gutenberg: eBook 22; eBook 10681
 * In the English Wiktionary: Wiktionary: Appendix:Roget MICRA thesaurus
 * The 1922 T. Y. Crowell edition: Bartleby.com.
 * The 1962 edition of Roget's International Thesaurus: Internet Archive

Characterization of Roget's Thesaurus
Roget's Thesaurus is characterized as follows:
 * Its smallest building block is a list of words and phrases of the same part of speech, related by meaning.
 * It has no definitions, etymologies, pronunciations or usage examples.
 * A larger building block is an entry, structured into subentries by part of speech, a subentry having multiple lists in general. An entry has a label indicating a category to which its lists belong.
 * There are on the order of 1000 entries.
 * There is a hierarchy of categories of which entries are leaf nodes, starting with 6 root classes.
 * An entry is organized into subentries by part of speech: noun, adjective, verb, adverb and phrase.
 * A subentry for a part of speech is further organized into multiple separated lists.
 * The items in a list are semantically related.
 * The semantic relation of items in the lists is not uniform; it includes synonymy, near-synonymy, hyponymy and other semantic similarity or being in the semantic vicinity.
 * The semantic relation is not explicitly marked; it does not say "synonyms:", "hyponyms:" or the like.
 * This lack of machine-like uniformity driven by detailed rules suggests the remarkable talent required to organize such material. It also suggests a somewhat subjective character of the result.
 * In addition to the entries organized by the idea conveyed, there is a second, separate part that is an alphabetical index.
 * The alphabetical index states for each word or phrase in which entries it can be found.
 * When the entries are presented in electronic form with full-text search, the alphabetical index becomes dispensable.
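The structure described in the list above can be sketched as a small data model. This is only an illustration under stated assumptions: the names Entry and build_index are hypothetical, the category path is a placeholder not checked against any edition, entry 505 is heavily abridged, and the content given for entry 551 is invented so that two entries share some words.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    number: int
    label: str
    # Categories from a root class down to this leaf entry (placeholder values below).
    category_path: tuple[str, ...]
    # Part of speech -> several separate lists of semantically related items.
    subentries: dict[str, list[list[str]]] = field(default_factory=dict)


def build_index(entries):
    """Build the alphabetical index: word or phrase -> numbers of entries containing it."""
    index = defaultdict(set)
    for entry in entries:
        for lists in entry.subentries.values():
            for word_list in lists:
                for item in word_list:
                    index[item].add(entry.number)
    return {item: sorted(numbers) for item, numbers in sorted(index.items())}


ENTRIES = [
    Entry(505, "Memory", ("Intellect",), {
        "noun": [["memory", "remembrance", "retention"],
                 ["note", "memo", "memorandum"]],
        "verb": [["remember", "mind"],
                 ["recognize", "recollect", "recall"]],
    }),
    # Invented content, for illustration only.
    Entry(551, "Record", ("Intellect",), {
        "noun": [["record", "note", "memorandum"]],
    }),
]

INDEX = build_index(ENTRIES)
for item, numbers in INDEX.items():
    print(item, numbers)
```

Note that the lists carry no relation labels such as "synonyms:" or "hyponyms:", which mirrors the unmarked relations described above. With full-text search over the entries, the derived index is no longer needed for lookup.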

Example Roget's thesaurus entry
The following is an example entry from the 1911 T. Y. Crowell edition, containing a minor expansion by MICRA corporation and shown in modified typography.

505. Memory

Noun


 * memory, remembrance ● retention, retentiveness ● tenacity ● veteris vestigia flammae[Lat] ● tablets of the memory ● readiness.
 * reminiscence, recognition, recollection, rememoration[obs3] ● recurrence, flashback ● retrospect, retrospection.
 * afterthought, post script, PS.
 * suggestion &c. (information) 527 ● prompting &c. v. ● hint, reminder ● remembrancer[obs3], flapper ● memorial &c. (record) 551 ● commemoration &c. (celebration) 883.
 * [written reminder] note, memo, memorandum.
 * things to be remembered, token of remembrance, memento, souvenir, keepsake, relic, memorabilia.
 * art of memory, artificial memory ● memoria technica[Lat] ● mnemonics, mnemotechnics[obs3] ● phrenotypics[obs3] ● Mnemosyne.
 * prompt-book ● crib sheet, cheat sheet.
 * retentive memory, tenacious memory, photographic memory, green memory|!, trustworthy memory, capacious memory, faithful memory, correct memory, exact memory, ready memory, prompt memory, accurate recollection ● perfect memory, total recall.
 * celebrity, fame, renown, reputation &c. (repute) 873.

Verb


 * remember, mind ● retain the memory of, retain the remembrance of ● keep in view.
 * recognize, recollect, bethink oneself, recall, call up, retrace ● look back, trace back, trace backwards ● think back, look back upon ● review ● call upon, recall upon, bring to mind, bring to remembrance ● carry one's thoughts back ● rake up the past.
 * have in the thoughts, hold in the thoughts, bear in the thoughts, carry in the thoughts, keep in the thoughts, retain in the thoughts, have in the memory, hold in the memory, bear in the memory, carry in the memory, keep in the memory, retain in the memory, have in the mind, hold in the mind, bear in the mind, carry in the mind, keep in the mind, retain in the mind, hold in remembrance ● be in one's thoughts, live in one's thoughts, remain in one's thoughts, dwell in one's thoughts, haunt one's thoughts, impress one's thoughts, be in one's mind, live in one's mind, remain in one's mind, dwell in one's mind, haunt one's mind, impress one's mind, dwell in one's memory.
 * sink in the mind ● run in the head ● not be able to get out of one's head ● be deeply impressed with ● rankle &c. (revenge) 919.
 * recur to the mind ● flash on the mind, flash across the memory.
 * [cause to remember] remind ● suggest &c. (inform ) 527 ● prompt ● put in mind, keep in mind, bring to mind ● fan the embers ● call up, summon up, rip up ● renew ● infandum renovare dolorem [Lat] ● jog the memory, flap the memory, refresh the memory, rub up the memory, awaken the memory ● pull by the sleeve ● bring back to the memory, put in remembrance, memorialize.
 * task the memory, tax the memory.
 * get at one's fingers' ends, have at one's fingers', learn at one's fingers', know one's lesson, say one's lesson, repeat by heart, repeat by rote ● say one's lesson ● repeat, repeat as a parrot ● have at one's fingers' ends.
 * [transitive] commit to memory, memorize ● con over, con ● fix in the memory, rivet in the memory, imprint in the memory, impress in the memory, stamp in the memory, grave in the memory, engrave in the memory, store in the memory, treasure up in the memory, bottle up in the memory, embalm in the memory, enshrine in the memory ● load the memory with, store the memory with, stuff the memory with, burden the memory with.
 * redeem from oblivion ● keep the memory alive, keep the wound green, pour salt in the wound, reopen old wounds' ● tangere ulcus[obs3][Lat] ● keep up the memory of ● commemorate &c. (celebrate) 883.
 * make a note of, jot a note, pen a memorandum &c. (record) 551.

Adjective


 * remembering, remembered &c. v. ● mindful, reminiscential[obs3] ● retained in the memory &c. v. ● pent up in one's memory ● fresh ● green, green in remembrance ● unforgotten, present to the mind ● within one's memory &c. n. ● indelible ● uppermost in one's thoughts ● memorable &c. (important) 642.

Adverb


 * by heart, by rote ● without book, memoriter[obs3].
 * in memory of ● in memoriam ● memoria in aeterna[Lat] ● suggestive.

Phrase


 * manet alta mente repostum [Lat][Vergil] ● forsan et haec olim meminisse juvabit [Lat][Vergil] ● absens haeres non erit [Lat] ● beatae memoriae [Lat] ● "briefly thyself remember" [Lear] ● mendacem memorem esse oportet [Lat][Quintilian] ● "memory the warder of the brain" [Macbeth] ● parsque est meminisse doloris [Lat][Ovid] ● "to live in hearts we leave behind is not to die" [Campbell] ● vox audita peril littera scripta manet [Lat] ● out of sight, out of mind.

Links: the 1911 edition in Internet Archive, page 150, without MICRA expansion.
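In the entry above, items such as "memorial &c. (record) 551" appear to point to other entries by number. Assuming that reading of the notation, the following sketch extracts such cross-references from one line of the entry. The function name cross_references and the regular expression are illustrative assumptions, not part of any published tooling for the MICRA text.

```python
import re

# Matches "&c. (label) number", tolerating stray spaces inside the parentheses.
CROSS_REF = re.compile(r"&c\.\s*\(\s*([^)]*?)\s*\)\s*(\d+)")


def cross_references(line):
    """Return (label, entry number) pairs found in one line of an entry."""
    return [(label, int(number)) for label, number in CROSS_REF.findall(line)]


LINE = ("suggestion &c. (information) 527 ● prompting &c. v. ● hint, reminder ● "
        "remembrancer[obs3], flapper ● memorial &c. (record) 551 ● "
        "commemoration &c. (celebration) 883.")

REFS = cross_references(LINE)
print(REFS)
```

Items such as "prompting &c. v.", which carry no number, are deliberately not matched by this sketch.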

Lineages of Roget's thesauri
Roget's Thesaurus was first published in 1852. A series of editions followed, both in the U.K. and the U.S.

In the U.S., the 1911 edition published by T. Y. Crowell in New York is now in the public domain. From there on, T. Y. Crowell followed with more editions, which came to be titled Roget's International Thesaurus. Later, T. Y. Crowell became part of HarperCollins, which continued in that tradition, resulting in Roget's International Thesaurus, 2019, officially labeled as the 8th edition by some numbering. The description of the 2019 edition suggests that "Roget's International Thesaurus" is a registered trademark. The 1962 edition is currently publicly available in the Internet Archive.

Another line is a British one, including publications by Longman and later Penguin. Penguin produced editions under the names Roget's Thesaurus and Roget's Thesaurus of English Words and Phrases as recently as 2019.

Selected early London editions with links to Wikidata (from which there are links to other information about the edition):
 * Thesaurus of English Words and Phrases (1852), London
 * Thesaurus of English Words and Phrases (1853), London
 * Thesaurus of English Words and Phrases (1856), London

Selected American editions with links to Wikidata:
 * Roget's Thesaurus (1911), T. Y. Crowell Co., C. O. Sylvester Mawson
 * Roget’s International Thesaurus of English Words and Phrases (1922), T. Y. Crowell Co., C. O. Sylvester Mawson
 * Roget's International Thesaurus (1936), T. Y. Crowell Co.
 * Roget's International Thesaurus (1946), T. Y. Crowell Co.
 * Roget's International Thesaurus (1962), 3rd ed., T. Y. Crowell Co., Lester V. Berrey
 * Roget's International Thesaurus (1977), 4th ed., T. Y. Crowell Co., Robert L. Chapman
 * Roget's International Thesaurus (1992), 5th ed., HarperCollins, Robert L. Chapman
 * Roget's International Thesaurus (2002), 6th ed., HarperCollins, Barbara Ann Kipfer
 * Roget's International Thesaurus (2011), 7th ed., HarperCollins, Barbara Ann Kipfer
 * Roget's International Thesaurus (2019), 8th ed., HarperCollins, Barbara Ann Kipfer

Selected Longmans and Penguin editions with links to Wikidata:
 * Roget's Thesaurus of English Words and Phrases (1962), Longmans, Robert A. Dutch
 * Roget's Thesaurus (1966), Penguin, Robert A. Dutch
 * Roget's Thesaurus of English Words and Phrases (1982), Susan M. Lloyd
 * Roget's Thesaurus (1999), Penguin, Betty Kirkpatrick
 * Roget's Thesaurus of English Words and Phrases (2003), Penguin, George Davidson
 * Roget's Thesaurus (2004), Penguin, George Davidson
 * Roget's Thesaurus of English Words and Phrases (2019), Penguin, seems to be a hardcover edition of the 2004 one

Non-English Roget-style thesauri
As per Klégr 2008, non-English variants of Roget's Thesaurus include German (Dornseiff 1954), Spanish (Casares 1959), Dutch (Brouwers 1965) and French (Péchoin 1995). Klégr himself produced The Thesaurus of the Czech Language (Tezaurus jazyka českého, 2007) using a particular edition of Roget's Thesaurus as a starting point.

Dictionary-form thesauri
As has been pointed out, the word "thesaurus" used in the title of Roget's Thesaurus has been borrowed by English-language publishers as a name for alphabetically organized lexicographical works of word lists, often inaccurately described as lists of "synonyms". Whatever the merits of doing so, perhaps to use a catchy short name and make a reference to the popular Roget's Thesaurus, it resulted in confusion. To be useful, alphabetically organized thesauri often greatly relax the relationship that they show, to include not only near synonyms but also hyponyms and even more remotely semantically related terms. By not being organized into broader semantic or conceptual groups, these dictionaries feature a great deal of duplication, as is apparent, for example, in the public-domain Moby Thesaurus II. That is, one particular ring of semantically related terms tends to get repeated under each of its members, at least in part.
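The duplication can be illustrated with a small sketch that expands one conceptual group into dictionary form. The function name to_dictionary_form is hypothetical, and the sketch repeats the whole ring under every headword, whereas a real A-Z thesaurus would typically repeat it only in part. The sample words are taken from the Plain entry of Roget's Thesaurus quoted earlier in this article.

```python
def to_dictionary_form(groups):
    """Expand conceptual groups into an A-Z mapping: headword -> related words."""
    a_to_z = {}
    for group in groups:
        for headword in group:
            related = a_to_z.setdefault(headword, [])
            for word in group:
                if word != headword and word not in related:
                    related.append(word)
    return dict(sorted(a_to_z.items()))


RING = ["meadow", "mead", "pasturage", "field", "lawn"]
A_TO_Z = to_dictionary_form([RING])

# Items stored once in the grouped form versus items stored in the A-Z form.
GROUPED_SIZE = len(RING)
DICTIONARY_SIZE = sum(len(related) for related in A_TO_Z.values())

for headword, related in A_TO_Z.items():
    print(f"{headword}: {', '.join(related)}")
print(GROUPED_SIZE, DICTIONARY_SIZE)
```

For a ring of n terms repeated in full, the grouped form stores n items while the A-Z form stores n × (n − 1), which is 5 against 20 here.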

The Penguin Thesaurus (2004) is one example of a work of alphabetically organized lists of words having similar meaning, which the publication refers to as synonyms. However, what it covers are in fact often also hyponyms. Thus, in the horse entry, we find mount, steed, stallion, gelding, mare, filly, colt, foal, pony, hack, cob, nag, jade, dobbin and gee-gee.

Dictionaries of synonyms
The manner in which dictionary-form thesauri use the word synonym to refer to non-synonyms (strictly understood) raises the question of whether works called "synonym dictionary" or "dictionary of synonyms" do the same, covering synonyms, hyponyms and more distantly semantically related words under the "synonym" label. Insofar as one defines a synonym not as a word with the same meaning as another but merely as a word with a similar meaning, without a clear definition of and criteria for "similar", the concept of a synonym becomes rather blurred. The actual practice remains to be researched.

Example synonym dictionaries:
 * English Synonyms and Antonyms by James Champlin Fernald, 1896, gutenberg.org. Example entry: KNOWLEDGE. Synonyms: acquaintance, apprehension, cognizance, cognition, comprehension, erudition, experience, information, intelligence, intuition, learning, light, lore, perception, recognition, scholarship, science, wisdom.
 * A Complete Dictionary of Synonyms and Antonyms by Samuel Fallows, 1898, gutenberg.org. Example entry: KEY: Alliance. SYN: Compact, treaty, cooperation, union, connection, partnership, league, combination, coalition, confederation, friendship, relation, relationship.

Conceptual non-Roget-style thesauri
One thesaurus not purely alphabetically organized is the Macmillan Thesaurus online. Rather than restricting the relationships to synonyms and antonyms, it groups words in what it calls topics. One example of a topic is "People who are considered dishonest or insincere"; it covers "liar" and "cheat" but also "hustler", "impostor" and "sycophant", which are not synonyms of each other. Moreover, a topic (a category) is linked to other topics via the "Explore related topics" relation.

Other thesauri not organized alphabetically are the University of Glasgow's Historical Thesaurus of English and the Oxford English Dictionary Historical Thesaurus, which provide a hierarchical taxonomy of categories to which words with their senses are assigned; words are assigned both to inner nodes and to leaf nodes of the category tree. These two thesauri appear to be the same work or one based on the other.

One work that can be interpreted as a conceptual non-Roget-style thesaurus is WordNet, which is not named "thesaurus". It has expressly marked synonymy, hyponymy, hypernymy, holonymy and meronymy, and it has nouns, adjectives and verbs. It has brief definitions. It is not a work primarily meant to be a word-finder, but can be used as one. Its marking of synonymy is strict. Synonyms are listed in a key entity, which is the synset or synonym set. Because of the explicitly marked hyponymy relation, there is no need for, or value in, listing hyponyms as synonyms. WordNet can alternatively be understood as a thesaurus for information retrieval, especially due to its explicitly marked non-synonym relations, but unlike a typical thesaurus of that kind, it does not mark any synonym in the synset as the preferred one.
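The separation of strict synonymy from hyponymy can be sketched with a toy model. This is not the WordNet API and does not reproduce WordNet's data: the class Synset, the helper functions, the definitions and the sample words are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synonym set: no lemma is marked as the preferred one."""
    name: str
    lemmas: list[str]
    definition: str
    hypernyms: list[str] = field(default_factory=list)  # names of broader synsets


def synonyms(synsets, word):
    """Strict synonyms: the other lemmas of every synset that contains the word."""
    found = []
    for synset in synsets.values():
        if word in synset.lemmas:
            found.extend(lemma for lemma in synset.lemmas if lemma != word)
    return found


def hyponyms(synsets, name):
    """Names of synsets that explicitly list the given synset as a hypernym."""
    return sorted(s.name for s in synsets.values() if name in s.hypernyms)


# Invented sample data, loosely modeled on a horse vocabulary.
SYNSETS = {
    "horse": Synset("horse", ["horse"], "a large hoofed mammal"),
    "mare": Synset("mare", ["mare"], "a female horse", ["horse"]),
    "pony": Synset("pony", ["pony"], "a small horse", ["horse"]),
    "nag": Synset("nag", ["nag", "jade", "hack"], "an old or worn-out horse", ["horse"]),
}

print(synonyms(SYNSETS, "nag"))
print(synonyms(SYNSETS, "horse"))
print(hyponyms(SYNSETS, "horse"))
```

In this model "mare" and "pony" are reachable from "horse" only through the explicitly marked hyponymy relation, so they never need to be listed as its synonyms, in contrast to the horse entries of the dictionary-form thesauri discussed in this article.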

Non-Roget-style thesauri online
The following are some of the works called thesaurus that are not Roget-style, available online:
 * Merriam-Webster Thesaurus. Example entry: word: 1. as in term: term, phrase, expression, idiom, monosyllable, morpheme, linguistic form, speech form, locution, colloquialism, euphemism, collocation, coinage, neologism, loanword, archaism, modernism, polysyllable, vernacularism. 2. as in statement: [...] About: "Our unique ranking system helps you find the right word fast—from millions of synonyms, similar words, and antonyms."
 * Cambridge Dictionary Thesaurus. Example entry: informative: "USEFUL": useful, helpful, valuable, invaluable, constructive, worthwhile, instructive, functional, utilitarian, handy, efficient, working. About: "Get clear explanations and examples of the differences between thousands of synonyms and antonyms, in both British and American English."
 * Collins Thesaurus. Example entry: information: synonyms: facts, details, material, news, latest, report, word, message, notice, advice, knowledge, data, intelligence, instruction, counsel, the score, [...]. About: no descriptive statement about the specifics on the landing page.
 * Macmillan Thesaurus. Example entry: word: "single unit of spoken or written language" The words of a language: domain, language, the lexicon, lexis, usage, vocabulary, vocab, [...] "someone’s words are things that they say" Types of word or phrase: acrostic, adjacency pair, Americanism, anagram [...]. About: "Find synonyms and antonyms as well as related terms. Search for synonyms or topics, and browse the full Thesaurus content to build your vocabulary."
 * Thesaurus.com. Example entry: word: conversation, talk, chat, chitchat, colloquy, confab, confabulation, consultation, discussion, tête-à-tête [...]. About: no about statement found on the landing page.
 * OED Historical Thesaurus. Requires signing in. Example entry found by looking for "word": the mind > language > speech > [noun] > that which is or can be spoken (69): speech, saw, speech, guide, words, word, thing, roun, mouth, queath, breath, reason, speakings, sware, saying, voice, lore, sermon, [...]. Lists definitions alongside the words. About: "The Historical Thesaurus is a taxonomic classification of the majority of senses and lemmas in OED Online. It can be thought of as a kind of semantic index to the contents of the OED."
 * University of Glasgow’s Historical Thesaurus of English. Example entry: Memory: mind, memory, imagination, memorial, recordation, remembrance, recollection, memory bank, [...]. About: "The Historical Thesaurus of English is the first historical thesaurus ever produced for any language, containing almost every word in English from Old English to the present day. [...] All these words and their dates of recorded use are displayed within a detailed semantic framework, offering a fascinating picture of the development of the vocabulary of English from its origins in early medieval times to the present." The website has guides describing the structure and design of the thesaurus.
 * Power Thesaurus.org. Example entry: word: term, talk, news, phrase, speak, promise, information, intelligence, formulate, message, oath, pledge [...]. About: no about statement found on the landing page.
 * WordHippo. Example entry: horse: steed, mare, stallion, mount, colt, filly, gelding, nag, pony, equine, yearling, bronco, brumby, carthorse, charger, cob, foal, moke, jade, packhorse, racehorse, dobbin, hack, yarraman, cuddy, hobby, mustang, plug, studhorse, draft horse, draught horse, gee-gee. It introduces each list of similar words with the definition of the sense to which the list belongs.

As is apparent, none of the works examined restricts its word lists to true synonyms.

It is the dictionary-form non-Roget-style thesauri that lead the results of online searches for "thesaurus", as if they, rather than Roget's Thesaurus, were the category-defining thesaurus exemplars. This may be explained in part by the public availability of these thesauri, even if as copyrighted material, unlike the modern editions of Roget-style thesauri.