Compound (linguistics)

This original article is about compounds in linguistics. Its aim is to provide properly sourced information about compounds, in part to support their treatment in dictionaries. The requirement for inline references at the level of single sentences is taken more seriously than in most Wikipedia articles.

Definition
Compounds are easy to define approximately as words made from words. They are hard to define exactly, especially since it is difficult to distinguish compounds from phrases.

The following definitions of "compound" can be found:
 * (1) A word composed of multiple words.
 * (2) A word composed of multiple independent words or combining forms of words.
 * (3) A word composed of multiple free morphemes.
 * (4) A lexeme composed of multiple stems.
 * (5) A noun, adjective or verb composed of multiple words or parts of words.
 * (6) A sequence of multiple words that act as a single word.
 * (7) A word or word sequence consisting of multiple parts that captures a specific concept, whether the parts are words or affixes.

The definitions have different implications:
 * The definitions clearly requiring compounds to be words are (1), (2) and (3).
 * Definitions (2) and (4) are technical refinements of (1); "lexeme" is a more technical term for "word". The use of "stem" or "combining form" is needed, for example, for Czech, where slunovrat ("solstice") is based on slunce ("sun") and vrátit ("to return") and is composed of slun- and vrat-, which are word stems, not words.
 * Definition (5) seems flawed: to cover English open compounds, "noun, adjective or verb" has to include items written as several typographic words, but then it also covers phrases like "cat sitting on the mat", which is not a compound.
 * Definition (6) is unclear: it does not say what it means for something to "act as a single word". By saying "act", it allows compounds that are not single words.
 * Definition (7) requires compounds to capture a specific concept, which seems to suggest that compounds are not the sum of their parts. This cannot be a general requirement: many German compounds are the sum of their parts. Furthermore, definition (7) includes affixation under compounding, which makes sense for inflected languages: Czech vysokoškolský ("university", adjective) is a compound but requires the suffix -ský to be formed. To distinguish a compound proper from a simple derivative such as lesní ("forest", adjective, formed from les plus a suffix), one could require that at least two of its parts be words. The case of vysokoškolský shows that definitions (1) to (6) are cross-linguistically inadequate: they work for English because English hyphenated adjectival compounds take no suffix.

Demarcation
Compounds need to be distinguished from the following:
 * Affixed words, e.g. blueness. There may be ambiguity: is German "aufholen" made from the prefix "auf-" or the word "auf"? Is English overcome made from the prefix "over-" or the word "over"? Furthermore, a Merriam-Webster guide to compounds includes affixation under compounding. Another source indicates that the distinction between compounding and affixation has been treated as problematic in the literature.
 * Free non-compound phrases, e.g. green house (a house that is green) or cat that is on the mat. The phrase school bus traffic stop laws looks like a compound to some, but credentialed sources usually do not give such examples.
 * Full-sentence proverbs, e.g. all roads lead to Rome.
 * Phrasal verbs. Some non-credentialed sources give the phrasal verbs "carry over" and "break up" as examples of compounds. However, credentialed sources usually do not give such examples. If English phrasal verbs were considered single words, they would meet the definitions of compounds; however, sources usually define English phrasal verbs as phrases rather than as words.

Part of speech
A compound's part of speech can be noun, adjective or verb, as in "bus stop", "self-centered" and "windsurf".

Detection criteria or tests
Compounds written with spaces present a special problem for detection.

The Cambridge Grammar of the English Language (CGEL) mentions stress, orthography, meaning, and productivity as factors in distinguishing compounds from non-compound phrases. CGEL calls the non-compound noun phrases "composite nominals". A further test is "coordination and modification": parts of non-compounds can "enter separately into relations of coordination and modification".

Abdel Rahman Mitib Altakhaineh lists orthography, stress, modification, compositionality, displacement, insertion, referentiality, coordination, replacement of the second element by a pro-form, ellipsis, and inflection and linking element as tests.

Livio Gaeta & Davide Ricca consider compounds to be morphological objects, independent of their lexical status.

Wordhood
Since multiple sources define compounds as words, being a word is a criterion of compoundhood. However, a distinction needs to be made between "orthographic word", "phonological word" and other notions of word. Compounds are not necessarily orthographic words, as the compound "high school" shows. The term "lexical item" is broader than most notions of word, since lexical items include proverbs. Another notion is "morphological word". While wordhood is usually a requirement, it is not a simple test; rather, it depends on a multitude of simpler tests.

Cross-linguistic uniformity or universality of notion
It may be difficult to arrive at a single universal cross-linguistic set of operational tests of compoundhood: "The first, very simple observation is that all languages examined here have morphological compounds. However, it turned out that the compounds in these languages do not all share the same defining properties. While lexical (compound) stress, headedness (either right or left), inseparability and debarment of word-internal inflection, recursiveness, and linking elements are generally considered essential criteria for the definition of compound, in particular from a German(ic) perspective, all of them also emerged as problematic in at least one language, or as non-existent. Thus, it seems that there is no universal definition of compound. Rather, as pointed out by Ralli (2013b: 184): 'What makes a compound morphological should be defined on a language-specific basis, since languages vary with respect to the realization of their morphological features and the use of morphologically-proper units.'"

Unity among different linguists within the same language
Even within a single language, different treatments of compounds can be found in the literature, resulting in different classifications of candidate compounds as true compounds or not.

In French, some linguists count items such as pomme de terre ("potato", literally "apple of earth") as compounds. Some linguists go so far as to claim that French has no true compounding at all.

The Italian linguistic tradition is divided over constructions such as zuppa di verdure ("vegetable soup").

The Spanish multi-word expressions león marino ("sea lion") and paquete bomba ("parcel bomb") are regarded as compounds by some linguists but not by others.

Spelling or orthography
Words written solid or hyphenated are easier to recognize as compounds. Word sequences written with spaces present a problem: not every such sequence is a compound. For instance, "cat that is on the mat" is not a compound, whereas "high school" is. Britannica's article on compounding gives no example of an open compound, which suggests that it does not consider open compounds to be compounds.

Spelling tests work well for some languages:
 * In German, all compounds are written without spaces; writing them with spaces is a rare error.
 * In Czech and Slovak, all compounds are spelled as one word, while syntactic phrases are spelled as separate words.
 * Finnish: "As a general rule, Finnish compounds are written without space between the constituents"
 * Greek: "Greek compounds display solid spelling, contrary to phrases, just as in German."
 * In Polish, most compounds are spelled as one word without a hyphen, but there are exceptions such as Bośnia-Hercegowina and czarno-biały.

Composition: morphology vs. syntax
The name "compound" implies a composite object. However, both words and multi-word expressions are composite objects: the former are made from morphemes (which include some words), the latter from words. Two kinds of composition are therefore distinguished: morphological composition and syntactic composition.

To some extent, the distinction is unproblematic: "blueness" is a result of morphological composition, while "the cat that is on the mat" is a result of syntactic composition. In English, the boundary becomes unclear with candidate open compounds such as "white house" vs. "White House".

"Compounds are the output of morphology, while MWEs [multi-word expressions] are the output of syntax. [...] The property of being morphological implies that an item is the output of some morphological schema or rule, which is different from a syntactic schema or rule."

"in contrast to German it seems much more difficult to provide clear criteria for morphological compounds as opposed to MWEs in French, Spanish, and Italian."

Phonology
English open compounds have a distinctive phonology. Britannica distinguishes compounds from word groups or phrases by "stress, juncture, or vowel quality or by a combination of these". However, while the great majority of English compounds written as single words stress the left-hand component, a small minority stress the right-hand component instead, and a number of compounds are double-stressed. Thus, for English, stress alone is not a universal criterion.

In Romance languages, "compounds and MWEs are basically stressed in the same way".

Meaning and sum of parts
Some sources indicate that compounds are not the sum of their parts: their meaning cannot be derived from the meanings of their parts. A Czech encyclopedia says compounds usually have a meaning different from that of the base words. However, being more than the sum of the parts is not a necessary condition: the German compounds Tanzschule ("dance school"), Zirkusschule ("circus school") and many more are counterexamples, as are the English compounds bookshop and appletree. Nor is it a sufficient condition: idiomatic proverbs are not compounds.

Separate inflection
Consisting of separately inflected parts is one test of non-compoundhood for highly inflected languages. It works only for some of them:
 * German
 * Danish, Swedish

The test is of no use for English and Chinese, which have little or no inflection.

The test fails for some languages:
 * In Spanish, some items considered compounds show separate inflection of parts.
 * In Icelandic, there is compound-internal inflection.

Linking element
The presence of a linking element may indicate compoundhood in some languages. Thus, German Liebesbrief ("love letter") contains the linking element s. However, a linking element is not a necessary condition in German, as Konzertreise ("concert tour") shows.

"(Native) linking elements, [...], do not exist in French and Italian."

See also section Linking element examples.

Norms and prescriptions
Some sources for some languages prescribe compounds to be written without spaces:
 * Dutch: "Dutch orthography requires compounds to be written without an internal space."
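 * German: "Die Wörter Kürbissuppe, Zwiebelkuchen und Hairstudio werden nach deutschen Wortbildungsregeln zusammengeschrieben." ("The words Kürbissuppe, Zwiebelkuchen and Hairstudio are written as one word according to German word-formation rules.")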

Compound examples
Example compounds in various languages:
 * Ancient Greek: dermatology, democracy, pyromania, rhododendron, that is, δερματολογία, δημοκρατία, πυρομανία, ῥοδόδενδρον
 * Bulgarian: бензиностанция, бира-скара, пиле-грил, кафе-аперитив, фаст-фууд
 * Chinese: 大褂儿
 * Czech: zeměpis, olejomalba, vysokoškolský
 * Danish: fyrværkerigrund, bankrådgivning, kulturkløft
 * Dutch: jonggetrouwd, tandextractie, boerenzóon, koningszoon
 * English: rowboat, high school, devil-may-care, crime-prone, grass-green, sky-blue, air-quote, dry-burn
 * Estonian: lutipudel, riisipuder, noortööline
 * Finnish: lentokoneonnettomuus, kesäyö, märkäpuku, metsäyhtiö
 * French: timbre-poste, essuie-glace
 * German: Kürbissuppe, Zwiebelkuchen, Hairstudio, Handelsvertrag, Affenhaus, Frischluft
 * Greek: χαρτόκουτο, κεφαλόσκαλο, εθιμοτυπικός, κρυφοκοιτάζω
 * Hebrew: beyt sefer (בית ספר)
 * Hungarian: kisautó, kőkemény, városháza, tojásfehérje
 * Icelandic: gufubátur, Norðausturatlantshafsfiskveiðinefndin
 * Italian: pescecane, cavatappi, criminologo, trasporto latte, poeta pittore
 * Latin: aequilibrium, multilateralis, carnivora
 * Polish: czerwono-czarni, listopad, językoznawstwo, czcigodny, zmartwychwstały, drobnoustrój
 * Russian: glubokomyslie, lesostep, zvukorežisser, senouborka
 * Sanskrit: rājapūruṣāḥ, rāmakṛṣṇau
 * Slovak: svetonázor
 * Slovene, Slovenian: ȃvtocesta, vodomèt, očenàš
 * Spanish: coliflor, coche cama, bocacalle, telaraña
 * Swedish: livbåt, livbåtsbesättning, flickebarn, människokärlek

Non-compound examples
The following items are non-compound phrases:
 * Danish: røget laks, stor begivenhed
 * Dutch: rode wijn, rijk versierd, koffie zetten
 * English: piece of cake, dry cough, grass slug, hit the road, green card (card that is green), heavy smoker, kick the bucket
 * German: weich wie Butter, schwarzer Tee, rotes Kraut, Spanisches Rohr, kalter Krieg
 * Greek: psixrós pólemos, zóni asfalías
 * Polish: kontrola jakości, karma dla zwierząt, numer telefonu, pasta do zębów
 * Russian: novaja kniga, myľnaja opera, sredstva massovoj informacii
 * Swedish: röda hund, hög hatt, ymnig grönska, duka bordet

Linking element examples
Example compounds using a linking element (the element is given in parentheses):

 * Ancient Greek: δερματολογία (-ο-)
 * Bulgarian: бензиностанция (-о-)
 * Czech: olejomalba (-o-)
 * Danish: adgangskode (-s-)
 * Dutch: adamsappel (-s-)
 * English: marksman (-s-); linking elements also occur in neo-classical compounds
 * Finnish:
 * German: Ankunftszeit (-s-), Sternennacht (-en-)
 * Greek: τιμολόγιο (-ο-)
 * Latin: abietifolius (-i-)
 * Norwegian: arbeidsgruppe (-s-)
 * Polish: listopad (-o-)
 * Proto-Slavic: *listopadъ (-o-)
 * Russian: белобровик (-о-)
 * Serbo-Croatian: listopad (-o-)
 * Slovak: svetonázor (-o-)
 * Slovene, Slovenian: golobrad (-o-)
 * Spanish: pelirrojo (-i-)
 * Swedish: bergssida (-s-)

Long compounds
Some languages tend to form long compounds, consisting of 3 or more word bases. Some examples:
 * English: one source considers office management training seminar video to be a single compound. However, sources do not usually give this kind of example.
 * German: Aufmerksamkeitsdefizitsyndrom, Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
 * Finnish: aasiantupsuhäntäpiikkisika
 * Hungarian: ezerkilencszázkilencvenkilenc, jövedelemegyenlőtlenség, kompromisszumképtelenség

Lists of compounds
A fairly extensive list of example English compounds is given in a bachelor's thesis written in English by a non-native speaker, drawing on English-language sources.

Very long lists of compounds are available in Wiktionary categories such as Category:German compound terms. However, these lists are unreliable, since entries may be miscategorized.

Long compounds can be found in syllable-count categories such as Category:Finnish 11-syllable words, Category:German 9-syllable words, Category:Polish 9-syllable words and Category:Russian 11-syllable words. Not all members of these categories are compounds.

Neo-classical compounds
Some sources classify the likes of historiography, chromatography and immunological as "neo-classical compounds". They are defined as "words consisting of two or more free morphemes (of Latin or Ancient Greek) which are bound, not free, in the modern language concerned, such as English biology."

Treatment in dictionaries
Candidate compounds and multi-word phrases are treated in dictionaries as follows:
 * Czech černokněžník is in dictionaries, while černá díra isn't.
 * Danish sort hul is in dictionaries.
 * English black hole is in dictionaries.
 * German: Schwarztee is in Duden while "schwarzer Tee" isn't. "schwarzes Loch" is not in Duden but is in DWDS as a "Mehrwortausdruck", a multi-word expression.
 * Polish czarna dziura is in PWN.
 * Slovak černokňažník is in dictionaries, while čierna diera isn't.
 * Swedish svart hål is not in SAOB online and not in SAOL.

Machine translation
Translating closed compounds (those written solid, with no spaces or hyphens) is a relevant problem for machine translation from languages that form long compounds, such as German. These languages form a huge number of transparent long closed compounds, for which it is impractical to maintain a translation dictionary. While breaking such compounds into their components is fairly easy for humans, it is non-trivial for machines. A sum-of-parts translation consists of breaking the compound into components and translating the components separately. An example of ambiguity is German "vereinbart" ("agreed"), which is properly analyzed as the past participle of "vereinbaren" ("to agree"), but which a machine could analyze as Verein ("club") + Bart ("beard"). (A machine could, however, note that vereinbart is not capitalized and is therefore not a noun. Still, the principle remains.)
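The splitting step and the resulting ambiguity can be sketched as follows. This is a minimal, hypothetical illustration, not a method taken from the sources above: it assumes a tiny hand-made lexicon (`LEXICON`), an assumed set of German linking elements (`LINKING_ELEMENTS`) and a toy glossary (`GLOSSARY`), and it simply enumerates every way of cutting a word into lexicon entries, optionally separated by a linking element. Real systems use large dictionaries and usually score competing splits (for example by corpus frequency); none of that is shown here.

```python
from functools import lru_cache

# Hypothetical toy lexicon (lowercase); a real system would use a large
# dictionary or a word list derived from a corpus.
LEXICON = frozenset({
    "verein", "bart", "vereinbart",
    "liebe", "brief", "konzert", "reise",
    "ankunft", "zeit", "stern", "nacht",
})

# Assumed linking elements that may follow a non-final part ("" = none).
LINKING_ELEMENTS = ("", "s", "es", "n", "en")

# Hypothetical word-by-word glossary used for sum-of-parts translation.
GLOSSARY = {
    "verein": "club", "bart": "beard", "vereinbart": "agreed",
    "liebe": "love", "brief": "letter", "konzert": "concert",
    "reise": "trip", "ankunft": "arrival", "zeit": "time",
    "stern": "star", "nacht": "night",
}


def split_compound(word, lexicon=LEXICON, links=LINKING_ELEMENTS):
    """Return every segmentation of `word` into lexicon entries.

    Each segmentation is a tuple of (part, linking_element) pairs; the last
    part always carries an empty linking element. Matching ignores case.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def splits_from(start):
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            if end == len(word):
                # This part reaches the end of the word: a complete analysis.
                results.append(((part, ""),))
                continue
            for link in links:
                nxt = end + len(link)
                # The linking element must be present and followed by more text.
                if word.startswith(link, end) and nxt < len(word):
                    for rest in splits_from(nxt):
                        results.append(((part, link),) + rest)
        return tuple(results)

    return list(splits_from(0))


def sum_of_parts_translations(word):
    """Translate each segmentation part by part, ignoring linking elements.

    German and English noun compounds are both head-final, so the glosses
    are simply joined in the original order.
    """
    return [" ".join(GLOSSARY[part] for part, _link in seg)
            for seg in split_compound(word)]


if __name__ == "__main__":
    for w in ("Liebesbrief", "Konzertreise", "Sternennacht", "vereinbart"):
        print(w, split_compound(w), sum_of_parts_translations(w))
```

With this toy lexicon, "Liebesbrief" splits only as liebe + s + brief, while "vereinbart" yields two analyses, the single participle and Verein + Bart, so the sum-of-parts translation offers both "agreed" and "club beard". Choosing between such analyses is exactly the ambiguity described above; the capitalization heuristic mentioned there is deliberately left out.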

Early analyses
Some of the earlier analyses of compound vs. phrase are Kruisinga 1932, Bloomfield 1933, Bloch and Trager 1942, Trager and Smith 1951, Marchand 1960, Lees 1960, Zimmer 1971, and Quirk et al. 1972.

Compound term
The phrase "compound term" can be found in reference to compounds in linguistics, but seems rare. One user of the term is Dimković-Telebaković, who includes "vertical take-off and landing aircraft" as an example; many linguists would not consider this an English compound.