English as a hybrid Romance-Germanic language

This original article by Dan Polansky and Yuwash investigates the hypothesis that English is a hybrid Romance-Germanic language rather than a Germanic language, as it is often classified. The hypothesis is not necessarily part of the scientific mainstream; many linguists would classify a language more on the basis of its grammatical properties than on the mixed origin of its core vocabulary.

English appears to be a hybrid Romance-Germanic language based on the mixed origin of its vocabulary. The degree to which English vocabulary is permeated with words stemming from Latin is remarkable. To determine the proportion of words that are of Romance origin (Latin or Latin via French), one needs to look at something like the top 5,000, 10,000 or 80,000 words. If, by contrast, one includes the large swaths of bottom-ontology scientific vocabulary, Latin and Greek can be expected to outnumber every other origin, but the same holds for many European languages and therefore does not distinguish English from them.

Anecdotally, when I (Dan Polansky) see or hear Italian, it reminds me of English; when I see or hear Danish, it reminds me of German.

Case for English being hybrid based on vocabulary
In the sections below, various sources show that, in the core English vocabulary, words of Romance origin (from Latin, French, etc.) dominate words of Germanic origin. Let us emphasize that this concerns the core vocabulary. In Simons 2017, French origin and Latin origin combined reach 40% of the vocabulary for about the 1000 most common English words and 50% for about the 2000 most common English words, rising slowly higher as the number of most common English words analyzed increases. (In Finkenstaedt and Wolff 1973, there are more than twice as many words of Latin and French origin as of Germanic origin, but since the basis for this analysis is 80,000 words and we aim to look at core vocabulary, this is a much weaker argument.) To disregard this lexical dominance and classify modern English merely based on its grammatical features appears debatable.

Simons 2017
The graph in Simons 2017, in the article section "Visualizing the data", suggests that French origin and Latin origin combined reach 40% of the vocabulary for about the 1000 most common English words and 50% for about the 2000 most common English words, rising slowly higher as the number of most common English words analyzed increases. Simons 2017 indicates wordfrequency.info as its source for word frequencies; that website states that "The data is based on the one billion word Corpus of Contemporary American English (COCA) -- the only corpus of English that is large, up-to-date, and balanced between many genres."

Williams 1975
Joseph M. Williams conducted a survey of 10,000 words based on data “compiled from several thousands of business letters” (which originate from Roberts 1965). The breakdown is as follows:

Simons 2017 has some reservations about the methodology.

Issues by Dan Polansky with the above section (the section in the current form was authored mostly by Yuwash):
 * What does "English" mean? Does it mean "Middle English"? Or "Old English"? Or does it refer to an ancestor of "Old English"?

Finkenstaedt and Wolff 1973
A computerized survey conducted by Finkenstaedt and Wolff over the “Shorter Oxford Dictionary (3rd edition)”, containing around 80,000 words, has yielded the following distribution:


 * French, including Old French and early Anglo-French (28.3%)
 * Latin, including modern scientific and technical Latin (28.24%)
 * Germanic languages (Old/Middle English, Old Norse, Dutch) (25%)
 * Greek (5.32%)
 * No etymology given (4.03%)
 * Derived from proper names (3.28%)
 * Other (less than 1% each) (5.83%)
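
As a quick arithmetic check of the figures listed above, the following minimal Python sketch adds up the French and Latin shares and compares the result with the Germanic share. Treating French plus Latin as the "Romance" total is an assumption made here for illustration; the variable names are likewise illustrative.

```python
# Shares in percent, copied from the Finkenstaedt and Wolff 1973 list above.
shares = {
    "French": 28.3,
    "Latin": 28.24,
    "Germanic": 25.0,
    "Greek": 5.32,
    "No etymology given": 4.03,
    "Derived from proper names": 3.28,
    "Other": 5.83,
}

# Assumption: "Romance" is taken to mean French plus Latin only.
romance = shares["French"] + shares["Latin"]
germanic = shares["Germanic"]

# How many Romance-origin words there are per Germanic-origin word.
ratio = romance / germanic

# Sanity check that the listed categories cover the whole sample.
total = sum(shares.values())

print(f"Romance share: {romance:.2f}%")
print(f"Romance-to-Germanic ratio: {ratio:.2f}")
print(f"Sum of all categories: {total:.2f}%")
```

Under these assumptions, the combined share comes to about 56.54%, the ratio to about 2.26, and the categories sum to 100%, which agrees with the "more than twice as many" statement made for this source.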

Piechart based on AskOxford
The following pie chart is relevant. The chart description at File:Origins of English PieChart.svg refers to http://www.askoxford.com/asktheexperts/faq/aboutenglish/proportion?view=uk, which is available in the Wayback Machine; the AskOxford page attributes the data to Thomas Finkenstaedt and Dieter Wolff (1973) and describes it as "the result of a computerized survey of roughly 80,000 words in the old Shorter Oxford Dictionary (3rd edition)". The chart contains minor mismatches: for example, where AskOxford states "28.24%" for Latin, the chart rounds it up to "29%", which is arguably unconventional rounding; similarly puzzling rounding up occurs for other categories. Compared to the AskOxford data, the category "No etymology given" with 4.03% is missing from the chart; it seems the chart lets these 4.03% dissolve into the other categories.
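
The rounding puzzle can be illustrated with a minimal Python sketch. It compares conventional rounding of the AskOxford figure for Latin with rounding up, and then tries one possible reading of the "dissolve" idea: that the 4.03% with no etymology were spread proportionally over the remaining categories. That proportional redistribution is an assumption made here, not something the chart description states.

```python
import math

latin = 28.24        # AskOxford share for Latin, in percent
no_etymology = 4.03  # AskOxford share for "No etymology given", in percent

# Conventional rounding to the nearest whole percent.
nearest = round(latin)

# Always rounding up, which is what the chart appears to do.
ceiling = math.ceil(latin)

# Hypothetical: drop "No etymology given" and rescale the remaining
# categories so that they again sum to 100%.
renormalized = latin / (100 - no_etymology) * 100
renormalized_nearest = round(renormalized)

print(nearest, ceiling, round(renormalized, 2), renormalized_nearest)
```

Conventional rounding gives 28, rounding up gives 29, and the rescaled share is about 29.43%, which rounds conventionally to 29. This is consistent with the guess that the missing category was dissolved into the others, but it does not establish how the chart was actually produced.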

The discussed pie chart based on AskOxford:

The chart is mentioned in Simons 2017 via an old version of the English Wikipedia article "Latin influence in English" (the section has since been moved to "Foreign-language influences in English").

There is a similar chart based on Finkenstaedt and Wolff (1973)/AskOxford, with no labels for the percentages:

Above, the category "Unknown/Other" is large enough to possibly match the union of two categories from AskOxford.

Langfocus 2016
The pie chart mentioned in the "Piechart based on AskOxford" section is described in a Langfocus 2016 video. The video gives no attribution for the chart, but it is obviously the same image.

Langfocus 2016 also relates the creole hypothesis, according to which English is a creole language. The hypothesis highlights the huge simplification that took place in English grammar, including a considerable reduction of inflection. Old English had an inflection system not unlike that of many other inflected languages, Langfocus 2016 tells us.

Poorly identified data from Wikipedia
The Wikipedia article "Foreign-language influences in English" features the following data:
 * French (langue d'oïl): 41%;
 * "Native" English (derived from Old English): 33%;
 * Latin: 15%;
 * Old Norse: 5%;
 * Dutch: 1%; and
 * Other: 5%.

The Wikipedia article indicates the source to be Williams 1975. However, the above data bear no clear relation to the actual data published in section Williams 1975; not even the categories match.

The above data first appeared in Wikipedia in https://en.wikipedia.org/w/index.php?title=Latin_influence_in_English&oldid=393376561, 20 October 2010, by an anonymous IP editor, tracing the data to https://www.amazon.com/dp/0029344700, which is a 1986 edition of Williams 1975, with no page number.

Further reading:
 * Foreign-language influences in English

Dawkins and Pinker 2009
In a conversation with Steven Pinker, Richard Dawkins relates how he was at a linguistic conference and mentioned to linguists that he thought English was a hybrid between Germanic and Romance languages, and met with disagreement from the linguists. While that does not prove much, it shows that a very intelligent and well-educated native English speaker can find the hypothesis worth considering.

Romance adjectives
One witness to the hybrid character is that when one wants to create an adjective proper for a noun (rather than an attributive use of the noun), one switches to a Romance word. For example:
 * tree --> arboreal
 * sky --> celestial
 * earth --> terrestrial
 * star --> stellar
 * water --> aquatic
 * life --> biotic
