Research in programming Wikidata/Countries

This chapter studies countries using the Wikidata knowledge base. SPARQL queries were used to analyse and compare "country" items in Wikidata. Three lists were generated: all currently existing countries, countries ordered by date of creation, and demonyms of countries. A bubble chart of forms of government, a graph of neighboring countries and a map of Russia's neighboring countries were also constructed. In addition, conclusions were drawn about the completeness of Wikidata on this topic.

Note: "country" is too ambiguous a term, so it would be better to replace it everywhere with the class sovereign state.

List of countries

 * Items: country (Q6256)
 * Properties: instance of (P31)

Let's build a list of all countries in English and Russian.

SPARQL query. The result contains 205 countries in 2017 and 175 in 2020.
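The query itself is not reproduced here. A minimal sketch of how such a list could be fetched is given below, assuming the public Wikidata Query Service endpoint and the standard SPARQL JSON results format. The query text and the helper names `fetch_bindings` and `parse_countries` are illustrative assumptions, not the authors' original script.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: every item that is an instance of (P31) country (Q6256),
# with its English and Russian labels where they exist.
COUNTRY_QUERY = """
SELECT ?country ?labelEn ?labelRu WHERE {
  ?country wdt:P31 wd:Q6256 .
  OPTIONAL { ?country rdfs:label ?labelEn . FILTER(LANG(?labelEn) = "en") }
  OPTIONAL { ?country rdfs:label ?labelRu . FILTER(LANG(?labelRu) = "ru") }
}
ORDER BY ?labelEn
"""


def fetch_bindings(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the list of result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "countries-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def parse_countries(bindings):
    """Turn SPARQL JSON bindings into (QID, English label, Russian label) tuples."""
    rows = []
    for binding in bindings:
        # The item URI ends with the QID, e.g. .../entity/Q159.
        qid = binding["country"]["value"].rsplit("/", 1)[-1]
        label_en = binding.get("labelEn", {}).get("value")
        label_ru = binding.get("labelRu", {}).get("value")
        rows.append((qid, label_en, label_ru))
    return rows
```

Calling `parse_countries(fetch_bindings(COUNTRY_QUERY))` needs network access; the number of rows returned depends on the state of Wikidata at the time of the call.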

Depending on how many of their properties are filled in on Wikidata, countries can be described as "full" or "empty".

According to ProWD, the most complete countries on Wikidata include Israel, France and the United States of America. Israel and France lead in the number of properties (127 each), while the Democratic Republic of Vietnam has the fewest (24 properties).

Age of countries
Let's build a list of countries sorted by the date of the country's foundation (the first mention of the country).

Given:
 * Items: country (Q6256)
 * Properties: inception (P571)

SPARQL query. The result contains 112 countries with completed date of foundation in 2017 and 187 in 2021.

Running the query gave a modest list: only 184 countries in 2020. Taking Russia as an example, let's work out why. The "instance of" field of the Russia (Q159) item contains not one but eight values, including country (Q6256).

On the Wikidata page "Wikidata:Request a query", editors ask how to write a particular query and other editors answer. This forum is worth using.

The solution to this problem was found on that page, in the section available at https://w.wiki/tLm.

The point is that the wdt: prefix finds only "truthy" values, that is, the statements with the best rank. For Russia, the preferred value of "instance of" is sovereign state, not country. To check every value listed under "instance of" for Russia, use the p:/ps: construction instead.
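The difference between wdt: and p:/ps: can be illustrated with a small model of statement ranks. The sketch below is a simplification: it lists only two of Russia's eight "instance of" values, and the function names `truthy_values` and `all_values` are hypothetical, chosen only to mirror what each prefix can see.

```python
# Simplified, hypothetical model of the "instance of" (P31) statements on one item.
# Each statement is a (value, rank) pair; rank is "preferred", "normal" or "deprecated".
RUSSIA_P31 = [
    ("sovereign state", "preferred"),
    ("country", "normal"),
]


def truthy_values(statements):
    """Mimic wdt: by returning only best-rank statements.

    If any statement is preferred, only the preferred ones are returned;
    otherwise all normal-rank ones are. Deprecated statements never appear.
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in statements if rank == "normal"]


def all_values(statements):
    """Mimic p:/ps:, which reaches every statement regardless of rank."""
    return [value for value, _rank in statements]


print(truthy_values(RUSSIA_P31))  # ['sovereign state']
print(all_values(RUSSIA_P31))     # ['sovereign state', 'country']
```

Under this model, a pattern that asks for "country" through wdt: misses Russia, while the same pattern through p:/ps: finds it.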

Thus, the script for getting all 232 countries sorted by creation date is shown in the next listing.

SPARQL query. The result contains 112 countries with completed date of foundation in 2017 and 235 in 2021.

To remove countries that no longer exist from this list, that is, instances of the historical country (Q3024240) item, use the MINUS operator.

Using the script, 211 non-historical countries with a known foundation date were obtained.

SPARQL query. The result contains 112 countries with completed date of foundation in 2017 and 211 in 2021.
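The effect of MINUS can be pictured as a set difference: solutions that also match the excluded pattern are dropped. The sketch below models this on plain lists. The item names are placeholders, and the SPARQL pattern in the comment is an assumption about how such a filter might be written, not the original query.

```python
# A possible SPARQL shape for the filter (an assumption, not the authors' query):
#   ?country p:P31/ps:P31 wd:Q6256 .
#   MINUS { ?country p:P31/ps:P31 wd:Q3024240 . }


def minus(solutions, excluded):
    """Keep the solutions that do not appear among the excluded ones."""
    excluded_set = set(excluded)
    return [item for item in solutions if item not in excluded_set]


# Placeholder data: every item is an instance of "country",
# and one of them is also an instance of "historical country".
countries = ["France", "Russia", "Former State X"]
historical = ["Former State X"]

print(minus(countries, historical))  # ['France', 'Russia']
```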

For example: France, 463; Russia, 862; Republic of Kosovo, 2008; South Sudan, 2011. The largest numbers of countries appeared in 1960 (16 countries), 1991 (15 countries), 1962 (6 countries) and 1821 (6 countries).

Let's display a list of countries with an empty "inception" (P571) property.

SPARQL query. The result contains 100 countries without completed date of foundation in 2017 and 7 in 2020.

Completeness of Wikidata
Let's analyze the completeness of Wikidata: historical and modern countries.

According to the "Russian classification of countries of the world" there are 251 countries on earth.

This task does not take into account ancient states that no longer exist (for example, Assyria (Q41137)), since they are instances of "historical country" rather than "country".

Using the script, let's build a list of historical states. There are about three thousand such former states, an order of magnitude more than the number of modern states.

SPARQL query. The result contains 3025 historical countries in 2021.

According to the category of "Alphabetical list of countries and territories" in Russian Wikipedia, there are 252 countries.

According to the category of "List of sovereign states" in English Wikipedia, there are 206 countries.

It is not always possible to give an exact date for a country's foundation, because written sources may be absent, scarce or inconsistent. For example, the founding of the Old Russian state is associated with the invitation of the Varangian prince Rurik in 862, but there is no exact date (item Russia (Q159)). In addition, some modern countries were preceded by a series of earlier states, and which of their formation dates should count as the country's date of creation is an open question (for example, Mongolia (Q711)).

List of demonyms in English
A demonym is the name of the inhabitants of a particular place, derived from its toponym. For example, the demonyms for Russia are Russians, a Russian, a Russian woman; for the Czech Republic, Czechs.

Besides geography, words that indicate a person's origin or affiliation also derive from ethnic, political and religious characteristics.

Demonyms can be derived from the names of various features of the earth's surface: mountains, islands, continents. The name for people's place of origin may also depend on political and administrative divisions. For example, to denote citizenship: Thailand, Thai people; Canada, Canadians. Divisions within a state can also give rise to new names: Crimea, Crimeans.

Let's build a list of countries that have demonyms in English.

Given:
 * Items: country (Q6256)
 * Properties: demonym (P1549)

SPARQL query. The result contains 197 countries with demonyms in 2017 and 209 in 2021.

List of demonyms
Let's build a list of all demonyms in English.

SPARQL query. The result contains 237 demonyms in 2017 and 296 in 2021.

Countries with unfilled demonyms
Let's build a list of countries which do not have demonyms in English.

SPARQL query. The result contains 5 countries without demonyms in 2017 and 9 in 2021.

Thanks to the MINUS construction, the final list excludes countries that already have a demonym in English.

Number of demonyms in countries
One country can have anywhere from zero demonyms, if the data is not filled in, to three or four. For example, Turkey has three names for its inhabitants (Turks, a Turkish woman, a Turk), and Ethiopia has four, all of which translate into English as "Ethiopian".

Let's display the list of countries, ordered by the number of demonyms filled in on Wikidata.

SPARQL query. The result contains demonym counts for 199 countries in 2017 and 215 in 2021.
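The counting step behind such a list amounts to grouping rows by country. A minimal sketch follows, assuming the query returns one (country, demonym) row per demonym value; the demonym strings are placeholders rather than real Wikidata content, and `demonym_counts` is a hypothetical helper name.

```python
from collections import Counter

# Placeholder rows: one (country, demonym) pair per demonym value.
ROWS = [
    ("Turkey", "demonym 1"), ("Turkey", "demonym 2"), ("Turkey", "demonym 3"),
    ("Ethiopia", "demonym 1"), ("Ethiopia", "demonym 2"),
    ("Ethiopia", "demonym 3"), ("Ethiopia", "demonym 4"),
    ("Czech Republic", "Czechs"),
]


def demonym_counts(rows):
    """Return (country, number of demonyms) pairs, largest count first."""
    counts = Counter(country for country, _demonym in rows)
    return counts.most_common()


for country, count in demonym_counts(ROWS):
    print(country, count)
```

In SPARQL the same grouping would normally be done with GROUP BY and COUNT on the server side, so the client receives the counts directly.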

According to data for 2017, the United States of America has the largest number of demonyms (41), followed by Great Britain (40), Germany (40) and Canada (36). For 2021, the largest number of demonyms is in Germany (64), followed by Russia (61), Canada (60) and the United States (60). Thus, from 2017 to 2021, roughly 20 demonyms were added to each of the leading countries.

Basic forms of government
Let's construct a bubble chart of countries' forms of government, where the size of each bubble corresponds to the number of countries with that form of government.

Given:
 * Items: country (Q6256)
 * Properties: basic form of government (P122)

SPARQL query. The result contains 30 basic forms of government in 2017 and 41 in 2020.

The variable "bfog" (short for "basic form of government") contains the form of government, for example, "republic".

The last line of the query contains two ordering commands: first descending (DESC), then ascending (ASC). Thus, the forms of government are first sorted by the number of countries (?countries). Forms with the same number of countries are then sorted lexicographically.
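The same two-level ordering can be reproduced outside SPARQL. The sketch below sorts by count in descending order and breaks ties by name in ascending order; the counts are invented for illustration and `order_forms` is a hypothetical helper name.

```python
# Invented counts, used only to show the tie-breaking behaviour.
FORMS = [
    ("monarchy", 2),
    ("republic", 3),
    ("federation", 2),
]


def order_forms(forms):
    """Sort by number of countries (descending), then by name (ascending)."""
    return sorted(forms, key=lambda item: (-item[1], item[0]))


print(order_forms(FORMS))
# [('republic', 3), ('federation', 2), ('monarchy', 2)]
```

Negating the count inside the sort key gives the descending order, while the name remains in its natural ascending order.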

As a result of the query, we get a bubble chart with the most popular forms of government in countries in 2017 and in 2020.



Thus, from 2017 to 2020 the form of government "republic" became more "popular", while the number of countries recorded as a "mixed republic" decreased significantly. Forms such as democratic centralism, democratic republic, democracy, Islamic state and parliamentary democracy appeared in the results.

Neighboring countries
Countries can share a common border. On Wikidata, this is expressed by the property shares border with (P47). Using this property, let's build a graph of neighboring countries.

Given:
 * Items: country (Q6256)
 * Properties: shares border with (P47)

SPARQL query. The result contains 795 neighboring countries in 2017 and 912 in 2020.

As a result of the query, we get a graph with 787 edges in 2017 and 912 edges in 2020, where an edge represents a shared border between two countries. The graph consists of several connected components, since there are island countries that have no neighbors (for example, Mauritius, Maldives, Madagascar).
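To make the notion of connected components concrete, the sketch below runs a breadth-first search over a tiny hand-made border graph. The node and edge lists are a small illustrative subset, not the query result, and `connected_components` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Illustrative subset: three countries joined by borders and three island countries.
NODES = ["Russia", "Mongolia", "Kazakhstan", "Mauritius", "Maldives", "Madagascar"]
EDGES = [("Russia", "Mongolia"), ("Russia", "Kazakhstan")]

print(len(connected_components(NODES, EDGES)))  # 4
```

Each island country forms a component of its own, which is why the full graph splits into several components.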



Neighboring countries of Russia
We will construct a graph of neighboring countries of Russia.

SPARQL query. The result contains 17 neighboring countries in 2021.

The line in the query with the comment "is a country" checks that the item listed as sharing a border with Russia is itself a country. This excludes, for example, a region of Georgia (Racha-Lechkhumi and Kvemo Svaneti) and an island of Japan (Hokkaido), both of which appear in the list of bordering items.

As a result of the query execution, we get a map of neighboring countries of Russia, including 17 countries, namely: Japan, Norway, USA, Finland, Sweden, Poland, Lithuania, People's Republic of China, Belarus, Estonia, Latvia, Ukraine, Azerbaijan, Georgia, Kazakhstan, DPRK and Mongolia.



Future work

 * 1) Build a list of country flags and mottos. Not all countries have mottos.
 * 2) Mark the capitals of modern countries on the map.
 * 3) In each part of the world, calculate the top five countries with the highest population density.
 * 4) Build a bar chart showing the distribution of the number of countries by form of government. Evaluate whether this distribution is heavy-tailed.
 * 5) Print the list of countries sorted by the number of neighbors. Which countries have the most and least neighbors, what is the average number of neighbors? Is there a correlation between this indicator and any other country dimension?

Tasks
{Which of the roughly two hundred countries existing today emerged in the years when the most countries were formed? ---+ 16 countries: Russia, Moldova, Belarus, Ukraine, Estonia, Slovenia, Republic of Macedonia, Croatia, Azerbaijan, Georgia, Kazakhstan, Uzbekistan, Armenia, Kyrgyzstan, Tajikistan +--- 6 countries: Greece, Peru, Guatemala, Honduras, Costa Rica, Nicaragua -+-- 5 countries: Latvia, Lithuania, Poland, Estonia, Georgia --+- 4 countries: Bangladesh, Bahrain, Qatar, Sri Lanka
 * type=""}
 * 1821, | 1918, | 1971, | 1991

{Latvia has 119, Thailand 77, Denmark 5, and Russia 81. What are we talking about? - The number of cities with a population of more than one million? - The number of higher education institutions? + The number of administrative units? - The number of official languages?
 * type=""}

{Israel has an area of 20770 square kilometers and a population of 8463400; Mongolia, 1566000 square kilometers and 2953190 people; the Republic of Korea, 100295 square kilometers and 50219669 people; and Singapore, 719.1 square kilometers and 5781728 people. Arrange the flags of these Asian countries in order of increasing population density. --+- ---+ -+-- +---
 * type=""}
 * 1 place,|2 place,|3 place,|4 place

{Which of these languages are official in Russia? + Abaza + Moksha + Erzya - Belarusian
 * type="[]"}

SPARQL queries with answers:
 * most productive years
 * what for Latvia 119


 * population density


 * official languages in Russia

Links

 * Andrew Krizhanovsky, Smykova Elizaveta. WD: Analysis of the three aspects of modern countries on the Wikis: the age of countries, popular forms of government and demonyms // Authorea
