Research in programming Wikidata/Cities

This article studies four types of cities, each corresponding to a Wikidata object: "Town", "City", "Big city" and "City with millions of inhabitants". SPARQL queries to Wikidata were used to count the instances of each object and to gather the following information:
 * Population of different types of cities
 * Number of cities without sister cities
 * List of cities ordered by number of sister cities
 * Number of cities with a given number of sister cities
 * Country with the most sister cities
 * Closest neighbours of Russia by number of sister cities

"Town"
SPARQL query, 13800 records (2020).
 * Wikidata element: Q3957

"City"
SPARQL query, 20800 records (2017), 9260 records (2020).
 * Wikidata element: Q515

The most complete elements include San Francisco, Berlin, Petrozavodsk, …

Almost empty elements include Madinat Zayed, Muzaffarpur, Willow River, …

According to ProWD, Singapore leads cities around the world in the number of properties (104 properties). Novorossiysk has 31 properties, the maximum for Russian cities.

"Big city"
SPARQL query, 198 records (2017), 3075 records (2020).
 * Wikidata element: Q1549591

The most complete elements include Bern, Berlin, Geneva, …

Almost empty elements include Balanga (Nigeria), Ungaran, Kayes, …

According to ProWD, Singapore leads big cities around the world in the number of properties (104 properties). Moscow has 76 properties, the maximum for Russian big cities.

"City with millions of inhabitants"
SPARQL query, 616 records (2020).
 * Wikidata element: Q1637706

Different types of cities
SPARQL query, 26751 records (2020).

"Town"
Used: SPARQL query, 53.30 million people (2020).
 * Object: town (Q3957)
 * Property: instance of (P31)
 * Property: population (P1082)

"City"
Used: SPARQL query, 1,133.56 million people (2020).
 * Object: city (Q515)
 * Property: instance of (P31)
 * Property: population (P1082)

"Big city"
Used:
 * Object: big city (Q1549591)
 * Property: instance of (P31)
 * Property: population (P1082)

SPARQL query, 2,538.49 million people (2020).

"City with millions of inhabitants"
Used:
 * Object: city with millions of inhabitants (Q1637706)
 * Property: instance of (P31)
 * Property: population (P1082)

SPARQL query, 2,118.39 million people (2020).

Analysis
Different countries use different characters, such as a point, a comma or a space, as digit-group separators. As a result, the value of the population property can be written in different ways. A point is problematic, because in Wikidata this character separates the integer and decimal parts of a number. To disambiguate, the REPLACE function should be used to remove the separator character. This conversion does not change the value itself, since a population is an integer and the separators serve only to make the number easier to read.
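The separator removal described above can be sketched outside SPARQL as well. The following Python fragment is only an illustration of the same idea as REPLACE; the function name `normalize_population` and the sample strings are hypothetical and do not come from the article's queries. It assumes, as stated above, that a population is always an integer.

```python
import re

def normalize_population(raw: str) -> int:
    """Remove digit-group separators (point, comma, whitespace) and return an integer.

    Assumption: the value is a whole number, so every separator is purely
    cosmetic and none of them marks a decimal part.
    """
    cleaned = re.sub(r"[.,\s]", "", raw)
    if not cleaned.isdigit():
        raise ValueError(f"not a plain population count: {raw!r}")
    return int(cleaned)

# Hypothetical spellings of the same value in different national conventions.
samples = ["1.234.567", "1,234,567", "1 234 567", "1234567"]
values = [normalize_population(s) for s in samples]
print(values)
```

All four spellings map to the same integer, which is why the conversion does not affect the value.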

The table below summarizes the population of the different types of cities, as well as each type's share of the world population, which reached approximately 7.8 billion people in 2020. According to Wikidata, almost three quarters of the world's population live in cities.
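The shares can be recomputed from the totals reported in the sections above. This is a small arithmetic sketch, not the query used in the article; the variable names are chosen for illustration. It takes the four totals at face value, so a city assigned to several types is counted once per type.

```python
# World population in millions (approximately 7.8 billion in 2020, as stated above).
WORLD_POPULATION_MILLIONS = 7800.0

# Totals in millions of people, as reported for 2020 in the sections above.
population_millions = {
    "town (Q3957)": 53.30,
    "city (Q515)": 1133.56,
    "big city (Q1549591)": 2538.49,
    "city with millions of inhabitants (Q1637706)": 2118.39,
}

def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

shares = {
    name: share_percent(value, WORLD_POPULATION_MILLIONS)
    for name, value in population_millions.items()
}
total_share = share_percent(sum(population_millions.values()), WORLD_POPULATION_MILLIONS)

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
print(f"all four types: {total_share:.2f}%")
```

The combined share comes out just under 75%, which matches the "almost three quarters" figure.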

Sister cities
Sister cities are cities in different states that have established permanent friendly relations with each other in order to strengthen international relations in the fields of culture, economics, creation and management of urban infrastructure, the functioning of civil society, and so on.

How many cities don't have a single sister city?
Used: SPARQL query, 21479 cities (2020).
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Property: instance of (P31)
 * Property: sister city (P190)

As of 2020, Wikidata knows 26751 cities of the four types. Thus, sister cities are known for only about 20% of them.
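The objects and properties listed above suggest how such a count could be expressed. The sketch below builds one possible SPARQL query in Python and recomputes the 20% figure. The query text is an assumption written for illustration, not the article's actual query: it uses `COUNT(DISTINCT ...)`, so its result may differ from the reported number if the original query counted an item once per type. The `wd:` and `wdt:` prefixes are the ones predefined by the Wikidata Query Service.

```python
# Town, city, big city, city with millions of inhabitants.
CITY_TYPES = ["Q3957", "Q515", "Q1549591", "Q1637706"]

def build_no_sister_city_query(type_qids):
    """Build a SPARQL query counting items of the given types without any P190 statement."""
    values = " ".join(f"wd:{qid}" for qid in type_qids)
    return (
        "SELECT (COUNT(DISTINCT ?city) AS ?count) WHERE {\n"
        f"  VALUES ?type {{ {values} }}\n"
        "  ?city wdt:P31 ?type .\n"                          # instance of (P31)
        "  FILTER NOT EXISTS { ?city wdt:P190 ?sister . }\n"  # no sister city (P190)
        "}"
    )

query = build_no_sister_city_query(CITY_TYPES)
print(query)

# Figures reported above for 2020.
total_cities = 26751
without_sister = 21479
share_with_sister = 100 * (total_cities - without_sister) / total_cities
print(f"{share_with_sister:.1f}% of cities have at least one known sister city")
```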

List of cities ordered by number of sister cities

All
Used: SPARQL query, 4046 cities with sister cities (2020).
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Property: instance of (P31)
 * Property: sister city (P190)

Russia
Used: SPARQL query, 82 cities with sister cities (2020).
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Object: Russia (Q159)
 * Property: instance of (P31)
 * Property: country (P17)
 * Property: sister city (P190)

In 2020, more cities wished to be friends with the cultural capital of Russia (Saint Petersburg, 230 sister cities) than with the official capital (Moscow, 134 sister cities). Omsk (58), Volgograd (56) and Kaliningrad (54) had almost the same number of sister cities. Petrozavodsk, Perm, Vladimir and Belgorod each had 14 sister cities.

Number of cities with a given number of sister cities

All
Used: SPARQL query, 90 distinct values of the number of sister cities (2020).
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Property: instance of (P31)
 * Property: sister city (P190)

A little more than four thousand cities (4046 cities) have at least one sister city, of which:
 * 32% (1314 cities) have relations with more than five cities;
 * 18% (728 cities) have at least 11 sister cities;
 * 9% (345 cities) are friends with more than 20 cities;
 * 2% (94 cities) have 50 or more sister cities.
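The percentages above follow directly from the reported counts. The sketch below recomputes them and shows how such threshold counts could be derived from a list of per-city sister-city counts. The helper names and the short `toy_counts` list are hypothetical; only the figures in `reported` come from the article.

```python
def count_with_at_least(sister_counts, minimum):
    """Number of cities whose sister-city count is at least `minimum`."""
    return sum(1 for n in sister_counts if n >= minimum)

def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

# Counts reported above for 2020, out of 4046 cities with at least one sister city.
TOTAL_WITH_SISTERS = 4046
reported = {
    "more than 5": 1314,
    "at least 11": 728,
    "more than 20": 345,
    "50 or more": 94,
}
reported_percent = {label: percent(n, TOTAL_WITH_SISTERS) for label, n in reported.items()}
print(reported_percent)

# Toy input: "more than five" means six or more sister cities.
toy_counts = [1, 1, 2, 6, 12, 25, 60]
print(count_with_at_least(toy_counts, 6))
```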

It can be concluded that the number of cities with a given number of sister cities, viewed as a function of that number, follows a distribution close to a power law.
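One rough way to check such a claim is to fit a straight line to the histogram on a log-log scale: for a power law the points lie close to a line whose slope is the exponent. The sketch below is an assumption about how this could be done, not the method used in the article, and it runs on synthetic data because the article's histogram is not reproduced here. A least-squares fit on log-log axes is only a quick heuristic, not a rigorous statistical test.

```python
import math

def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(k).

    `histogram` maps k (number of sister cities) to the number of cities
    that have exactly k sister cities. Entries with k <= 0 or count <= 0
    are skipped because their logarithm is undefined.
    """
    points = [(math.log(k), math.log(c)) for k, c in histogram.items() if k > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive points")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all k values are identical")
    return sxy / sxx

# Synthetic histogram following count = 1000 * k^(-2) exactly.
synthetic = {k: 1000 * k ** -2 for k in range(1, 21)}
slope = loglog_slope(synthetic)
print(f"estimated exponent: {slope:.3f}")
```

On real data the points would scatter around the line, and the quality of the fit indicates how close the distribution is to a power law.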

Russia
Used: SPARQL query, 24 distinct values of the number of sister cities (2020).
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Object: Russia (Q159)
 * Property: instance of (P31)
 * Property: country (P17)
 * Property: sister city (P190)



A little less than a hundred Russian cities (82 cities) have at least one sister city, of which only 48% (39 cities) are connected with more than five cities.

Which country has the most sister cities?
Used:
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Property: instance of (P31)
 * Property: country (P17)
 * Property: sister city (P190)

SPARQL query, 208 countries (2020).



In 2020, Germany had the largest number of sister cities (1375 cities).
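A per-country ranking of this kind amounts to grouping query results by the country (P17) of each city. The sketch below illustrates the grouping step in Python on placeholder rows; the row layout, the names `city_a`, `country_x` and so on, and both helper functions are hypothetical. It shows two possible readings of "number of sister cities": the number of sister city statements and the number of distinct cities that have at least one such statement.

```python
from collections import Counter

# Placeholder rows: (city, country of the city, sister city).
rows = [
    ("city_a", "country_x", "city_p"),
    ("city_a", "country_x", "city_q"),
    ("city_b", "country_x", "city_p"),
    ("city_c", "country_y", "city_a"),
]

def sister_statements_per_country(rows):
    """Count sister city statements, grouped by the country of the city."""
    return Counter(country for _city, country, _sister in rows)

def cities_with_sisters_per_country(rows):
    """Count distinct cities having at least one sister city, grouped by country."""
    seen = {(city, country) for city, country, _sister in rows}
    return Counter(country for _city, country in seen)

ranking = sister_statements_per_country(rows).most_common()
distinct_cities = cities_with_sisters_per_country(rows)
print(ranking)
print(dict(distinct_cities))
```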

List of countries having sister cities with Germany
Used:
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Object: Germany (Q183)
 * Property: instance of (P31)
 * Property: country (P17)
 * Property: sister city (P190)

SPARQL query, 93 countries (2020).

The table shows a list of ten countries that have the largest number of sister cities with Germany (2020).

Closest neighbours of Russia by number of sister cities
Used:
 * Object: town (Q3957)
 * Object: city (Q515)
 * Object: big city (Q1549591)
 * Object: city with millions of inhabitants (Q1637706)
 * Object: Russia (Q159)
 * Property: instance of (P31)
 * Property: country (P17)
 * Property: sister city (P190)
 * Property: geoshape (P3896)

SPARQL query, 102 countries (2020).



Russia has more than twenty sister cities in each of the following countries: the United States of America (46), China (46), Germany (44), Ukraine (28), Bulgaria (25), Poland (24), France (23) and Italy (22).

Wikidata completeness and disadvantages
A city is a type of human settlement whose inhabitants are not engaged in agriculture. At the same time, different countries use different criteria when assigning city status to settlements, the main one being population. Some countries do not define the term "city" at all. In France, for example, only one geographic unit of this kind is used, the commune, regardless of the number of people living in it and the type of their activity. Therefore, it can be difficult to determine clearly which settlements are classified as cities and which are not.

In practice, a Wikidata object can be an instance of several types of cities at once. For example, Shanghai is assigned to three of the objects under study: city, big city and city with millions of inhabitants. Such multiple assignment affects the results of SPARQL queries, in particular those using the UNION construction. This can be verified by running, for example, the SPARQL query for finding different types of cities: Shanghai appears in the results three times.
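The effect of multiple assignment can be illustrated on a few result rows. In the sketch below the rows are a hand-written stand-in for UNION query results (the Shanghai assignments are the ones named above; the single Petrozavodsk row is added only for contrast), and the variable names are hypothetical. Counting rows overstates the number of cities, while counting distinct items does not.

```python
from collections import defaultdict

# Stand-in for rows returned by a UNION over the four types: (item, matched type).
rows = [
    ("Shanghai", "city"),
    ("Shanghai", "big city"),
    ("Shanghai", "city with millions of inhabitants"),
    ("Petrozavodsk", "city"),
]

# Counting rows counts Shanghai three times.
row_count = len(rows)

# Counting distinct items counts every city once.
distinct_items = {item for item, _type in rows}

# Grouping shows which items are assigned to more than one type.
types_by_item = defaultdict(set)
for item, city_type in rows:
    types_by_item[item].add(city_type)
multi_typed = sorted(item for item, types in types_by_item.items() if len(types) > 1)

print(row_count, len(distinct_items), multi_typed)
```

In SPARQL the same correction is usually expressed with `DISTINCT` on the item variable.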

Wikidata has an inheritance mechanism expressed by the subclass of property. If an object is an instance of big city, then it is also an instance of city, since big city is a subclass of city. Thus, the situation with Shanghai described above can be resolved by leaving only one class: city with millions of inhabitants. It should be noted that replacing the UNION construction with a subclass-based construction does not produce an equivalent query.

SPARQL query

SPARQL query

Shanghai, considered earlier, appears four times in the results of the new query. The reason is that, in addition to some of the objects under study, there are other classes that inherit from city, for example lost city, free imperial city, autonomous city and even ideal city.
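The difference between the two constructions can be sketched with a tiny class hierarchy. The code below is an illustration under stated assumptions, not a copy of Wikidata: the `SUBCLASS_OF` mapping contains only a few edges, the edge from city with millions of inhabitants to big city and the direct edge from lost city to city are assumed for the example, and all function names are hypothetical.

```python
# Illustrative subclass of edges (child -> set of direct parents).
SUBCLASS_OF = {
    "big city": {"city"},                               # stated in the text above
    "city with millions of inhabitants": {"big city"},  # assumed for this sketch
    "lost city": {"city"},                              # assumed to be a direct subclass
}

FOUR_TYPES = {"town", "city", "big city", "city with millions of inhabitants"}

def ancestors(cls, subclass_of):
    """All classes reachable from `cls` by following subclass of edges."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def matches_subclass_query(direct_classes, target, subclass_of):
    """True if any direct class is `target` or a (transitive) subclass of it."""
    return any(c == target or target in ancestors(c, subclass_of) for c in direct_classes)

def union_hits(direct_classes, listed_types):
    """How many times a UNION over `listed_types` would return the item."""
    return len(set(direct_classes) & set(listed_types))

shanghai = {"city", "big city", "city with millions of inhabitants"}
lost_city_item = {"lost city"}  # hypothetical item that is only an instance of lost city

print(union_hits(shanghai, FOUR_TYPES))                          # repeated in UNION results
print(matches_subclass_query(lost_city_item, "city", SUBCLASS_OF))  # found via inheritance
print(union_hits(lost_city_item, FOUR_TYPES))                    # missed by the UNION
```

An item that is only an instance of a class such as lost city is found through inheritance but not by the four-type UNION, which is why the two constructions are not equivalent.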

Also, probably because of the ambiguity in the criteria for assigning city status, subclasses were created for specific countries: city in Chile, city in Cyprus, city of Japan and so on. Russian cities did not escape this tendency, as can be seen by comparing the results of a SPARQL query for instances of the "City" object. As of 2020, most of them belong to the city/town class.

According to the Russian Census (2010) and the Crimean Federal District Census (2014), the total number of Russian cities was 1117 in 2014. All cities in Russia have an article in both Russian and English Wikipedia.

The number of Wikidata elements that are Russian cities is 1126. It can be assumed that Wikidata completely covers at least the cities of Russia.

Future work

 * 1) Construct a graph of Russian sister cities.
 * 2) Get a list of Russian cities located beyond the Arctic Circle.
 * 3) Which river in Russia has the largest number of cities located on it?
 * 4) Which country has the largest ratio of sister-city relations between its own cities to sister-city relations with cities of other countries?

Tests
Which of the following cities were named after toponyms?
 * Tolyatti: no
 * Tula: yes
 * Chernyakhovsk: no
 * Kurilsk: yes
 * Vologda: yes
 * Obninsk: no

Which of the following flags belong to these cities: Nizhnevartovsk, Petropavlovsk-Kamchatsky, Neftekamsk, Karabulak? Answer key for the six flag options, in order: + + - + - +

Which of the following cities were founded more than 400 years ago?
 * Moscow: yes
 * Sarov: no
 * Kazan: yes
 * Astrakhan: yes
 * Samara: yes
 * Voronezh: yes

Check yourself:
 * 1) cities named after toponyms
 * 2) flags of cities
 * 3) founded more than 400 years ago

Addon

 * 1) Total number of sister city statements per country
