Research in programming Wikidata/Schools

A school is an educational institution that provides general education. This article studies schools using data from Wikidata.

SPARQL queries are used to study schools. These queries work with Wikidata items that are instances of school. A list of all schools described in Wikidata has been built. The article analyzes how complete Wikidata's information about schools is by comparing official country statistics with Wikidata. It also presents a map of the locations of Russian schools and a linear diagram showing the number of famous students of each school, both based on the information about these schools in Wikidata. The map shows that most of these schools are located in Moscow and St. Petersburg. According to the linear diagram, most schools have two famous students.

Instances of "school"

 * Object: school (Q3914).
 * Property: instance of (P31).

Let's build a list of all schools.

SPARQL query, 18615 results.
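As a rough illustration of how such a list can be requested, the sketch below builds a SPARQL query for items whose "instance of" (P31) value is school (Q3914) and prepares a call to the public Wikidata Query Service endpoint. The helper names `build_school_query` and `run_query`, the optional `limit` argument and the user-agent string are assumptions made for this example; the query behind the link above may differ, for instance by also including subclasses of school.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_school_query(limit=None):
    """Return a SPARQL query listing items that are instances (P31) of school (Q3914)."""
    query = (
        "SELECT ?school ?schoolLabel WHERE {\n"
        "  ?school wdt:P31 wd:Q3914 .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )
    if limit is not None:
        # Hypothetical convenience: cap the result size while experimenting.
        query += f"\nLIMIT {int(limit)}"
    return query


def run_query(query):
    """Send the query to the endpoint and return the list of result bindings."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "school-study-example/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_school_query(limit=5))` should return a handful of rows, each with a `school` entry (the item URI) and, where available, a `schoolLabel` entry. The number of results will differ from the figure quoted above as Wikidata changes over time.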

Examples of school items that are well filled in:
 * Conservatoire National des Arts et Métiers
 * Connecticut College
 * Greenville College

Examples of school items that used to be poorly filled in:
 * Derzhavin Lyceum
 * Gorchakov Memorial School
 * Classical high school number 1 named after V. G. Belinsky

Completeness of Wikidata
According to the Russian Federal State Statistics Service (Rosstat), there were 42,600 schools in Russia as of 2016. According to a SPARQL query, Wikidata contains only 82 schools in Russia. The gap between the Rosstat figure and Wikidata is enormous.

The same situation can be observed for other countries. For example, according to another SPARQL query, Wikidata contains 20 schools in the USA. This figure is far too low: according to statistics, there were 28,220 schools as of 2008.
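The comparison above can be sketched as follows, assuming that a school's country is recorded with the property "country (P17)" and that Russia and the USA are the items Q159 and Q30. The function names `build_country_count_query` and `coverage_percent` are invented for this illustration, and the counts are the figures quoted in the text, not fresh query results.

```python
import re


def build_country_count_query(country_qid):
    """Return a SPARQL query counting schools (Q3914) whose country (P17) is the given item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", country_qid):
        raise ValueError(f"not a Wikidata item id: {country_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?school) AS ?count) WHERE {\n"
        "  ?school wdt:P31 wd:Q3914 ;\n"
        f"          wdt:P17 wd:{country_qid} .\n"
        "}"
    )


def coverage_percent(wikidata_count, official_count):
    """Share of officially counted schools that appear in Wikidata, in percent."""
    return round(100 * wikidata_count / official_count, 2)


# Figures quoted in the text above.
print(coverage_percent(82, 42600))  # Russia: 0.19
print(coverage_percent(20, 28220))  # USA: 0.07
```

Under these figures, Wikidata covers well under one percent of the officially counted schools in both countries.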

It is difficult to estimate the completeness of Wikidata because a large number of the schools in Wikidata (4629) are not assigned to any country (SPARQL query). Such schools account for about 25% of all schools in Wikidata. These schools cannot be attributed to a country, so they cannot be included when official country data are compared with Wikidata.
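A query of this kind can be sketched with `FILTER NOT EXISTS`, assuming that "not assigned to any country" means the item has no "country (P17)" statement at all. The constant name and the helper `share_percent` are illustrative, and the numbers are the ones reported in this article.

```python
# Schools (Q3914) that have no country (P17) statement.
NO_COUNTRY_QUERY = (
    "SELECT (COUNT(DISTINCT ?school) AS ?count) WHERE {\n"
    "  ?school wdt:P31 wd:Q3914 .\n"
    "  FILTER NOT EXISTS { ?school wdt:P17 ?country . }\n"
    "}"
)


def share_percent(part, total):
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


# 4629 schools without a country out of 18615 schools in total.
print(share_percent(4629, 18615))  # 24.9
```

The result, 24.9%, matches the "about 25%" stated above.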

Filling out Wikidata
It was decided to fill in the property "pupils (P802)" for Russian schools.

This property lists a school's notable students. Consider the Derzhavin Lyceum as an example: a famous student of this lyceum is G. I. Shirshina, a Russian political and public activist.

Let's determine how many Russian schools lacked the property "pupils (P802)" before the task was carried out. According to a SPARQL query, there were 82 schools without famous students.

In addition, seven Russian schools that have famous students (according to Wikipedia) but had no value for the property "country (P17)" were found in Wikidata. The property "country (P17)" was therefore filled in for these items first. After that, the number of Russian schools in Wikidata increased by 7, to 89.

While the property was being filled in, it turned out that a couple of schools had an incorrect value, Russia, for the property "country (P17)". This was corrected, which reduced the number of Russian schools.

English labels were also added for the famous students linked through the property "pupils (P802)".

Result of the work: the property "pupils (P802)" has been filled in for 45 Russian schools. The other schools either had no famous students or were not described in Wikipedia. Let's get the list of Russian schools with the property "pupils (P802)" filled in using the following script:

SPARQL query, 43 objects, i.e. 43 Russian schools with famous students.
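One possible shape for such a query is sketched below. It assumes Russia is the item Q159 and counts the "pupils (P802)" values per school; the constant name is made up for this example, and the original script may select different columns.

```python
# Russian schools (P17 = Q159) with at least one pupils (P802) value,
# together with the number of such values per school.
RUSSIAN_SCHOOLS_WITH_PUPILS_QUERY = (
    "SELECT ?school ?schoolLabel (COUNT(DISTINCT ?pupil) AS ?pupils) WHERE {\n"
    "  ?school wdt:P31 wd:Q3914 ;\n"
    "          wdt:P17 wd:Q159 ;\n"
    "          wdt:P802 ?pupil .\n"
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
    "}\n"
    "GROUP BY ?school ?schoolLabel\n"
    "ORDER BY DESC(?pupils)"
)

print(RUSSIAN_SCHOOLS_WITH_PUPILS_QUERY)
```

Because the triple pattern requires a `wdt:P802` value, schools without famous students are excluded, so the number of rows equals the number of schools with the property filled in.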

Let's construct a linear diagram of the number of famous students of each such school:



The diagram above shows the division of schools into three groups: one famous student (five schools), two famous students (36 schools) and three famous students (one school: school number 1212).
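The grouping described above can be reproduced from per-school counts with a small helper. The sketch below uses made-up sample data (only "School No. 1212" is a name taken from the text; the other names and all counts are hypothetical) and an illustrative function name.

```python
from collections import Counter


def group_by_pupil_count(pupils_per_school):
    """Map each number of famous students to the number of schools that have it."""
    counts = Counter(pupils_per_school.values())
    return dict(sorted(counts.items()))


# Hypothetical sample; real values would come from the SPARQL results.
sample = {
    "School A": 1,
    "School B": 2,
    "School C": 2,
    "School No. 1212": 3,
}

print(group_by_pupil_count(sample))  # {1: 1, 2: 2, 3: 1}
```

Applied to the real query results, the same helper would yield the three groups listed above.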

Let's get a map of Russian schools with famous graduates using the script below:



Most of the schools on the map above are located in Moscow and St. Petersburg. There are also many schools near Rostov-on-Don.
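A map like this can be requested from the Wikidata Query Service with the `#defaultView:Map` directive and the property "coordinate location (P625)". The sketch below is an assumption about how the script could look, not the original; the names `SCHOOL_MAP_QUERY` and `parse_point` are invented for the example. Coordinates come back as text of the form `Point(longitude latitude)`, which the helper converts to numbers.

```python
import re

# Russian schools with famous students and known coordinates, shown as a map.
SCHOOL_MAP_QUERY = (
    "#defaultView:Map\n"
    "SELECT DISTINCT ?school ?schoolLabel ?coord WHERE {\n"
    "  ?school wdt:P31 wd:Q3914 ;\n"
    "          wdt:P17 wd:Q159 ;\n"
    "          wdt:P802 ?pupil ;\n"
    "          wdt:P625 ?coord .\n"
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
    "}"
)

_POINT = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def parse_point(wkt):
    """Convert 'Point(lon lat)' into a (longitude, latitude) pair of floats."""
    match = _POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"unexpected coordinate format: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


# Illustrative value roughly corresponding to central Moscow.
print(parse_point("Point(37.62 55.75)"))  # (37.62, 55.75)
```

Schools without a coordinate statement do not appear on such a map, so the number of points can be smaller than the number of schools with famous students.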

Thus, no information about famous graduates could be found for 41 Russian schools (SPARQL query).

Future work

 * 1) Output the name of the country with the largest number of schools that have a logo.
 * 2) Display a map showing the schools that have existed for more than 200 years.
 * 3) Construct a graph of the domain zones of the official websites of schools.

Exercises
When were the following Russian schools founded? Options: 1995, 1999, 1786.
 * Derzhavin Lyceum (answer: 1995)
 * Gorchakov School (answer: 1999)
 * Classical Gymnasium No. 1 named after V. G. Belinsky (answer: 1786)

Which of these schools has the largest number of well-known graduates?
 * School No. 1212 (correct answer)
 * The Chekhov Gymnasium
 * Grammar school No. 36 (Rostov-on-Don)

Enter the name of the country where most schools are located.
 * Accepted answers: United States of America, America

SPARQL queries with results:
 * SPARQL query: number of schools in different countries, 160 results.
 * SPARQL query: number of famous students in various Russian schools, 43 results.
 * SPARQL query: founding dates of various Russian schools, 32 results.