Research in programming Wikidata/Programming languages

We explore the properties of programming languages using the knowledge base of the international Wikidata project. A number of tasks are solved with SPARQL queries over Wikidata items of the "programming language" type. A list of all programming languages under permissive licenses is obtained. A bubble chart of languages by number of file formats is constructed. Maps are built showing the locations of the institutions and companies where the people involved in creating programming languages studied or worked. A list of all object-oriented programming languages is obtained, and a conclusion is drawn about how complete Wikidata is with respect to object-oriented programming languages.

Formulation of the problem
We study programming languages, in particular, information about them in Russian Wikipedia, English Wikipedia and Wikidata.

Tasks:
 * 1) Build a list of programming languages.
 * 2) Find the ratio of closed-licensed to free languages, as a percentage.
 * 3) Show on a map where the developers of programming languages studied and lived.

Objects Used in SPARQL Queries

 * programming language: a programming language;
 * object-based language: an object-oriented programming language.

Properties Used in SPARQL Queries

 * influenced by: which programming languages influenced this one;
 * developer: who developed the programming language;
 * copyright license: the license under which the language is distributed;
 * instance of: the more general type(s) this language belongs to;
 * file extension: extensions of the source code files;
 * headquarters location: location of the headquarters of the organization that developed the language;
 * coordinate location: geographic coordinates of the item;
 * educated at: where the person studied;
 * place of birth: where the person was born;
 * occupation: the person's occupation (profession).

Instances of the "Programming Language" object
Let's build a list of all languages.

SPARQL-query, 732 results (2017), 1422 results (2020).
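The query behind this list is not reproduced in the text, so the following is only a minimal sketch of what it might look like, assuming the public Wikidata Query Service endpoint and the identifiers Q9143 (programming language) and P31 (instance of). The names `LANGUAGES_QUERY`, `build_request` and `run_query`, and the User-Agent string, are hypothetical helpers introduced for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed form of the query: every item that is an
# instance of (P31) programming language (Q9143).
LANGUAGES_QUERY = """
SELECT ?lang ?langLabel WHERE {
  ?lang wdt:P31 wd:Q9143 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(query):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        # Hypothetical User-Agent; the service asks clients to identify themselves.
        headers={"User-Agent": "proglang-wikidata-example/0.1"},
    )


def run_query(query):
    """Send the query (needs network access) and return the result rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `len(run_query(LANGUAGES_QUERY))` would give the current number of results; it will differ from the 2017 and 2020 counts quoted here because Wikidata keeps changing.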

👍 The most complete and best-described programming language items on Wikidata in 2017 were Java, Python and C. In 2020 the best-described items were C++ (26 properties), Java (26 properties), JavaScript (25 properties) and R (25 properties).

👎 Almost empty, uninformative items in 2017 were CLIPS, Dylan and Go!.

The disadvantage of the resulting list is that a number of items turned out to have no name on Wikidata (No label defined). Let's try to get a list of languages whose "label" field is non-empty.

SPARQL-query, 709 results (2017), 1422 results (2020).

In 2017 this query returned about two dozen fewer results (709 instead of 732); in 2020 the two counts are equal, so every language has a label.

Demonstration of work with operations on sets in SPARQL
List all programming languages that are open (free) software or were influenced by at least one of C, Python or Java, and that were not developed by any company other than Sun Microsystems or Johnson Space Center.

SPARQL-query, 115 results (2017), 122 results (2020).
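In SPARQL, a condition like this is typically expressed with `UNION` (either branch may match) together with `MINUS` or `FILTER NOT EXISTS` (exclusion). The same logic can be sketched with Python sets. The data below is invented for illustration and is not a Wikidata result; the function name `select_languages` is hypothetical, and the sketch assumes that languages with no recorded developer are kept.

```python
SEED_LANGUAGES = {"C", "Python", "Java"}
ALLOWED_DEVELOPERS = {"Sun Microsystems", "Johnson Space Center"}


def select_languages(free, influenced_by, developers):
    """Return (free OR influenced by a seed language) minus languages
    that have at least one developer outside the allowed set."""
    # Languages influenced by at least one of C, Python, Java.
    influenced = {
        lang for lang, sources in influenced_by.items()
        if sources & SEED_LANGUAGES
    }
    candidates = free | influenced  # plays the role of SPARQL UNION
    # Languages with any developer other than the allowed ones.
    excluded = {
        lang for lang, devs in developers.items()
        if devs - ALLOWED_DEVELOPERS
    }
    return candidates - excluded  # plays the role of SPARQL MINUS


# Invented toy data: Lang1..Lang4 and OrgX are placeholders.
free = {"Lang1", "Lang2"}
influenced_by = {"Lang2": {"C"}, "Lang3": {"Java"}, "Lang4": {"Smalltalk"}}
developers = {"Lang1": {"OrgX"}, "Lang3": {"Sun Microsystems"}}

print(sorted(select_languages(free, influenced_by, developers)))
# ['Lang2', 'Lang3']
```

Lang1 is free but dropped because OrgX is not an allowed developer; Lang4 is neither free nor influenced by a seed language.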

Permissive licenses
We will list all programming languages under permissive licenses (licenses that place almost no restrictions on what users and developers of the software may do).

SPARQL-query, 37 results (2017), 82 results (2020).

The 2017 list of 37 "free" languages included, for example, CoffeeScript, Go and Haml.

Consider the ratio between proprietary (closed-licensed) and permissively licensed languages. SPARQL-query: for 2020, the ratio of closed to free languages is 25%.

Number of source file formats
Depending on the programming language, the source code files for programs may have different extensions. Let's construct a bubble diagram by the number of valid formats of the source code files.

SPARQL-query.
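A sketch of how such a count might be obtained, assuming the property P1195 (file extension) and the `#defaultView:BubbleChart` directive of the Wikidata Query Service. The query text is an assumption, not the original query; `count_extensions` is a hypothetical helper that performs the same grouping locally on (language, extension) pairs.

```python
from collections import defaultdict

# Assumed query: number of distinct file extensions (P1195) per language.
EXTENSIONS_QUERY = """
#defaultView:BubbleChart
SELECT ?lang ?langLabel (COUNT(DISTINCT ?ext) AS ?extensions) WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P1195 ?ext .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?lang ?langLabel
ORDER BY DESC(?extensions)
"""


def count_extensions(pairs):
    """Group (language, extension) pairs and count distinct extensions,
    sorted by descending count, then by language name."""
    seen = defaultdict(set)
    for language, extension in pairs:
        seen[language].add(extension.lower())
    return sorted(
        ((lang, len(exts)) for lang, exts in seen.items()),
        key=lambda item: (-item[1], item[0]),
    )


# The Racket extensions named in this section.
racket = [("Racket", ext) for ext in
          ["rkt", "rktl", "rktd", "scrbl", "plt", "ss", "scm"]]
print(count_extensions(racket))
# [('Racket', 7)]
```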



The figure shows that the languages with the most source file formats and extensions are C++ (10 formats), Geometric Description Language (8) and Racket (7). For example, a Racket program file can have the extension rkt, rktl, rktd, scrbl, plt, ss or scm.

By 2020, REXX (6 formats), Java (5 formats), Wolfram Language (5 formats), Raku (9 formats) and Geometric Description Language (8 formats) were also among the leaders.

Countries where the developers and organizations behind programming languages are located
Let's map the countries where the people and organizations connected with the creation of programming languages are located. Note that the developer of a language can be either an organization or an individual. To determine the location (property: coordinate location) of an organization, we use the coordinates of its headquarters (property: headquarters location); for a person, we use the coordinates of their place of birth (property: place of birth).

SPARQL-query.
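The two cases (organization or person) map naturally onto a SPARQL `UNION`. The query below is a sketch under the assumption that the properties are P178 (developer), P159 (headquarters location), P19 (place of birth) and P625 (coordinate location); it is not the original query. The Wikidata Query Service returns coordinates as WKT literals of the form `Point(longitude latitude)`, so a small hypothetical helper, `parse_point`, converts them to (latitude, longitude) pairs.

```python
import re

# Assumed query: coordinates of a developer's headquarters (organizations)
# or place of birth (people), shown on a map.
DEVELOPER_PLACES_QUERY = """
#defaultView:Map
SELECT DISTINCT ?dev ?devLabel ?coord WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P178 ?dev .
  {
    ?dev wdt:P159 ?place .   # organization: headquarters location
  } UNION {
    ?dev wdt:P19 ?place .    # person: place of birth
  }
  ?place wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

_POINT = re.compile(r"Point\(([-+0-9.eE]+) ([-+0-9.eE]+)\)")


def parse_point(wkt):
    """Convert 'Point(lon lat)' to a (lat, lon) tuple of floats."""
    match = _POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError("not a WKT point: " + repr(wkt))
    lon, lat = (float(group) for group in match.groups())
    return lat, lon


print(parse_point("Point(30.3 59.95)"))
# (59.95, 30.3)
```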



We will also construct a bubble chart to identify the countries that have produced the most people who develop programming languages and that host the most headquarters. The figure shows that the leading countries were the United States (159 people and headquarters) and the United Kingdom (15). In Russia, only two programming languages were developed: Refal and the 1C:Enterprise embedded programming language.

In 2020, the number of headquarters was 241 in the US, 24 in the UK, 18 in France and 5 in Russia.

Universities where the developers of programming languages studied
Let's display on a map the educational institutions attended by people who later developed programming languages.

SPARQL-query, 142 results (2017), 282 results (2020).

The map shows that most of the people involved in the creation of programming languages studied in Europe or the United States.

Let's construct a bubble chart of the educational institutions most popular among future developers of programming languages. The figure shows that Princeton University (8) and Stanford University (8) took the first places. Moscow State University (MSU) is also on this list of 142 universities, near the end: Tony Hoare, who developed ALGOL 60, and Valentin Turchin, who developed Refal, studied there.

Professions of the creators of programming languages
Let's construct a bubble diagram showing which professions prevail among people who develop programming languages.

SPARQL-query, 48 results (2017), 74 results (2020).



The most common professions were computer scientist, engineer and teacher. Interestingly, the list also includes professions such as jazz musician and politician (Herbert A. Simon). In 2020, computer scientists were the largest group among the developers of programming languages (172 people), followed by 96 engineers, 57 teachers, 56 programmers and 43 mathematicians.

Object-oriented programming languages
In addition to the programming languages themselves, Wikidata also describes programming paradigms. The script https://w.wiki/oLg and the illustration show that, by number of programming languages, the most popular paradigm is object-oriented programming (399 languages in 2020), followed by procedural programming (297 languages in 2020). Multi-paradigm programming (programming that uses several paradigms at once) is also represented by a large number of languages.

Let's list all the object-oriented programming languages.

SPARQL-query, 116 results (2017), 118 results (2020).

Thus, in 2017 about 16% of the programming languages on Wikidata (116 of 732) were object-oriented.
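The percentage can be checked against the result counts quoted in this article: 116 object-oriented languages out of 732 programming languages in 2017, and 118 out of 1422 in 2020. The helper name `share_percent` is introduced here only for illustration.

```python
def share_percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)


print(share_percent(116, 732))   # 2017 counts -> 15.8
print(share_percent(118, 1422))  # 2020 counts -> 8.3
```

The 16% figure therefore matches the 2017 counts; with the 2020 counts the share is roughly half of that, because the total number of languages grew much faster than the number of object-oriented ones.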

Completeness of Wikidata
According to the Bourabai Research University, there are at least 26 programming languages that support the object-oriented paradigm. The articles devoted to object-oriented programming add another 4 and 3 programming languages to this list. The SPARQL-query returned 116 results. It is difficult to judge the completeness of the data in the three sources cited above, since there are many little-known, obsolete and narrowly focused languages that are not covered in authoritative sources. Because the query returns several times more languages than those sources list, it can be concluded that Wikidata provides a fairly complete list of object-oriented programming languages.

Filling in items
Let's list all people who are involved in the development of programming languages and whose items have the 'label' field filled in in English:

SPARQL-query, 133 results (2017), 223 results (2020).

In 2017 there were 133 such results. A similar list, restricted to items with the 'label' field filled in in Russian, gave 88 results. After filling in the label and description fields in Russian for these items, we print the result: SPARQL-query, 133 results (2017), 183 results (2020).

Future work

 * 1) Output all programming languages with the "mascot character" property.
 * 2) Calculate the number of programming languages created before 1992 (property: "inception").
 * 3) Construct a bar chart that shows the number of known hashtags in Twitter for each programming language (property: "Twitter hashtag").
 * 4) Construct an ordered list of programming languages by the number of interlinks.
 * 5) Construct a list of languages by the number of visits of articles in Russian Wikipedia.
 * 6) Construct a directed acyclic graph of the dependencies of programming languages on each other (or find cycles in the dependencies, if such a graph cannot be constructed). See the "influenced by" property in Java.
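For the last task, one possible approach is a depth-first search over the "influenced by" relation that either finishes without revisiting a node on the current path (the graph is acyclic) or returns one cycle. The sketch below uses invented toy data rather than Wikidata results; `find_cycle` is a hypothetical helper, and the recursive form assumes the graph is small enough not to hit Python's recursion limit.

```python
def find_cycle(influenced_by):
    """Return one cycle as a list of nodes (first == last), or None if the
    graph is acyclic. `influenced_by` maps a language to the languages
    that influenced it."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for parent in influenced_by.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # Back edge: parent is already on the current path.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(influenced_by):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Invented toy graphs: the letters are placeholders, not real languages.
print(find_cycle({"B": ["A"], "C": ["A", "B"]}))        # None
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
```

If no cycle is found, the order in which nodes are finished gives a topological ordering, which is what a drawing of the directed acyclic graph would need.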

Tests
Match each programming language with its developer:
 * Ada: J. Ichbiah
 * Forth: C. Moore
 * Erlang: J. Armstrong

Select the logo of the programming language LOLCODE (four image options; the third is correct).

Fill in the gaps. Fortran ranks first by number of dialects, with on the order of 8-12. Lisp is in second place with 6 dialects. Third place is shared by Standard ML and Object Pascal, with 3 dialects each.

SPARQL queries with replies:
 * Programming languages and their developers.
 * Logos of programming languages.
 * Number of dialects in programming languages.

Links

 * Andrew Krizhanovsky, deniskk25. Где учатся и кем работают изобретатели языков программирования // Where are the universities in which the inventors of programming languages learn, and what are their professions (In Russian) // Nauchkor