Research in programming Wikidata/Operating systems

The article explores the Wikidata object "operating system" and its properties. The following problems are solved with the help of SPARQL queries: finding instances of the object "operating system", and building lists of operating systems (OS) by base, by creation time, and by the programming language in which the OS was written. A histogram is also constructed; it shows the number of programs written in each programming language and the proportion of them that run on each OS. Many software items do not specify the programming language in which they were developed, so the property "programming language" was added to several objects to improve the results. Wikidata plays a big role in software documentation.

Instances of the object "operating system"

 * Objects: operating system (Q9135)

Let's build a list of all the operating systems.

SPARQL query 510 results (January 2018), 1086 results (September 2020).
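
The text of the linked query is not reproduced on this page, so the following is only a minimal sketch of how such a list could be requested and read from Python. The query text, the `User-Agent` string and the helper names `run_query` and `rows` are assumptions made for illustration; the endpoint is the public Wikidata Query Service, and the sample result is written by hand rather than fetched.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service

# Assumed query text: every item that is an instance of (P31)
# operating system (Q9135), together with its English label.
OS_INSTANCES_QUERY = """
SELECT ?os ?osLabel WHERE {
  ?os wdt:P31 wd:Q9135 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def run_query(query):
    """Send a query to the endpoint and return the decoded JSON result."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "os-survey-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results format (not real output).
SAMPLE = {
    "head": {"vars": ["os", "osLabel"]},
    "results": {"bindings": [
        {"os": {"type": "uri", "value": "http://www.wikidata.org/entity/Q388"},
         "osLabel": {"type": "literal", "value": "Linux"}},
    ]},
}

print(rows(SAMPLE))
```

Calling `rows(run_query(OS_INSTANCES_QUERY))` would perform the live request; the number of rows returned changes over time, as the two counts above show.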

The most complete and detailed operating system items on Wikidata are: Linux, Windows, Windows 8.

Almost empty and less informative operating system items are: SPIN, JavaOS, Atari TOS, Xubuntu.

According to ProWD, the only Russian operating system on Wikidata is Miraculix, which has 7 properties. The leaders by number of properties (24) among operating systems worldwide are Microsoft Windows and Windows 8.

List of operating systems by base
SPARQL query 159 results (January 2018), 118 results (September 2020).

The query shows the relation between each OS and its base.
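
As a rough illustration of the OS-to-base relation, the sketch below assumes that the base is stored in the "based on" property (P144); the query text and the helper `group_by_base` are hypothetical, and the pairs are typed by hand instead of being taken from the query result.

```python
from collections import defaultdict

# Assumed query text: "based on" (P144) links an OS to the system it derives from.
OS_BASE_QUERY = """
SELECT ?os ?osLabel ?base ?baseLabel WHERE {
  ?os wdt:P31 wd:Q9135 ;
      wdt:P144 ?base .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def group_by_base(pairs):
    """Map each base to the sorted list of systems derived from it."""
    derived = defaultdict(set)
    for os_name, base_name in pairs:
        derived[base_name].add(os_name)
    return {base: sorted(names) for base, names in derived.items()}


# Illustrative (OS, base) pairs typed by hand, not query output.
pairs = [("Ubuntu", "Debian"), ("Knoppix", "Debian"), ("Xubuntu", "Ubuntu")]

by_base = group_by_base(pairs)
# The base with the largest number of direct derivatives in this sample.
most_derived = max(by_base, key=lambda base: len(by_base[base]))

print(by_base)
print(most_derived)
```

Only direct "based on" links are counted here; following chains such as Xubuntu, Ubuntu, Debian would need a transitive property path or a recursive walk.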

List of operating systems by creation time
SPARQL query 298 results (January 2018), 238 results (September 2020).
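
A possible shape for this query is sketched below, assuming that creation time corresponds to the "inception" property (P571). The helper `inception_year` and the sample rows with placeholder names are hypothetical; the helper also assumes four-digit years, which is reasonable for operating systems but not for Wikidata dates in general.

```python
from collections import Counter

# Assumed query text: inception (P571) is taken as the creation time.
OS_INCEPTION_QUERY = """
SELECT ?os ?osLabel ?inception WHERE {
  ?os wdt:P31 wd:Q9135 ;
      wdt:P571 ?inception .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?inception
"""


def inception_year(value):
    """Extract the year from a date-time string such as '1995-03-01T00:00:00Z'."""
    return int(value[:4])


# Placeholder rows (label, inception) written by hand, not query output.
sample = [
    ("os-a", "1995-03-01T00:00:00Z"),
    ("os-b", "1995-11-20T00:00:00Z"),
    ("os-c", "2004-10-20T00:00:00Z"),
]

# Number of systems created in each year.
per_year = Counter(inception_year(value) for _, value in sample)
print(dict(per_year))
```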

Count of operating systems by programming language
SPARQL query 35 results (January 2018), 37 results (September 2020).

The query shows (only on the basis of the statements filled in on Wikidata, so the result may not reflect reality) that operating systems are predominantly written in assembly language. This is plausible, because assembly gives the most direct control over hardware and performance. C and C++ take second and third place; although they are sometimes regarded as slower than assembly, they are much more convenient to program in.

The programming languages used to write the operating system
It is also interesting to look at the results of this query as a graph, which clearly shows how many objects simply have an empty "programming language" field.

SPARQL query 533 results (March 2017), 1117 results (September 2020)

If you run the same query restricted to languages in which at least 2 operating systems are written, you can see a significant difference from the result of the previous query.

SPARQL query 118 results (October 2020)
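
In SPARQL, a restriction of the kind "at least 2 operating systems per language" is usually written with `GROUP BY` and `HAVING`. The sketch below shows one assumed form of such a query (using "programmed in", P277) next to an equivalent filter in Python; the function name, the pairs and the OS names are placeholders, and the linked query may be built differently.

```python
from collections import Counter

# Assumed query text: count operating systems per "programmed in" (P277)
# value and keep only languages used by at least two systems.
LANGUAGE_COUNT_QUERY = """
SELECT ?language ?languageLabel (COUNT(DISTINCT ?os) AS ?count) WHERE {
  ?os wdt:P31 wd:Q9135 ;
      wdt:P277 ?language .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?language ?languageLabel
HAVING (COUNT(DISTINCT ?os) >= 2)
ORDER BY DESC(?count)
"""


def languages_with_min_count(os_language_pairs, minimum=2):
    """Count distinct systems per language and drop languages below the minimum."""
    counts = Counter(language for _, language in set(os_language_pairs))
    return {language: n for language, n in counts.items() if n >= minimum}


# Placeholder (OS, language) pairs; the repeated pair mimics a duplicate row.
pairs = [
    ("os-1", "C"), ("os-2", "C"), ("os-3", "C"),
    ("os-1", "C++"), ("os-2", "C++"),
    ("os-3", "Pascal"),
    ("os-1", "C"),
]

print(languages_with_min_count(pairs))
```

Converting the pairs to a set before counting plays the role of `COUNT(DISTINCT ?os)`: a system listed twice with the same language is counted once.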



Completeness of the Wikidata
According to the site www.operating-system.org, there are about 611 operating systems [1] (not including Linux distributions, whose number exceeds the number of operating systems themselves). The SPARQL query returned only about 510 operating systems. Moreover, looking through the objects returned by the query makes it clear that many of them are poorly filled in or even completely empty. From this observation we can conclude that Wikidata is incomplete.
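
The gap between the two counts can be put into numbers with a short calculation. It uses only the two figures quoted above and assumes, optimistically, that every Wikidata item corresponds to one of the 611 listed systems, so the coverage it prints is an upper bound.

```python
listed_elsewhere = 611   # operating systems counted by www.operating-system.org [1]
found_on_wikidata = 510  # result count of the SPARQL query (January 2018)

# Upper bound on coverage: assumes every Wikidata item matches a listed system.
coverage = found_on_wikidata / listed_elsewhere
# Lower bound on the number of listed systems without a Wikidata item.
missing = listed_elsewhere - found_on_wikidata

print(f"coverage: {coverage:.1%}, missing at least: {missing}")
```

Since the Wikidata result may also contain Linux distributions, which the external count excludes, the real coverage is probably lower than this figure.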

List of operating systems and languages in which they are written
To get a list of operating systems (OS) together with the programming languages used to create them, you can run the following query: SPARQL query 147 results (September 2020).

Software and operating systems on which they are used
The amount of software can be regarded as an indicator of the importance of an OS. The more users an OS has, the more software vendors will want to provide their products to that audience. Hence the conclusion suggests itself: the more software is written for a system, the more significant it is. This query shows which software is supported by which OS.

SPARQL query 5738 results (January 2018), 30184 results (September 2020).

To find the most popular operating systems among software developers, you can modify the previous query as follows:

SPARQL query

As you can see, the priorities for developers are Linux, Microsoft Windows and Ubuntu.
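
One way such a ranking could be computed is sketched below. It assumes that software items point to their platform through the "operating system" property (P306); the query text, the helper `rank_platforms` and the sample pairs are illustrative only and do not reproduce the real counts.

```python
from collections import Counter

# Assumed query text: software linked to a platform through
# "operating system" (P306), counted per platform.
SOFTWARE_PER_OS_QUERY = """
SELECT ?os ?osLabel (COUNT(DISTINCT ?software) AS ?count) WHERE {
  ?software wdt:P306 ?os .
  ?os wdt:P31 wd:Q9135 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?os ?osLabel
ORDER BY DESC(?count)
LIMIT 10
"""


def rank_platforms(software_os_pairs, top=3):
    """Rank operating systems by the number of distinct software items."""
    counts = Counter(os_name for _, os_name in set(software_os_pairs))
    return counts.most_common(top)


# Placeholder (software, OS) pairs; the names and counts are not real data.
pairs = [
    ("app-1", "Linux"), ("app-2", "Linux"), ("app-3", "Linux"),
    ("app-1", "Microsoft Windows"), ("app-2", "Microsoft Windows"),
    ("app-3", "Ubuntu"),
]

print(rank_platforms(pairs))
```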

Number of programming languages used to create software for an operating system
SPARQL query 2259 results (December 2018), 6883 results (September 2020). For each software item on each OS, the query shows how many languages it is written in.

Cartesian product of OS and languages with software and languages
SPARQL query 5336 results (January 2018), 18976 results (September 2020).

How much software was written using a language for an OS written using a programming language
SPARQL query 418 results (January 2018), 829 results (September 2020). The query shows that most of the software written for operating systems that are themselves written in C/C++ is also written in C/C++. On the whole, it can be seen that most software is written in C, C++, Python, Java and Objective-C.

How much software was written for the operating system using a language
SPARQL query 378 results (January 2018), 671 results (September 2020). The query shows that most of the software written for macOS is written in C++, C and Python; for Android, in C++ and Java; for iOS, in C++.

How much software has been written in each programming language, and what part of it works under a particular operating system
The histogram shows how much software was written in a particular programming language, and what part of it works under a particular operating system. SPARQL query 378 results (January 2018), 671 results (September 2020).



The histogram in the figure shows, for each programming language, the number of programs written in it and the operating systems on which these programs run. It can be seen from the graph that the largest numbers of programs are written in C (1084), C++ (1598), Java (526), JavaScript (242), Objective-C (252) and Python (454).

Let's look at each of these languages in more detail.

Most of the programs written in C are for macOS (472) and Linux (235). The language was developed in 1972, but it has not lost its popularity, probably because it is used to write low-level applications.

Most of the programs written in C++ are for macOS (780), Linux (265) and Android (264). C++ will probably stay in the lead for a long time, because it is currently used for solutions that require high performance, which higher-level languages such as Java or C# do not always deliver.

Most of the programs written in Java are for macOS (196) and Android (156). Java is probably popular because of code portability, i.e. Java code runs on any machine on which the JVM is installed.

Most of the programs written in JavaScript are for macOS (100), Android (60) and iOS (40). It is used to write the client side of web applications, which reduces server load and can make applications more responsive.

Most of the programs written in Objective-C are for macOS (112) and iOS (72). For a long time it was used mainly by Apple.

Most of the programs written in Python are for macOS (212) and Linux (107). It is a high-level language with a low entry threshold. It is used, for example, for web applications and data analysis.

Looking at the histogram, we can conclude that each of these languages has taken its own "region" in the field of software development and is used for a certain range of tasks. It can also be seen that most of the programs are for macOS (2388), Linux (895) or Android (908).
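
The "part" of each language's software that runs on a given OS can be derived from the counts quoted in this section. The sketch below copies those counts as they are given in the text (only the operating systems named for each language) and divides them by the language totals; the function name `shares` is hypothetical. One program may run on several systems, so the shares of a language need not add up to 100%.

```python
# Totals per language, copied from the text of this section.
totals = {
    "C": 1084, "C++": 1598, "Java": 526,
    "JavaScript": 242, "Objective-C": 252, "Python": 454,
}

# Per-OS counts for each language, copied from the text of this section.
per_os = {
    "C": {"macOS": 472, "Linux": 235},
    "C++": {"macOS": 780, "Linux": 265, "Android": 264},
    "Java": {"macOS": 196, "Android": 156},
    "JavaScript": {"macOS": 100, "Android": 60, "iOS": 40},
    "Objective-C": {"macOS": 112, "iOS": 72},
    "Python": {"macOS": 212, "Linux": 107},
}


def shares(totals, per_os):
    """Return, per language, the fraction of its programs that run on each OS."""
    return {
        language: {os_name: n / totals[language] for os_name, n in counts.items()}
        for language, counts in per_os.items()
    }


share = shares(totals, per_os)
for language, by_os in share.items():
    print(language, ", ".join(f"{os_name} {value:.1%}" for os_name, value in by_os.items()))
```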

Completeness of the Wikidata
Let's compare queries 2 and 3. Obviously, a lot of software products don't have the "programming language" property.

Filling in the Wikidata
After filling in the "programming language" field for 100 software products, query 3 shows 2502 results (6 November 2017, 01:40).

Software documentation
Wikidata plays a big role in software documentation. This is illustrated by the programs included in GNOME and KDE. This article shows that while the English Wikipedia describes almost all the programs included in GNOME and KDE, the Italian and French ones contain only a subset of the articles. Documenting large projects is a well-known and difficult task. To solve it, you need a centralized system. It is in this role that the combination of Wikipedia and Wikidata acts.

Future work

 * 1) Show all those OSs that have a "logo image (P154)" property.


 * 2) Construct a diagram showing how many OSs were created in each country. It is permitted to use the properties developer, country or headquarters location if the creator is a company, or country of citizenship if the creator is an individual developer.


 * 3) Count how many OSs were created (inception (P571)) in 1995.

Exercises
{ Specify the relation between the OS and its developer (Apple | Sun Microsystems | Canonical Ltd.): } +-- Newton OS -+- JavaOS --+ Ubuntu Touch

{ Select desktop image of OS Fuduntu: } - - + -

{ Select the OS based on which most other OSs were created: } + Debian - Android - Ubuntu - Linux kernel

1. SPARQL query, OSs and developers

2. SPARQL query, OSs and logos

3. SPARQL query, OSs and countries

4. SPARQL query, OSs and count of "descendants"

=References=