Research in programming Wikidata/Business enterprise

This article studies Wikidata objects of the type "commercial organization". Using SPARQL queries over objects of this type in Wikidata, the following tasks were solved: the distribution of organizations by industry was built as a bubble chart, the number of organizations per country was counted, and a graph of existing organizations and their subsidiaries was drawn. Conclusions were drawn regarding the completeness of Wikidata on this topic, including a map of the organizations of the world.

Instances of object "Business enterprise"

 * Objects: business enterprise (Q4830453)

Using the following query, we can get a list of all commercial organizations.

SPARQL-query, 109383 Results
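The query itself is not reproduced in the text. A minimal sketch of what it could look like, wrapped in Python, is given below. It assumes that "commercial organization" means a direct instance of (P31) business enterprise (Q4830453), that English labels are wanted, and that the public Wikidata Query Service endpoint is used. The helper name `run_query` and the User-Agent string are illustrative and not taken from the article.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: "commercial organization" = direct instance of (P31)
# business enterprise (Q4830453); subclasses are not followed.
ALL_ENTERPRISES_QUERY = """
SELECT ?org ?orgLabel WHERE {
  ?org wdt:P31 wd:Q4830453 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def run_query(query, endpoint=WDQS_ENDPOINT):
    """Send a SPARQL query and return the list of JSON result bindings."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        endpoint + "?" + params,
        # Illustrative User-Agent; replace it with your own contact details.
        headers={"User-Agent": "wikidata-enterprise-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(ALL_ENTERPRISES_QUERY)` needs network access. A result set of this size may hit the service timeout, so adding a `LIMIT` clause while experimenting is a reasonable precaution.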

👍 > The most complete and elaborate business enterprises on Wikidata are: Google, Apple, Microsoft.

👎 > Almost empty and uninformative business enterprises on Wikidata are: Pininfarina, ANHUI EXPRESSWAY COMPANY LIMITED, Futura et Marge.

A drawback of the resulting list is that some objects turn out to be nameless on Wikidata (No label defined). Let's get a list of organizations whose "label" field is non-empty.

SPARQL-query, 74556 Results
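One possible way to express the non-empty label condition is sketched below. It assumes that "non-empty" means "has an English label"; the language choice is not stated in the article. The helper `has_real_label` is hypothetical and covers a related case: when the label service is used instead of `rdfs:label`, an item without a label comes back with its bare Q-id in the label column.

```python
import re

# Assumption: a label counts as present if an English rdfs:label exists.
LABELED_ENTERPRISES_QUERY = """
SELECT ?org ?label WHERE {
  ?org wdt:P31 wd:Q4830453 ;
       rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
"""

# A bare item identifier such as "Q4830453".
QID_PATTERN = re.compile(r"Q\d+")


def has_real_label(label):
    """Return False for empty labels and for bare Q-ids used as a fallback."""
    return bool(label) and QID_PATTERN.fullmatch(label) is None
```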

Distribution of organizations by industry
Each organization specializes in some industry. To understand which industry is the most popular (that is, in which industry the most organizations work), we can build a diagram.

Type of result: bubble diagram.

Used:
 * object business enterprise (Q4830453),
 * property industry (P452).

SPARQL query, 864 Results.
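A sketch of a query of this kind is shown below: it groups organizations by industry (P452) and counts them. The `#defaultView:BubbleChart` comment asks the Wikidata Query Service to render the result as a bubble chart. The exact query used in the article is not known, and the helper `top_industries` is a hypothetical name for picking the largest groups from counts that have already been downloaded.

```python
# Assumption: one row per industry, with the number of organizations in it.
INDUSTRY_QUERY = """
#defaultView:BubbleChart
SELECT ?industry ?industryLabel (COUNT(?org) AS ?count) WHERE {
  ?org wdt:P31 wd:Q4830453 ;
       wdt:P452 ?industry .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count)
"""


def top_industries(counts, n=5):
    """Return the n (industry, count) pairs with the largest counts.

    `counts` maps an industry label to its number of organizations.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```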

From this diagram (Fig. 1), we can determine the number of organizations involved in each industry. The data obtained can also be presented as a table (a list of the 5 most popular industries):



Let's answer the question: which industries exist in Russia, and how many are there?

SPARQL-query, 60 Results.

It can be concluded that retail dominates the other industries in Russia by a wide margin: it has 78 organizations, while the next industry (automotive industry) has only 13.

For comparison, we can build the same list of industries for another country (for example, Norway).

SPARQL-query, 41 Results.

The dominant industry here is manufacturing (Q187939).
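Since the Russian and Norwegian queries differ only in the country, they can be generated from one template. The sketch below assumes the country is attached through the property country (P17); `industry_query_for_country` is a hypothetical helper name. Q159 and Q20 are the Wikidata items for Russia and Norway.

```python
import re
from string import Template

RUSSIA = "Q159"
NORWAY = "Q20"

# Assumption: an organization belongs to a country via country (P17).
_INDUSTRY_BY_COUNTRY = Template("""
SELECT ?industry ?industryLabel (COUNT(?org) AS ?count) WHERE {
  ?org wdt:P31 wd:Q4830453 ;
       wdt:P17 wd:$country ;
       wdt:P452 ?industry .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?industry ?industryLabel
ORDER BY DESC(?count)
""")


def industry_query_for_country(country_qid):
    """Build the industry-count query for one country item, e.g. "Q159"."""
    if not re.fullmatch(r"Q[1-9]\d*", country_qid):
        raise ValueError("expected a Wikidata item id such as 'Q159'")
    return _INDUSTRY_BY_COUNTRY.substitute(country=country_qid)
```

For example, `industry_query_for_country(NORWAY)` yields the Norwegian variant of the query.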

Number of organizations by country
The next query displays the number of commercial organizations in each country of the world.

Used:
 * object business enterprise (Q4830453),
 * property country (P17).

SPARQL-query, 198 Results
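A sketch of such a counting query is given below, together with a small parser for the JSON bindings that a SPARQL endpoint returns (each value is an object with a `value` string). The variable names `?countryLabel` and `?count` and the function `parse_counts` are choices made for this example, not taken from the article.

```python
# Assumption: organizations are linked to countries via country (P17).
COUNTRY_COUNT_QUERY = """
SELECT ?country ?countryLabel (COUNT(?org) AS ?count) WHERE {
  ?org wdt:P31 wd:Q4830453 ;
       wdt:P17 ?country .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?country ?countryLabel
"""


def parse_counts(bindings):
    """Turn SPARQL JSON bindings into (country label, count) pairs.

    The pairs are sorted by count, largest first.
    """
    pairs = []
    for row in bindings:
        label = row["countryLabel"]["value"]
        count = int(row["count"]["value"])  # counts arrive as strings
        pairs.append((label, count))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```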

Organizations and their subsidiaries
The task is to build a graph of existing organizations and their subsidiaries.

Used:
 * object business enterprise (Q4830453),
 * property subsidiary (P355).

SPARQL-query, 428 Results (edges).

The resulting graph of neighbors (Fig. 2) contains hanging vertices (vertices of degree one) and isolated vertices. The next step is to construct a graph in which these vertices are absent.



SPARQL-query, 55 Results (edges).
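The article does this filtering inside the SPARQL query, which is not reproduced here. The same idea can be sketched on the client side: given the (parent, subsidiary) edges, keep only the edges whose two endpoints each touch at least two edges. The function name `prune_low_degree` is hypothetical, and this is a single pass, so vertices that become hanging after the pruning are not removed again.

```python
from collections import Counter


def prune_low_degree(edges):
    """Drop edges that touch a hanging vertex (a vertex of degree one).

    `edges` is a list of (parent, subsidiary) pairs. Isolated vertices
    never appear in an edge list, so they are excluded automatically.
    """
    degree = Counter()
    for parent, child in edges:
        degree[parent] += 1
        degree[child] += 1
    return [
        (parent, child)
        for parent, child in edges
        if degree[parent] >= 2 and degree[child] >= 2
    ]
```

Whether this reproduces the 55 edges reported above depends on how the original query defined the filter, so the count may differ.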

Completeness of Wikidata
According to the category List of companies of Russia, English Wikipedia has at least 208 articles about commercial organizations in Russia. A rating of the largest companies of Russia is also available. It can be concluded that even some large organizations are not included in this list, to say nothing of small and medium-sized ones.

It is impossible to obtain up-to-date data on the number of commercial organizations, because their number grows every day and information about them is not freely available. For example, the Unified State Register of Legal Entities (USRLE) provides its data for a fee.

According to the site of the Federal Tax Service (FTS), the number of commercial organizations entered in the state register as newly created in 2014 amounted to 420.5 thousand. In June 2015, orders of the Ministry of Finance of Russia came into force under which data on existing organizations is no longer made public. The data can be provided only to state authorities, local self-government bodies and so on. Therefore, it is not possible to obtain reliable data on the number of existing organizations.

Completeness can, however, be explored with the help of Wikidata. Recall the total number of organizations on Wikidata obtained at the beginning (about 110 000, and this number is constantly growing). A typical user with a general understanding of organizations may want to see what an organization looks like or where it is located on a map.

To see how many organizations have an image (that is, the 'image' field is filled in), we can use the following query.

SPARQL-query, 2913 Results.

The number of organizations with an image is therefore 2913. This is a small share of the total, which indicates that the information is incomplete.
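To put this number in proportion, it can be divided by the total from the first query. The sketch below assumes that the 'image' field corresponds to the property image (P18) and that both counts (2913 and 109383) describe the same snapshot of Wikidata, which the article does not state. The name `coverage_percent` is illustrative.

```python
# Assumption: the 'image' field is the property image (P18).
# An item with several images yields several rows, so counting
# DISTINCT ?org would give the number of organizations more precisely.
IMAGE_QUERY = """
SELECT ?org ?image WHERE {
  ?org wdt:P31 wd:Q4830453 ;
       wdt:P18 ?image .
}
"""


def coverage_percent(items_with_property, total_items):
    """Share of items that have a given property, in percent."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * items_with_property / total_items


# Counts reported in this article: 2913 results with an image,
# 109383 business enterprises in total.
IMAGE_COVERAGE = coverage_percent(2913, 109383)  # roughly 2.66 percent
```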

Let's build a table of (possibly) popular user requests about organizations (which details matter depends on who is interested in the organization), sorted by the number of results in descending order.

The results in this table indicate that the amount of such information about organizations is very small compared with their total number on Wikidata.

Organizations in Russia can be investigated in the same way. Let's get a list of organizations in Russia from Wikidata.

SPARQL-query, 577 Results.

The query returns 577 organizations. Suppose the user wants to see where these organizations are located on a map. This requires another query.

SPARQL-query, 9 Results.

Result: very few records in Russia have geographic coordinates. We can get a map not only of organizations in Russia, but of all organizations in the world, by using the following query.

SPARQL-query, 511 Results.

The result (Fig. 3) is again very small: only 511 organizations. The number of organizations worldwide with a location is even smaller than the total number of organizations in Russia (577).
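A map query of this kind could look like the sketch below. It assumes that the location is stored in the property coordinate location (P625); the `#defaultView:Map` comment asks the Wikidata Query Service to draw the results on a map. Coordinates come back as WKT literals of the form `Point(longitude latitude)`, and the hypothetical helper `parse_point` converts one into a (latitude, longitude) pair.

```python
import re

# Assumption: the location is the property coordinate location (P625).
MAP_QUERY = """
#defaultView:Map
SELECT ?org ?orgLabel ?coord WHERE {
  ?org wdt:P31 wd:Q4830453 ;
       wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

# WKT point literal: longitude first, then latitude.
_POINT = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def parse_point(wkt):
    """Convert 'Point(lon lat)' into a (latitude, longitude) tuple."""
    match = _POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError("not a WKT point: %r" % (wkt,))
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return latitude, longitude
```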



Analyzing the data obtained, it can be concluded that the information about organizations on Wikidata is only partially filled in. There is not enough information to draw any definite conclusions about the organizations and their components. The small amount of information can partly be explained by the chaotic appearance and disappearance of organizations (it is not easy to survive under such competition and in the existing economy). But even the information about major organizations (Apple, Microsoft, Intel) is incomplete and needs to be improved (for example, Intel does not have a motto on Wikidata).

Future work

 * 1) Output the 20 organizations with the largest revenue.
 * 2) Show as a diagram how many commercial organizations appear each year.
 * 3) Determine the distribution of the number of commercial organizations by industry in different countries.

Test
{ The following commercial organizations are listed: Tele2, Lada, Aviakor, Uralmash. Match each organization with the images below. +--- ---+  -+--  --+-
 * type=""}
 * 1 (Tele2) | 2 (Lada) | 3 (Aviakor) | 4 (Uralmash)

{ The following commercial organizations are known: MegaFon, Svyaznoy, Euroset (Evroset), Sportmaster. The years in which they were founded are known: 1992, 1995, 1997, 2002. Arrange the organizations in order of increasing founding year (1st place is the oldest organization, 4th place is the newest one). ---+ MegaFon -+-- Svyaznoy --+- Evroset +--- Sportmaster
 * type=""}
 * 1st place (1992) | 2nd place (1995) | 3rd place (1997) | 4th place (2002)

{ Arrange the countries in ascending order of the number of organizations (1st place: the fewest organizations): -+-- Sweden +--- United Kingdom ---+ USA --+- Germany
 * type=""}
 * 1 | 2 | 3 | 4

SPARQL-queries with answers: List of all organizations, List of all organizations with years of creation, List of all organizations In Russia with image, List of organizations by country in descending order