Research in programming Wikidata/Anime

This chapter analyzes Wikidata objects of the type anime (Q1107). Using SPARQL queries over these objects, we produce a list of seiyu (voice actors) with their number of roles, a line chart of the number of seiyu who have worked on one or more anime, a directed graph connecting seiyu to the anime they voiced, and estimates of the ages of seiyu at the time of their voice work.

Anime objects
Anime is Japanese animation. It has its own marked visual style, but there are other features that are not so obvious. For instance, anime has a significantly wider variety of genres in comparison to American and European animation — from family and kids’ comedies to dramas, the latter of which are usually depicted with live actors in Western cinema.

Each anime has its own voice actors. From here on we refer to Japanese voice actors as seiyu; in the context of Japanese animation, the terms seiyu and voice actor are synonymous. The term title usually refers to a particular anime and its associated manga (Japanese comics). More generally, a title covers the various media products, from novels to films, that share the same name and are based on one another.

In order to work with the anime list from Wikidata we need to use the anime object and the instance of property.

Let us retrieve the list of all anime titles, without taking the subclasses into account.

683 results in 2017 and 216 results in 2021.
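The query itself is not reproduced in this text. The sketch below shows what such a query might look like, wrapped in Python so that it can be sent to the public Wikidata Query Service endpoint. It assumes that P31 is the identifier of the instance of property; the constant names, function names and User-Agent string are illustrative and not part of the original chapter.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed shape of the query: direct instances of anime (Q1107) only,
# so objects that are instances of a subclass are not returned.
ANIME_INSTANCES_QUERY = """
SELECT ?anime ?animeLabel WHERE {
  ?anime wdt:P31 wd:Q1107 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(query):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "anime-chapter-sketch/0.1 (educational example)"},
    )


def run_query(query):
    """Send the query and return the list of result bindings (needs network)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling `len(run_query(ANIME_INSTANCES_QUERY))` would give the current number of results, which changes over time as Wikidata is edited.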

There are many more anime objects in Wikidata, but they are instances not of anime itself but of its subclasses, for example, anime series.

Let us execute the following query in order to obtain the list of anime genres and the number of anime that correspond to these genres.

This classification of anime by genre is not perfect because it is significantly skewed toward anime television series: among the 4757 anime titles, 2984 are instances of anime series (62.7%). Also, some subclasses correspond not to genres but to particular anime (e.g. Evangelion).

We can visualize this distribution using the RAWGraphs service (Fig. 1).



Let us retrieve the list of all anime titles that are instances of anime subclasses by using the following query:

4757 results.
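In SPARQL, reaching the instances of subclasses is commonly done with a property path such as `wdt:P31/wdt:P279*`, where P279 is assumed here to be the subclass of property; the chapter's actual query is not shown, so this is only a plausible form. The Python sketch below imitates that path on a toy in-memory graph to show what the path means. All class and item names in it are invented for illustration.

```python
def subclasses_of(root, subclass_of):
    """Return root plus every class that reaches it through subclass links."""
    found = {root}
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, parents in subclass_of.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def instances_including_subclasses(root, instance_of, subclass_of):
    """Items whose class is root or any direct or indirect subclass of root."""
    classes = subclasses_of(root, subclass_of)
    return sorted(item for item, cls in instance_of.items() if cls in classes)


# Toy data (hypothetical): child class -> set of parent classes.
SUBCLASS_OF = {
    "anime series": {"anime"},
    "anime film": {"anime"},
    "anime series season": {"anime series"},
}
# Toy data (hypothetical): item -> its class.
INSTANCE_OF = {
    "title-1": "anime",
    "title-2": "anime series",
    "title-3": "anime series season",
    "title-4": "manga",
}

print(instances_including_subclasses("anime", INSTANCE_OF, SUBCLASS_OF))
# ['title-1', 'title-2', 'title-3']
```

The item `title-4` is left out because its class is not connected to anime, which mirrors why a query without the subclass path returns far fewer results.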

The anime with the most complete information on Wikidata include Gurren Lagann, Space Battleship Yamato and Project A-Ko.

There are also some anime with many missing properties, including Doraemon, The Animal Conference on the Environment, Assassins Pride.

According to a profiling of Wikidata with ProWD, Fullmetal Alchemist: The Sacred Star of Milos has the largest number of properties (24) of all anime titles in Wikidata.

List of seiyu ordered by their number of roles in anime
An anime naturally has multiple characters, and different seiyu give voice to them. Most seiyu have taken part in several anime, and some have worked on several dozen titles. Talented seiyu are sometimes invited to voice more than one character in the same anime. Hiroshi Kamiya is one of the most popular seiyu: he has worked on more than 180 anime and earned many awards. One of the most famous anime he has taken part in is Attack on Titan, in which he voiced Captain Levi, one of the main characters.

Let us create a list of seiyu ordered according to the number of anime voiced by them.

SPARQL query, 148 results (2017) and 2910 results (2021)
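The ranking step can also be reproduced offline from an exported list of (seiyu, anime) pairs. The sketch below is one way to do it in Python; the function name and the sample pairs are hypothetical, and it counts each anime once per seiyu even if the export lists the pair several times.

```python
from collections import Counter


def rank_seiyu(pairs):
    """Return (seiyu, number of distinct anime) sorted by count, then by name."""
    distinct_pairs = set(pairs)  # drop repeated (seiyu, anime) rows
    counts = Counter(seiyu for seiyu, _anime in distinct_pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample export.
sample = [
    ("seiyu A", "anime x"),
    ("seiyu A", "anime y"),
    ("seiyu B", "anime x"),
    ("seiyu A", "anime x"),  # duplicate row, counted once
]
print(rank_seiyu(sample))
# [('seiyu A', 2), ('seiyu B', 1)]
```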

Line chart of number of seiyu who worked on one or more anime
We can create a line chart with seiyu plotted according to their total number of roles. The more anime seiyu have voiced, the farther to the right they are on the chart. We can use the following query to create the chart.

SPARQL-query, 13 results (2017) and 58 results (2021).

Figure 2 shows that the larger the number of voiced anime, the fewer seiyu reach it. Line 4 of the query above caps the chart at 71 anime, because only a few seiyu have worked on more, and extending the line chart farther to the right would not be informative.

As Figure 2 shows, most seiyu have voiced only one anime during their life. On the chart, there are 254 such seiyu. However, seiyu is a profession to which people often devote their lives. The fact that many voiced only one role according to Wikidata seems to be a result of the incompleteness of the data set.
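The data behind such a chart is a count of counts: for every number of roles, how many seiyu have exactly that many. A minimal Python sketch follows, assuming the per-seiyu role counts are already available as a dictionary. The cap of 71 comes from the text; treating it as inclusive is an assumption, and the sample data is invented.

```python
from collections import Counter


def seiyu_per_role_count(roles_by_seiyu, max_roles=71):
    """Return sorted (number of anime, number of seiyu) points for the chart."""
    distribution = Counter(
        n for n in roles_by_seiyu.values() if n <= max_roles
    )
    return sorted(distribution.items())


# Hypothetical role counts per seiyu.
sample_counts = {"seiyu A": 1, "seiyu B": 1, "seiyu C": 3, "seiyu D": 80}
print(seiyu_per_role_count(sample_counts))
# [(1, 2), (3, 1)]
```

The seiyu with 80 roles falls beyond the cap and is left off the chart, which is the effect the limit in the query is described as having.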



Directed graph that connects seiyu to anime they have voiced
Most of the seiyu give voice to multiple characters from different anime. Let us create a directed graph that connects seiyu to anime they have voiced using the following query.

SPARQL-query, 826 results (2017) and 496 results (2021).

The ?seiyu variable (line 7) contains an array of Wikidata objects that correspond to several seiyu including Bin Shimada and others. We picked only three seiyu for illustrative purposes as a graph including more seiyu would be unwieldy to read.

The BIND(IF(?toggle, ?anime, ?seiyu)) construction in line 11 determines the graph node type: if ?toggle is true, the node corresponds to an anime; otherwise it corresponds to a seiyu. The item label and the node color are determined in the same way in lines 12 and 13. Line 14 creates the edges linking the seiyu and anime nodes.
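To make the toggle idea concrete, the following Python sketch imitates it outside SPARQL: every (seiyu, anime) pair is expanded into two rows, one per toggle value, so that both ends of the pair appear as nodes. The colors, the field names and the edge direction (from seiyu to anime) are assumptions made for this illustration, and "Anime X" is a placeholder title.

```python
def graph_rows(pairs, anime_rgb="FFA500", seiyu_rgb="1E90FF"):
    """Expand each (seiyu, anime) pair into one anime row and one seiyu row."""
    rows = []
    for seiyu, anime in pairs:
        for toggle in (True, False):
            rows.append({
                # Same choice as IF(?toggle, ?anime, ?seiyu)
                "item": anime if toggle else seiyu,
                # Node color chosen by the same toggle
                "rgb": anime_rgb if toggle else seiyu_rgb,
                # Assumed edge: only the seiyu row points at the anime
                "link": None if toggle else anime,
            })
    return rows


rows = graph_rows([("Bin Shimada", "Anime X")])
for row in rows:
    print(row["item"], row["rgb"], row["link"])
```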

Figure 3 shows part of the graph for several famous seiyu.



Fullness of Wikidata
The list of anime on English Wikipedia contains around 1600 titles. However, there are websites dedicated to anime, such as the Gogoanime online cinema, which contain information about many more titles. At the time of writing, there were 10,072 anime on Gogoanime (74 pages of 136 titles each plus one page of 8), whereas Wikidata provides information for only about 4875 titles. In addition, we should take into account the rapid pace of anime releases. We can therefore conclude that Wikidata coverage of anime is incomplete: only 48.4% of the titles listed on Gogoanime are represented.
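The figures in this paragraph can be checked with a few lines of Python. All numbers are taken from the text above; only the variable names are new.

```python
# Gogoanime: 74 full pages of 136 titles plus one page of 8 titles.
gogoanime_titles = 74 * 136 + 8
wikidata_titles = 4875

coverage = wikidata_titles / gogoanime_titles

print(gogoanime_titles)    # 10072
print(f"{coverage:.1%}")   # 48.4%
```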

We cannot consider Gogoanime a reliable source (RS), but it can be used to analyze the incompleteness of Wikidata.

The query in Sect. 2 returned 2910 names of seiyu from Wikidata. The problem is that we searched only for seiyu who have worked on anime. When we query the names of all voice actors, without the anime restriction, the number of results increases by a factor of five (see Query 6.7). This increase reminds us that the voice acting industry covers many more areas than just anime, for example, Western animation, podcasts and video games. This should be taken into account when forming queries.

SPARQL-query, 3965 results (2017) and 14742 results (2021).

The sunburst diagram, Figure 4, is one way to visualize the output of the query. Such a diagram allows us to see the voice actors who contributed the most to the voice acting industry.



Is the release date of anime available?
Fans of anime often want to know the release date of their favorite titles. Wikidata does not always contain complete information on release dates. Let us retrieve the number of anime for which the release date is not available using the following query.

SPARQL-query, 237 results (2017) and 2940 results (2021).

Release dates are not specified for 2940 of the 4875 anime titles on Wikidata (60.3%). In 2017, 237 of 683 titles (34.7%) did not have a release date. Unfortunately, it seems that growth in the number of items is not always accompanied by equally complete property information.
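The comparison between the two snapshots can be reproduced as follows. The counts come from the text; the helper name is hypothetical.

```python
def missing_share(missing, total):
    """Fraction of titles that have no release date."""
    return missing / total


share_2017 = missing_share(237, 683)
share_2021 = missing_share(2940, 4875)
# Change between the snapshots, in percentage points.
change_pp = (share_2021 - share_2017) * 100

print(f"{share_2017:.1%}")   # 34.7%
print(f"{share_2021:.1%}")   # 60.3%
print(f"{change_pp:.1f}")    # 25.6
```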

Analysis of seiyu age at time of voice work
As in any other profession, seiyu are of a certain age when they voice a given anime. SPARQL, together with external data-processing tools such as the Python programming language, allows us to estimate these ages from the data available on Wikidata.

In order to obtain the input data for our study, we need to execute three SPARQL queries and export their output to .csv files. Next, these CSV files are used in a Python script that generates a chart. You can run Python programs on Google Colaboratory.

We can retrieve the list of all seiyu and their birthdates from Wikidata with either of the following two queries, which use the SERVICE command and the rdfs:label construction, respectively.

The scripts of the two queries differ in the following ways:
 * The label (name) of a seiyu is retrieved with the ?seiyuLabel variable in the first query (the SERVICE command is used to define the languages of output) and with the rdfs:label command in the second query.
 * In the first query, it is also necessary to list ?seiyuLabel in the GROUP BY clause in order to connect seiyu objects with their labels.

SPARQL query, 2515 results (2021).

SPARQL query, 2515 results (2021).
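Since the two queries are not reproduced in this text, the sketch below gives one plausible form of each, stored as Python strings so that they can be compared side by side. The property identifiers P725 (voice actor), P569 (date of birth), P31 and P279 are assumptions about how the original queries were written, as is the restriction to English labels.

```python
# Variant 1: labels come from the label service; the label variable
# then has to appear in GROUP BY next to the seiyu variable.
SEIYU_WITH_SERVICE = """
SELECT ?seiyu ?seiyuLabel (MIN(?birth) AS ?birthdate) WHERE {
  ?anime wdt:P31/wdt:P279* wd:Q1107 ;
         wdt:P725 ?seiyu .
  ?seiyu wdt:P569 ?birth .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?seiyu ?seiyuLabel
"""

# Variant 2: labels are read directly with rdfs:label and filtered
# by language, so no label service and no grouping are needed.
SEIYU_WITH_RDFS_LABEL = """
SELECT DISTINCT ?seiyu ?name ?birthdate WHERE {
  ?anime wdt:P31/wdt:P279* wd:Q1107 ;
         wdt:P725 ?seiyu .
  ?seiyu wdt:P569 ?birthdate ;
         rdfs:label ?name .
  FILTER(LANG(?name) = "en")
}
"""


def uses_label_service(query):
    """True if the query text relies on the label service."""
    return "SERVICE wikibase:label" in query


print(uses_label_service(SEIYU_WITH_SERVICE))     # True
print(uses_label_service(SEIYU_WITH_RDFS_LABEL))  # False
```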

Note that the script retrieves not only the release dates of anime movies (property P577), but also the start dates of the series (property P580).

Let us get the links between seiyu and the anime they have voiced.

SPARQL query, 27106 results (2021).

Analysis results can be shown as a histogram, Figure 5. To create it we will use the Python libraries pandas and Matplotlib. The script which generates the final histogram is published on GitHub.

The histogram displays age in years along its X-axis and the total number of roles dubbed by seiyu of this age along the Y-axis.
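The published script is not reproduced here. As a rough illustration of the computation behind the histogram, the sketch below joins three CSV exports and counts roles per age. It deliberately swaps Pandas and Matplotlib for the Python standard library to stay self-contained, and the column names (`seiyu`, `birthdate`, `anime`, `date`) are assumptions about the export format rather than the ones used by the original script.

```python
import csv
import io
from collections import Counter
from datetime import date


def read_rows(csv_text):
    """Parse CSV text with a header line into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def age_on(birth, when):
    """Whole years elapsed between birth and the given date."""
    years = when.year - birth.year
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1
    return years


def roles_per_age(seiyu_csv, anime_csv, links_csv):
    """Count roles by the age of the seiyu at the anime's date."""
    # Only the first 10 characters (YYYY-MM-DD) of each timestamp are used.
    birth = {r["seiyu"]: date.fromisoformat(r["birthdate"][:10])
             for r in read_rows(seiyu_csv)}
    released = {r["anime"]: date.fromisoformat(r["date"][:10])
                for r in read_rows(anime_csv)}
    ages = Counter()
    for link in read_rows(links_csv):
        seiyu, anime = link["seiyu"], link["anime"]
        # Skip links whose birthdate or anime date is missing.
        if seiyu in birth and anime in released:
            ages[age_on(birth[seiyu], released[anime])] += 1
    return ages


# Hypothetical miniature exports.
SEIYU_CSV = "seiyu,birthdate\nS1,1970-05-10T00:00:00Z\n"
ANIME_CSV = "anime,date\nA1,2000-05-09T00:00:00Z\nA2,2000-05-10T00:00:00Z\n"
LINKS_CSV = "seiyu,anime\nS1,A1\nS1,A2\nS1,A3\n"

print(sorted(roles_per_age(SEIYU_CSV, ANIME_CSV, LINKS_CSV).items()))
# [(29, 1), (30, 1)]
```

The resulting counter maps each age to a number of roles, which is exactly the pair of axes described for the histogram; passing its keys and values to a bar-plotting function would draw it.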



Curiously, Wikidata contains cases in which a seiyu was born after the release date of an anime in which they performed. This is probably due to missing information on new seasons or reboots of old anime series. For example, in 2021 this was the case for the anime series Sazae-san and the seiyu Nobunaga Shimazaki: the seiyu was born in 1988, whereas the anime series’ initial start date is 1969.
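Such cases are easy to flag before plotting. The sketch below is a minimal check, assuming that birth years and start years have already been extracted into dictionaries; the function name is hypothetical, and the sample values are the ones quoted in the text.

```python
def born_after_start(links, birth_year, start_year):
    """Return (seiyu, anime) links where the seiyu was born after the start."""
    return [
        (seiyu, anime)
        for seiyu, anime in links
        if seiyu in birth_year
        and anime in start_year
        and birth_year[seiyu] > start_year[anime]
    ]


links = [("Nobunaga Shimazaki", "Sazae-san")]
birth_year = {"Nobunaga Shimazaki": 1988}
start_year = {"Sazae-san": 1969}

print(born_after_start(links, birth_year, start_year))
# [('Nobunaga Shimazaki', 'Sazae-san')]
```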

Future work

 * 1) Find the 10 most popular anime released in the current year. Anime popularity is estimated by the number of Wikipedia language editions that have an article about the anime. For example, if an article about an anime exists in the English, Russian and Spanish Wikipedia, then its popularity score is three.
 * 2) Find five anime in which the greatest number of female seiyu are involved.
 * 3) Create a bubble chart of the distribution of anime by genre (including the number of anime in each genre) using the subclass property.
 * 4) Mark the voice actors’ places of birth on the map.
 * 5) Create a histogram or bubble chart of voice actor nationalities.
 * 6) Create a histogram of the number of released anime by year, or of the number of voice actors by year of birth.
 * 7) Create histograms similar to Figure 5, but taking into account the gender identity of the voice actor (one for males, one for females, and one for other).

Test
{ There are some anime: ■ Rave Master (Shan T Lao Fu Zi) ■ Tetsujin 28-go (Tetsujin 28-gou) ■ Grenadier (Grenadier) ■ Attack on Titan (Shingeki no Kyojin) Correlate the anime's data with the images below. +--- ---+ -+-- --+-
 * type=""}
 * 1 (Rave Master),|2 (Tetsujin 28-go),|3 (Grenadier),|4 (Attack on Titan)

{ There are anime: ■ Gurren Lagann (Tengen Toppa Gurren Lagann) ■ Steins;Gate (Steins;Gate) ■ Hellsing (Hellsing) ■ Elfen Lied (Elfen Lied) Years of the creation of anime are known: 2011, 2007, 2004, 2001. Arrange the anime's data in order of decreasing date of their creation (1st place is the newest anime, 4th place is the oldest one). -+-- Gurren Lagann +--- Steins;Gate ---+ Hellsing --+- Elfen Lied
 * type=""}
 * 1 place (2011),|2 place (2007),|3 place (2004),|4 place (2001)

{ Which anime does this description refer to?  Brief description:  "And what will happen after death? Countless generations of people have asked this question ..."  Genres:  Drama, Action, Comedy, School  Seiyu (fem.):  Kana Hanazawa  Publication date:  2010  Note: Punctuation marks and spaces matter, if the title contains any.  { Angel Beats! }
 * type="{}"}
