Research in programming Wikidata/Artolela-web

Artolela-web is an educational game, the aim of which is to improve language skills. Artolela-web helps users in learning languages and allows to expand the outlook, as during the game the user gets acquainted with the masterpieces of painting.

About application
Artolela-web is a web application based on a collection of pictures from the Wikimedia Commons and their description on the Wikidata site, which includes the names of pictures in English, Russian and German. The choice of languages ​​was based on the preferences of the program's developers. In another version of the game the other set of languages could be selected, as the Wikimedia contains descriptions of pictures in many languages, for example, Italian, Bashkir, Ukrainian.

Consider the picture: . The image file is stored on the Wikimedia Commons. The Wikidata Database contains a description of the picture: the name of the picture in Dutch and its translations into Russian, English and other languages, location, artist, size of the picture (picture's width and height) and more. In this application only the picture names in Russian, English and German were used.

The application makes the opportunity to choose languages and the number of levels, to look through correct and incorrect answers and the nnumber of correct answers. In the future, new languages ​​will be added, new pictures will be added to database from the Wikimedia Commons, registration in the game will be added and statistics of the user will be saved. A student of the Institute of Mathematics and Information Technologies of Petrozavodsk State University Yulia Ipatova is working on this project.

The development languages is PHP, CSS and Javascript.

The interface language is English.

Application description
The application is designed for testing the language skills in the game form.

Game content

 * 1) At the beginning of the game the user selects two languages (the first language for the picture name and the second language for possible answers) and a level (the number of pictures, whose names the user needs to guess).
 * 2) Then a picture and four variants of the picture name in the first language are shown to the user. The picture name in the first language is hidden from the user.
 * 3) The user should select the correct variant of the picture name in the first language.

Game script

 * 1) The user select two languages.
 * 2) The user select the number of levels.
 * 3) The user click the button "START".
 * 4) The user is shown the picture, four possible answers and the picture link on the Wikimedia Commons.
 * 5) The user select one of the possible answers.
 * 6) The user click the button "NEXT".
 * 7) After the user has passed all levels, he sees his mark (the ratio of the correct answers to the total number of questions) on the screen.
 * 8) If the user click the button "NEW GAME", points 1-8 are repeated.

Application architecture


The application consists of 5 basic forms and the database, which includes three tables.

Database
The database contains three unrelated tables.
 * The table en_de contains the data of the pictures, which has the label in English and German languages on Wikidata at the same time. The table includes five atributes:
 * the picture ID (EDid),
 * the picture name in English (EDlabel_en),
 * the picture name in German (EDlabel_de),
 * the image filename on Wikimedia Commons (EDp_name),
 * the link to the picture on Wikimedia Commons (EDurl).
 * The table en_ru contains the data of the pictures, which has the label in English and Russian languages on Wikidata at the same time. The table includes five atributes:
 * the picture ID (ERid),
 * the picture name in English (ERlabel_en),
 * the picture name in Russian (ERlabel_ru),
 * the image filename on Wikimedia Commons (ERp_name),
 * the link to the picture on Wikimedia Commons (ERurl).
 * The table ru_de contains the data of the pictures, which has the label in Russian and German languages on Wikidata at the same time. The table includes five atributes:
 * the picture ID (RDid),
 * the picture name in Russain (RDlabel_ru),
 * the picture name in German (RDlabel_de),
 * the image filename on Wikimedia Commons (RDp_name),
 * the link to the picture on Wikimedia Commons (RDurl).



The example of the table en_de for the picture :

The example of the table en_ru for the picture «Starry Night»:

The example of the table ru_de for the picture «Starry Night»:

Note 1:  Tables are redundant because url = concat('http://commons.wikimedia.org/wiki/Special:FilePath/', StringEscape(p_name)). This redundancy speeds up the application because we get the data from the database, and we do not need further process of this data and spend time on this process.

Note 2: Before entering the URL in database, you need to escape signs "%".

There are two approaches to create a structure of this database:
 * to create one table that would store all pictures that have a label at least in two of three laguages. If the label in one of the languages misses for a particular picture, then the corresponding field in the table would remain blank;
 * to create three tables for each pair of languages.

The first approach allows to get rid of redundancy in the database, and significantly reduce its volume. However, requests to the database require more time, since the developer have to add a condition to search only those records that contain a label in the first and second languages.

The second approach leads to redundancy in the database, since if the picture has the label in all three language, it repeats in all tables. However, in this case, requests to the database require less time, since there is no need to add a condition to search only those records that contain the label in first and second languages.

Within the framework of this application, it was decided to create three different tables to speed up the word with the database and, accordingly, to accelerate the application.

Forms




The application consists of 5 forms:
 * 1) The opening form of the application. This form sets the application preferences (languages and the number of levels). Elements of the form  are:
 * 2) *two blocks, each of which consists of three radiobuttons for selecting languages,
 * 3) *one slider for selecting the number of levels,
 * 4) *one button to begin the game,
 * 5) *one block containing a link to the description of the game.
 * Note 1: It is necessary to check that the user has chosen different languages as the native language and the foreign language. If the user has chosen the same languages, the user will see the appropriate message and the application will automatically select another foreign language.
 * Note 2: The user may select from 1 to 50 levels. The step of the scale is one picture.
 * 1) The description form of the application. This form has the detailed description of this game. Elements of the form are:
 * 2) * the text block for displaying the game description,
 * 3) * one button for moving to the opening form.
 * 4) The game form for application is the main screen of the game. Elements of the form are:
 * 5) * one text block for displaying the number of the level,
 * 6) * one block for displaying the picture,
 * 7) * one block with the link to the picture on the Wikimedia Common as "W" letter,
 * 8) * one block, which contains four radiobuttons for displaying possible answers,
 * 9) * one button for moving to the next level,
 * 10) * one button for starting the game again.
 * Note: If the picture can not be downloaded from the server, a blank image will be displayed instead of it.
 * 1) The result form of application. Elements of the form are:
 * 2) * one text block for displaying the results of the game,
 * 3) * one button for moving to the opening form,
 * 4) * one block, which contains the link for viewing correct and wrong answers.
 * 5) The answer form of application. This form shows all user answers. Elements of the form are:
 * 6) * the text block for displaying the number of the level,
 * 7) * one block for displaying the picture,
 * 8) * the text block for displaying the picture name,
 * 9) * one block containing the list of possible answers,
 * 10) * one button for moving to the result form.
 * All elements, without the button, repeat as many times as the levels were chosen.

Random selection
It was necessary to generate the IDs of the picture's data and the answers which we get from the database. For this purpose, the function rand(int $min, int$ max) of the language PHP was used. $min is the smallest value that can be returned and $max is the largest value that can be returned. This function returns a pseudo-random number in the range from $min to $max. The parameter $min is 1, since the numeration of strings in the database starts at 1; the parameter $max is the number of pictures for chosen languages.

Pictures preparation
The application uses pictures that are stored on the server. Before uploading images to the server, they need to be downloaded and optimized.

SPARQL
Picture's description, which used in the application, are got from the Wikidata. Let's consider an example of filling the table en_ru. The picture is saved in the application database only if it has a label field in English and Russian and if there is a link to the image on the Wikimedia Commons (image (P18)). To get all the pictures that satisfy these conditions, a SPARQL-скрипт was used, thanks to which 3035 entries were received.

Get links
After the list of pictures is received, it is necessary to receive links to files with images of these pictures. To do this, the SPARQL script was exported to a JSON file using standard SPARQL query tools.

Upload images
To download a large number of files, the Download Master program was used. This program was chosen because it has the function of downloading files by importing a list of URLs. Another advantage of Download Master is the ability to export a list of downloads containing links to images stored on the Wikimedia Commons and the names of the downloaded files into an XML file.

Note 1: Using Download Master, it was possible to extract links to pictures from the JSON file without its preliminary processing.

''Note 2: Since there was a lot links to the pictures, there were problems with downloading them by Download Master. The main problem is the speed of reading the file by the program. While the file is being read, the computer may hang. If the computer's power is not enough, you need to divide the JSON file into several files. This can be done manually or with the help of several SPARQL scripts and keywords LIMIT and OFFSET. The keyword LIMIT specifies the maximum value of the decisions that will result (that is, the maximum number of rows). The keyword OFFSET allows to not show the first n decisions as a result.''

Compress images
After loading images, the total size of images that have a label at the same time:
 * in English and Russian, was 14,9 Gb (average size of one picture was 5 Mb),
 * in English and German, was 20,0 Gb (average size of one picture was 4 Mb),
 * in Russian and German, was 7,23 Gb (average size of one picture was 10 Mb).

Pictures of a such large size can be downloaded from the Internet very slowly, because of which the user may have a feeling that the application "hangs". Therefore, it was decided to compress pictures to an acceptable size. To compress images FILEminimizer Suite was used, since it allows compressing images in batch mode and provides a good percentage of compression (the average percentage of compression is more than 90%). After compression, the total size of images that have a label at the same time:
 * in English and Russian, is 737 Mb (average size of one picture is 248 Kb),
 * in English and German, is 1,02 Gb (average size of one picture is 213 Kb),
 * in Russian and German, is 177 Mb (average size of one picture is 250 Kb).

Hosting
After compression, images were uploaded to a remote server.

Configuration

 * The PHP version: PHP version 5.4 and more
 * The Apache version: Apache 2.4 and more
 * The database model: MySQL 4.0 and more.

Application download
Download the project Artolela-web from the GitHub by the archive. Unzip it, e.g. to: D:\artolela-web-master Then you install the application on your own server. From your own server, you must delete the following folders:
 * json,
 * python,
 * sql.

Database creation
Open "Databases" on your server.

Create your own database, e.g. pictures

An example of installing a database under Unix
 mysqladmin -p -uroot create DATABASE_NAME mysql -p -uroot mysql

mysql> CREATE USER 'USER_NAME'@'localhost' IDENTIFIED BY 'your_password'; mysql> GRANT select ON DATABASE_NAME.* TO USER_NAME@'localhost'; mysql> FLUSH PRIVILEGES;

mysql> use DATABASE_NAME mysql> charset binary mysql> source .sql/pictures.sql 

PhpMyAdmin
Open your server control panel. Open PHPMYADMIN control panel.

Log in PHPMYADMIN, by entering your server login and password.

Data download
Select your database in PhpMyAdmin left panel. Open the tab "Import" in PhpMyAdmin upper panel.

Select file "sql/pictures.sql" in the directory, in which you unzip this project and click "Next". After some time this file will be processed, and the necessary tables appear in the database with the corresponding records.

File config.php
Open directory in which you unzip this project. Select file "config.php". Change this file, writig in it the server login login, server password, and name of database.

Results
Source code of the application is available at GitHub in the repository Artolela-web.