User:Moeri/Human Computation

= Human Computation =

== Overview ==
The preceding chapters provided detailed information on ranking and recommendation on the Web. This chapter strikes out in a new direction, introducing the area of human computation.

Human computation is a relatively new research area that studies the process of involving humans in tasks which machines cannot yet solve. On the Web, naturally not all content can be created fully automatically by machines; some of it needs human input. One possible use case is image recognition and labeling, e.g., for improving image search on the web: a task a machine cannot easily solve but a human commonly can.

The immense growth and present-day importance of the Web boosts human computation applications, as it makes it possible to reach the millions of people who have access to the Internet and encourage them to participate. The topic is thus of high interest when studying the Web and is therefore certainly a part of our course.

In this chapter, the area of human computation will be examined in the broader context of collective intelligence, and different human computation applications will be introduced, all with the common goal of gaining access to humans' cognitive capabilities for performing tasks. These applications rely on different incentives to motivate humans to participate.

== Motivation ==
The continuous development of the Internet allows for global collaboration between humans and machines. The cost of collaboration has thereby been reduced to almost zero: humans and machines form a “global brain”, also called collective intelligence.

Human computation is a subfield of collective intelligence that studies the process of involving humans in tasks which machines currently fail to solve. On the Web, naturally not all content can be created fully automatically by machines; some of it needs human input. One possible use case is image recognition and labeling, e.g., for improving image search on the web: a task that is nearly impossible for a machine but usually very easy for a human. The idea is therefore to involve humans wherever necessary, either to solve a task outright or to improve an existing automatic technique.

Probably one of the most important persons to mention in the context of human computation is Luis von Ahn. He coined the term human computation and is also, in a sense, the inventor of Games with a Purpose – a topic that will be introduced below. According to his thesis, human computation can be defined as:

“[...] a paradigm for utilizing human processing power to solve problems that computers cannot yet solve.”

To make human computation work, one has to motivate humans to contribute and provide appropriate systems or applications. One way of gaining access to people for performing tasks is crowdsourcing marketplaces, where workers get a financial reward for solving tasks. Another category of human computation applications comprises games with a purpose, where players generate content while playing a game.

In the following, human computation will be examined in the context of collective intelligence and different incentives will be introduced. Afterward, several human computation systems will be presented in detail: Amazon Mechanical Turk and reCAPTCHA as representatives of applications that make use of crowdsourcing, as well as ARTigo and Duolingo as games with a purpose. The chapter then closes with a conclusion, followed by a quiz that allows you to check your understanding.

== Background ==
Human computation is partly a subfield of collective intelligence, as are the areas of crowdsourcing, social computing and data mining – but these terms are not synonymous with each other and in fact only partially overlap. All of them will be described briefly in the following to provide an appropriate understanding. One might also take the flipped classroom session (15:30) into account.

Collective intelligence is the "all-embracing" field: it contains both crowdsourcing and social computing entirely, and human computation partly. Generally speaking, collective intelligence comprises "smart" actions performed by a group of people.

Social computing systems can be seen as a subfield of Web 2.0, including blogs, wikis and other online communities.

This chapter will not elaborate on these areas further. Interested readers may look them up in the glossary, where each area is defined and paired with a paper concentrating on it.

The crowdsourcing area will be described next. Detailed information can also be found in the Crowdsourcing Chapter.

== Incentives for Human Computation ==
Unlike machines, humans need to be motivated to solve a task – otherwise they are unlikely to contribute to human computation applications. Providing appropriate incentives is therefore of particular importance. In general, one can differentiate between monetary and nonmonetary incentives: a worker either receives a financial reward for his work or contributes voluntarily.

Nonmonetary incentives include a variety of motivators:
 * altruism
 * competition
 * curiosity
 * fun
 * reciprocity
 * reputation

The will to game the system and/or cause damage can also act as an incentive, as can – see reCAPTCHA – “having no other choice”: the user has to perform a human computation task in order to continue, e.g., to get access to some online content. The human computation applications in the following sections exemplify different motivators: workers on crowdsourcing platforms get paid for their work, Internet users cannot avoid CAPTCHAs, and games with a purpose harness humans' "passion" for playing games, driven by fun and competition.

== Crowdsourcing ==

 * See main article: Crowdsourcing

Crowdsourcing can be described as the process of splitting a problem into several smaller tasks, outsourcing these tasks to humans – the crowd – and collecting the results. The term itself is a blend of “crowd” and “outsourcing”.

In the following, two specific human computation applications that make use of crowdsourcing will be described.

=== Amazon Mechanical Turk ===
Amazon Mechanical Turk (MTurk) is a paid crowdsourcing marketplace. A requester publishes his tasks and workers can decide to work on them for a financial reward. Some tasks call for human computation, such as image recognition, writing product descriptions or transcribing audio – in short, all tasks that require a human worker because a machine cannot solve them. There can also be numerous tasks that do not particularly require a human to solve them: one has to keep in mind that crowdsourcing and human computation are not the same.

=== reCAPTCHA ===
reCAPTCHA is an application that improves the accuracy of digitizing textual documents. It was originally devised by Luis von Ahn and others, using the incentive of having to solve a computation task in order to get access to some online content – for example, the possibility to comment on a blog or register an email account. To protect such sites against spam and other forms of attacks, a CAPTCHA is used: a program that tests whether a human or a machine is trying to access a site. The term CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. One variant of a CAPTCHA shows some distorted text which only humans are able to read. In this way, it is possible to catch machines and prevent them from, e.g., submitting spam comments.

These CAPTCHAs are now actually used to help digitize documents, by showing humans exactly those words that were scanned incorrectly by an automatic tool. Humans thus solve tasks which machines fail to solve, and in conjunction with human computation, the digitization accuracy can be improved considerably.

== Citizen Science ==
Citizen science projects have been known for a long time; however, the development of information and communication technologies (ICTs) opens up growing possibilities for such projects. Citizen science refers to ordinary people collecting or processing data in a scientific context. A well-known portal for citizen science projects is Zooniverse, where people can participate in the fields 'space', 'climate', 'nature', and 'biology'. Its first, and very popular, project is Galaxy Zoo.

=== Galaxy Zoo ===
Galaxy Zoo is an online platform where users are encouraged to classify galaxies; according to its website, there are two main morphological classes. Thousands of pictures of galaxies were shared via the Internet with users, who were invited to classify them, and a tutorial helped the participants get started. The project turned out to be quite successful: in its first year alone, more than 50 million classifications were submitted by more than 150,000 participants.

By now, several succeeding projects exist, building on this successful start.

== Games with a Purpose ==
Games with a purpose (GWAPs) are human computation applications designed around nonmonetary incentives, such as fun and competition between players. The goal of a GWAP is to create human-generated content while the human is playing the game, so that it happens almost unconsciously. The central property of a game with a purpose is that people are recruited not because they want to help but because they enjoy playing the game. That is, the player does not even have to be interested in the purpose behind the game.

The most important person to mention concerning these games is Luis von Ahn. He pioneered them with his earliest and most popular game, the ESP Game (see ESP.pdf for detailed information). The ESP Game started in 2003 and was concerned with labeling images in a game-like environment. It was bought by Google and is offline by now; however, several similar games exist today.

In the following, two games will be described as examples. While ARTigo aims to gather image tags with human computation to improve image search on the web, Duolingo's users translate documents while learning a foreign language.

=== ARTigo ===
ARTigo is a game about tagging artworks. Since every artwork can be digitized, millions of digital copies exist. This makes it nearly impossible to find a specific picture, for example without knowing the artist, or a collection of pictures, for example pictures associated with a specific atmosphere.

The goal of ARTigo is therefore to create tags for artworks to facilitate search. This task cannot be performed by machines, since they are not able to recognize complex figures, let alone contextual information.

ARTigo is a two-player game that shows both players the same picture for 60 seconds. Both players type in tags they associate with that picture; for instance, a painting showing an old man might get tagged with "man", "old", and "aristocratic". For every tag mentioned at least once in some past game, the player gets five points, whereas for every tag both current players agree on, both players get 25 points. The best strategy is therefore to find tags the other player will use as well. The game results in a set of labels for every image, consisting of the tags most players used to describe it. Homonyms may still cause some difficulties, since the same term can have more than one meaning, but all in all, these tags can be used to improve image search on the web.
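The scoring rule described above can be sketched in a few lines. This is an illustrative reimplementation, not ARTigo's actual code; the function and variable names are assumptions, as is the detail that a tag matched by both players scores only the 25 points:

```python
def score_round(tags_a, tags_b, known_tags):
    """Score one 60-second ARTigo-style round for players A and B.

    tags_a, tags_b: sets of tags typed in by the two players.
    known_tags: tags entered at least once for this picture in past games.
    """
    matched = tags_a & tags_b  # tags both players agree on: 25 points each
    # Unmatched tags only score if they were already seen in a past game.
    score_a = 25 * len(matched) + 5 * len((tags_a - matched) & known_tags)
    score_b = 25 * len(matched) + 5 * len((tags_b - matched) & known_tags)
    return score_a, score_b, matched  # matched tags feed the picture's label set

# The painting of an old man from the example above:
a = {"man", "old", "aristocratic"}
b = {"man", "old", "portrait"}
scores = score_round(a, b, known_tags={"man", "portrait"})
# scores == (50, 55, {"man", "old"})
```

Both players score 50 points for the two matched tags; player B gets five extra points for "portrait", which was already known from a past game.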

=== Duolingo ===
Duolingo is an application that aims to translate the web while its users are learning a foreign language. The goal is to obtain natural-sounding translations without grammatical errors.

Duolingo offers a game-like interface containing lessons for beginners as well as embedded real-world documents that need to be translated. As of January 2014, Duolingo offers lessons in English, French, German, Italian, Portuguese and Spanish.

The gaming atmosphere is intensified by additional gimmicks: users can practice against the clock, level up, and track their progress as well as that of their friends, which creates a kind of competition.

In summary, the application offers requesters the opportunity to submit a document, e.g., a website, to the population of Duolingo users and get its translation back in exchange for a financial reward, while the users train their language skills.

== Quality Assurance ==
Quality assurance is of high importance in the area of human computation. Though humans can work on tasks that are unsolvable for machines, they tend to be error-prone and thus results can be wrong. One therefore has to find ways to distinguish between correct and wrong results obtained from the crowd. Here a differentiation between unintended and willful wrong answers should be made. A preventive method against the former is clarifying tasks so that no ambiguities remain: the task description needs to be plain and simple, just like the answer options (if given).

Besides, there may exist human computation tasks that require certain expert knowledge – in that case one could require a test that a participant has to pass first in order to get access to the task.

Another problem arising on paid marketplaces are workers who try to make money as fast as possible, also called spammers. The direct consequences are results that are imprecise or incorrect due to carelessness and oversights, as well as workers who willfully answer tasks wrong just to get through them fast and collect the money.

Above all, one always has to keep in mind that every human computation application may have participants who only want to game the system and cause damage. Detecting and filtering out these people therefore remains a key quality issue.

A well-known quality assurance method is redundancy, i.e., taking the answer most people agreed on as correct. Most applications make use of this principle: on a marketplace such as Amazon Mechanical Turk, the answer most workers agree on is taken as the correct result of a task, and in ARTigo, the tags most players agree on make up the result set for labeling an image. However, this strategy is not always sufficient: the more complex a task gets, the more people may answer it wrong. Moreover, there can be multiple correct results for a task, e.g., in translation tasks. That is, majority voting can lead to falsified results; nevertheless, it remains a reasonable approach to handle quality.
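Redundancy by majority voting can be sketched as follows; the function name is illustrative and does not belong to any specific platform's API:

```python
from collections import Counter

def majority_answer(answers):
    """Return the most frequent answer among redundant worker answers,
    together with the share of workers who gave it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Five workers label the same image; one careless worker disagrees.
answer, share = majority_answer(["cat", "cat", "dog", "cat", "cat"])
# answer == "cat", share == 0.8
```

Note that this sketch shares the limitations mentioned above: it picks an arbitrary winner in case of a tie and cannot handle tasks with several equally correct answers, such as translations.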

An additional quality assurance strategy is employed by CrowdFlower, a platform that allows publishing tasks simultaneously on different marketplaces (e.g., Amazon Mechanical Turk). CrowdFlower offers a quality assurance method known as golden units: a golden unit is a question whose solution is known in advance. For instance, a worker gets a task consisting of a number of questions, one of which is such a golden unit question. Depending on the worker's answer to this question, one can try to assess the worker's quality in general: if a worker answers it incorrectly, this may indicate that he is not reading the task and working on it honestly at all. This quality assurance method can therefore help detect low-quality workers.
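A minimal sketch of golden-unit filtering follows; the function names and the 0.8 trust threshold are assumptions for illustration, not CrowdFlower's actual implementation:

```python
def worker_accuracy(worker_answers, gold):
    """Fraction of golden questions a worker answered correctly.

    worker_answers: {question_id: answer} given by one worker.
    gold: {question_id: known_solution} for the golden units.
    """
    seen = [q for q in gold if q in worker_answers]
    if not seen:
        return None  # worker has not answered any golden unit yet
    correct = sum(worker_answers[q] == gold[q] for q in seen)
    return correct / len(seen)

def is_trusted(worker_answers, gold, threshold=0.8):
    """Flag a worker as trustworthy based on his golden-unit accuracy."""
    acc = worker_accuracy(worker_answers, gold)
    return acc is not None and acc >= threshold

gold = {"g1": "blue", "g2": "square"}
careful = {"g1": "blue", "g2": "square", "q7": "cat"}
spammer = {"g1": "red", "g2": "circle", "q7": "cat"}
# is_trusted(careful, gold) is True, is_trusted(spammer, gold) is False
```

Answers to ordinary questions (here "q7") never influence the score; only the questions with known solutions are used to estimate reliability.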

Another possibility is reputation systems, in which every participant is assessed by the others (think of Amazon or eBay, although those are not quite appropriate in this specific context). By this means, spammers can be identified. Moreover, one could design a system where highly rated workers get a bonus, encouraging workers to earn a high reputation score. The same applies to games with a purpose, where a player's “quality” can be represented in a high-score list or in his individual “level progress”; reaching a high position in the high-score list or leveling up can thus motivate players to play sincerely.

=== Miscellaneous ===
Coming back to redundancy as a method of quality assurance, a readable blog entry on the topic is by Panos Ipeirotis. The author illustrates the “vicious cycle” consisting of spammers, who create the need for redundancy in order to ensure quality, which in turn results in extremely low wages on Amazon Mechanical Turk. He hereby brings the term “market for lemons” into play. This term was coined by George Akerlof in 1970 and describes a market where the buyers cannot evaluate beforehand the quality of the goods they are buying. Akerlof uses the market for used cars as an example, in which a defective used car is colloquially called a “lemon”. Ipeirotis maps this scenario onto Amazon Mechanical Turk: given good workers and low-quality workers (the “goods”), the requester (the “buyer”) is likely to pay a price proportional to the average quality of these workers, i.e., less than a good worker would earn for his solid work. Consequently, the good workers are likely to leave the market, which in turn leads to further crumbling of prices. As a result, only low-quality workers remain and wages are extremely low – possibly preventing new good workers from joining the market.
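The price dynamic can be illustrated with a small numeric sketch. The quality values and worker shares below are invented for illustration; they do not come from Ipeirotis or Akerlof:

```python
# Assumed values: work by a good worker is worth 10 to a requester,
# work by a low-quality worker only 2.
GOOD_VALUE, BAD_VALUE = 10.0, 2.0

def uniform_price(share_good):
    """Price a requester offers when only average quality is observable."""
    return share_good * GOOD_VALUE + (1 - share_good) * BAD_VALUE

# With half the workers being good, the uniform price (6.0) is below what
# a good worker's output is worth, so good workers tend to leave ...
price_before = uniform_price(0.5)
# ... lowering the share of good workers, which drags the price down further.
price_after = uniform_price(0.2)
assert price_after < price_before < GOOD_VALUE
```

Iterating this feedback loop drives the price toward the value of low-quality work, which is the "lemons" outcome described above.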

== Conclusion ==
In this chapter, human computation was introduced as a subfield of collective intelligence. There exist several human computation systems which rely on humans as “processors”. The goal is to channel to humans those tasks which machines currently fail to solve. Examples include image recognition and labeling, text summarization or translation, and audio transcription. Whereas image recognition is not efficiently doable by a machine, text translation is certainly possible – but the result may be considerably worse than a human translation (e.g., concerning grammar or natural sounding).

In summary, human computation is used to solve tasks that machines cannot yet solve and/or to improve automatic techniques.

Human computation applications have to provide appropriate incentives; otherwise humans are unlikely to spend their time and effort. In this chapter, several different motivators were described, including monetary and nonmonetary ones.

In general, human computation systems benefit from the fact that everyone with access to the Internet can contribute. For this reason, the results can be outstanding: complex content can be generated while requiring only little time and effort from each participant.

The succeeding chapter will concentrate on crowdsourcing in a broader context, not restricted to human computation tasks but covering its full spectrum. The term crowdsourcing will be introduced in more detail, providing general background knowledge and representative examples.

== Quiz ==
{Human Computation ..}
- is equivalent to Collective Intelligence.
- is equivalent to Crowdsourcing.
+ is a subfield of Collective Intelligence.
- is a subfield of Crowdsourcing.

{Which statements are true?}
- The intent of human computation is to replace machines.
+ Games with a purpose are human computation applications.
- Crowdsourcing marketplaces are human computation applications.
- Tim Berners-Lee invented human computation.

{A typical game with a purpose ..}
- pays a reward to the winner
+ uses a player's input to solve (complex) tasks
- forces people to generate some content
- is invented by Luis von Ahn
+ is based upon incentives such as fun and competition

== Glossary ==
Collective Intelligence: "[...] groups of individuals doing things collectively that seem intelligent."

Crowdsourcing: "[...] the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call."

Data Mining: "[...] the application of specific algorithms for extracting patterns from data."

Social Computing: "[...] applications and services that facilitate collective action and social interaction online with rich exchange of multimedia information and evolution of aggregate knowledge"