Editing Internet Texts/Machine translation

Introduction
This project is dedicated to those interested in computer science and translation who would like to learn about the basic assumptions and principles of machine translation, as well as get acquainted with its history and newest developments. The aim of this project is to present these complicated matters simply enough to be understandable to readers with no prior knowledge of the field.

What is machine translation?
Machine translation, also referred to as MT or automated translation, is a field of computational linguistics that uses software to translate texts from one natural language into another. Because of globalisation, there has been a growing demand for translating larger amounts of text in a shorter time, and hence an increasing interest in researching the field and improving the software.

Brief history
• 1949–65 The term "machine translation" first appeared in Warren Weaver’s Memorandum on Translation (1949), and research in the field began in 1951 at MIT, with Yehoshua Bar-Hillel as the key figure. A research team from Georgetown University was the first to publicly present its system, in 1954. The presentation was promising enough to secure substantial funding for further research in the United States, and it gave rise to interest in MT research in other countries, such as Japan and Russia. The first MT conference was held in London in 1956. In 1962 the Association for Machine Translation and Computational Linguistics was formed in the United States, and in 1964 the National Academy of Sciences formed a committee (ALPAC) to study MT.

• 1966–95 The prospects for MT were initially very enthusiastic, but instead of progress the researchers encountered serious obstacles which they could not immediately overcome. Seeing that struggle, ALPAC issued a report stating that MT could not equal human translation and that funding for MT research should therefore be limited to a bare minimum. Despite the financial problems the research continued, and the first MT software was put to work by the French Textile Institute to translate abstracts from and into French, English, German and Spanish (1970). In 1971 Brigham Young University started a project to translate Mormon texts by automated translation, and in 1978 Xerox introduced Systran to translate technical manuals. Trados (1984) was one of the first MT companies, and the first commercial MT system for Russian/English/German–Ukrainian was developed at Kharkov State University in 1991.

• 1996–2016 In 1996 Systran offered free translation of small texts, and it was followed by numerous online services such as AltaVista Babelfish. MT began to be sold as software for personal computers and mobile phones, and it is also used for translating websites and electronic mail. The most recent innovation is Google’s Neural Machine Translation system, introduced in 2016.

Rule-Based Machine Translation
Rule-based MT is the first and simplest approach. It uses large collections of rules, developed manually over time by human experts, which map structures from the source language to the target language. The human factor in rule-based systems helps deliver fairly good automated translations with predictable results. However, because of the significant manual labour involved, rule-based systems can be quite costly and time-consuming to implement and maintain, and, as rules are added and updated, they can generate ambiguity and degrade in translation quality over time.

The process
• Word for word translation

• Introducing language-specific rules
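The two steps above can be sketched in a few lines of Python. The toy English–Spanish dictionary and the single adjective–noun reordering rule below are invented for illustration, not taken from any real system:

```python
# A minimal sketch of a rule-based translator: word-for-word lookup
# plus one hand-written, language-specific rule. The lexicon and the
# word classes are toy examples chosen for this illustration.

LEXICON = {
    "the": "el", "red": "rojo", "car": "coche", "is": "es", "fast": "rápido",
}
ADJECTIVES = {"red", "fast"}
NOUNS = {"car"}

def translate(sentence: str) -> str:
    words = sentence.lower().split()
    # Language-specific rule: Spanish normally places the adjective
    # after the noun, so swap adjective+noun pairs before lookup.
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered += [words[i + 1], words[i]]
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Word-for-word lookup; unknown words are passed through unchanged.
    return " ".join(LEXICON.get(w, w) for w in reordered)

print(translate("the red car is fast"))  # -> el coche rojo es rápido
```

A real rule-based engine would need thousands of such rules plus morphological analysis, which is exactly why these systems are so labour-intensive to build and maintain.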

Statistical Machine Translation
The statistical model uses algorithms to compare all possible translations and chooses the best one based on statistics. Statistical models train on bilingual parallel corpora; while translating, they generate numerous probable translations and compare them to the training data to estimate which translation is the most likely one. This process is much quicker and more efficient than RBMT; however, if the bilingual data is insufficient or of poor quality (the “data-dilution effect”), the system is not able to produce a proper translation.

The Process
• Breaking the sentence into chunks and translating word-for-word

• Creating sets of possible translations

• Choosing the most probable set, e.g. Przeważającą większością Posłowie przyjęli rezolucję (“By an overwhelming majority the MPs adopted the resolution”). The sentence as it stands does not sound bad, but the training data would probably suggest that it would be more natural to say: Posłowie wyraźną większością głosów przyjęli rezolucję (“The MPs adopted the resolution by a clear majority of votes”).
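The ranking step can be illustrated with a deliberately tiny sketch. The “training corpus” below is invented for this example; a real statistical engine learns from millions of sentences and combines a translation model with a language model, but the core idea of preferring the candidate whose word sequences appear most often in the training data is the same:

```python
# A toy language-model score for ranking candidate translations:
# count word bigrams in a (tiny, made-up) training corpus and prefer
# the candidate whose bigrams were seen most often.
from collections import Counter

training_corpus = [
    "posłowie wyraźną większością głosów przyjęli rezolucję",
    "posłowie przyjęli rezolucję wyraźną większością głosów",
    "rada wyraźną większością głosów przyjęła budżet",
]

def bigrams(sentence):
    words = sentence.split()
    return list(zip(words, words[1:]))

counts = Counter(b for s in training_corpus for b in bigrams(s))

def score(candidate: str) -> int:
    # Higher score = more of the candidate's bigrams occur in the data.
    return sum(counts[b] for b in bigrams(candidate))

candidates = [
    "przeważającą większością posłowie przyjęli rezolucję",
    "posłowie wyraźną większością głosów przyjęli rezolucję",
]
best = max(candidates, key=score)
print(best)  # the second, more natural candidate wins
```

With so little data the scores are crude, which is also a miniature demonstration of the data-dilution problem: remove the relevant training sentences and the engine has no basis for choosing well.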

Neural Machine Translation
NMT is a relatively new model. Google was the first to explore it, in 2014, and has since implemented it in Google Translate. NMT, similarly to statistical MT, learns from available data; however, it uses deep learning to build an artificial neural network.

Jay Marciano compared statistical machine translation to a game of chess, in which players operate within a limited universe and make a limited number of moves. They calculate all possible moves to find the best one, just like SMT. Neural machine translation, by contrast, could be compared to playing the piano: even if you make a mistake, you can go back, solve the problem, and play the melody correctly. Neural MT systems are also not bound by rules as strict as those of chess; they find their own way towards the best choices.

Neural MT is much more effective; however, it takes time for the models to learn. For this reason Google Translate, even with the model already implemented, still produces imperfect results. What differentiates NMT from other systems is the freedom it has in finding patterns and clues: it is not told what to look for, it discovers this itself. Another major difference is its ability to translate directly from one language to another even without much training data for that language pair. Older systems usually used English as a mediating language, but NMT is capable of translating, for example, Polish directly into Korean.
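The difference between pivoting through English and translating directly can be sketched with toy dictionaries. The entries below are illustrative stand-ins, not real system output, and an actual NMT model learns the direct mapping from data rather than storing a dictionary:

```python
# Pivot translation (Polish -> English -> Korean) versus a direct
# Polish -> Korean mapping. The word lists are toy examples.

pl_to_en = {"kot": "cat", "pies": "dog"}
en_to_ko = {"cat": "고양이", "dog": "개"}
pl_to_ko_direct = {"kot": "고양이", "pies": "개"}

def pivot_translate(word_pl: str) -> str:
    # Two hops through the mediating language: an error in either
    # dictionary would compound in the final result.
    return en_to_ko[pl_to_en[word_pl]]

def direct_translate(word_pl: str) -> str:
    # One hop, as a directly trained NMT model would do.
    return pl_to_ko_direct[word_pl]

print(pivot_translate("kot"), direct_translate("kot"))
```

On these toy entries both routes agree, but the pivot route fails whenever the intermediate English word is ambiguous or missing, which is the weakness direct translation avoids.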

Generic
Generic MT does not focus on any particular type of translation; Google, for example, uses this solution so as to be able to translate anything. Both businesses and individuals may use it for translating short and simple texts. Generic MT engines train on very large and broad collections of data, so the translations they produce are often faulty and contain serious syntax mistakes.

Customised
Customised MT translates texts belonging to a specific domain, industry or organisation. Such translations are of higher quality, provided that the data the engine trains on is not diluted.

Enterprise
Enterprise MT is aimed mostly at global businesses. It focuses on reproducing style, format and terminology faithfully, in order to adhere closely to the conventions of a company’s corporate language.

Quiz
{ The newest invention in Machine Translation is: { Neural Machine Translation }.
 * type="{}"}

{ What kind of corpora is used by Statistical Translation engines? { Bilingual }.
 * type="{}"}

{ What was the name of the association which investigated MT? { ALPAC, Association for Machine Translation and Computational Linguistics }.
 * type="{}"}

{ Google Translate is categorised as: { Generic, Generic engine, Generic translation }
 * type="{}"}

{ If the data is not sufficient or faulty, it causes the: { data-dilution effect }
 * type="{}"}

Task
Translate the sentence I want to go to the prettiest beach into your native language in all four translation engines:

• Google Translate
• Bing
• Free Translation
• Imtranslator


 * Compare the translations
 * Decide which one is the best
 * Determine the types of mistakes (if there were any)
 * Think about the possible reasons why the engine might have made mistakes