Artificial neural network/Training

Introduction
Neural networks learn (or are trained) by processing examples, each of which contains a known "input" and "result," forming probability-weighted associations between the two, which are stored within the data structure of the net itself. Training on a given example usually starts by determining the difference between the processed output of the network (often a prediction) and a target output. This difference is the error. The network then adjusts its weighted associations according to a learning rule that uses this error value. Successive adjustments cause the neural network to produce output that is increasingly similar to the target output. After a sufficient number of these adjustments, training can be terminated based on a stopping criterion, for example when the error falls below a chosen threshold. This is known as supervised learning.
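
The loop described above (compute the output, compare it with the target, adjust the weights, stop once a criterion is met) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: a single linear neuron, the delta rule as the learning rule, and a made-up data set with target x1 + x2. The names `examples`, `predict`, `train`, `learning_rate` and `tolerance` are hypothetical and do not come from the text.

```python
# Minimal sketch (assumption): one linear neuron trained with the delta rule.
# Each example pairs a known input with its known result (target = x1 + x2).
examples = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 2.0),
]


def predict(weights, bias, inputs):
    """Processed output of the neuron: weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def train(examples, learning_rate=0.1, tolerance=1e-6, max_epochs=10_000):
    weights = [0.0, 0.0]
    bias = 0.0
    for epoch in range(max_epochs):
        total_squared_error = 0.0
        for inputs, target in examples:
            # The error is the difference between target and processed output.
            error = target - predict(weights, bias, inputs)
            # Learning rule (delta rule): move each weight in proportion
            # to the error and to the input that fed that weight.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
            total_squared_error += error ** 2
        # Stopping criterion: the summed squared error of this pass is small.
        if total_squared_error < tolerance:
            break
    return weights, bias, epoch + 1


weights, bias, epochs = train(examples)
print(weights, bias, epochs)
```

With this data the weights should approach 1 and the bias should approach 0, because the targets are an exact linear function of the inputs. Real networks have many neurons and nonlinear activations, but the structure of the loop is the same.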

Such systems "learn" to perform tasks by considering examples, generally without being programmed with task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any prior knowledge of cats, for example, that they have fur, tails, whiskers, and cat-like faces. Instead, they automatically generate identifying characteristics from the examples that they process.

Learning Tasks

 * (Learning and Training) Define properties that you associate with the learning of children, training in sports, ... What are the similarities and differences compared with the concept of training neural networks?
 * (Error Function and Gradient Descent) Let $$(w_1, \ldots, w_n) \in \mathbb{R}^n $$ be a vector of parameters that defines the state of a neural network. A single value $$w_i$$ defines, e.g., the weight of a connection between two neurons in the ANN. The error depends on these values, so the error function maps $$(w_1, \ldots , w_n) \in \mathbb{R}^n $$ to a non-negative error $$E_{\mathbb{D}}(w_1, \ldots , w_n) \geq 0 $$. The error also depends on the training data $$\mathbb{D}$$. Explain how the gradient descent method can be used to reduce the error $$E_{\mathbb{D}}(w_1, \ldots , w_n) \in \mathbb{R}_0^{+} $$ (see Backpropagation Networks).
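
As a starting point for the second task, the following sketch treats the error purely as a function of the parameter vector and repeatedly steps against its gradient. Everything specific here is an assumption made for illustration: the two-parameter linear model, the tiny data set `data`, the finite-difference approximation of the gradient (backpropagation computes the gradient analytically instead), and the names `error`, `numerical_gradient`, `gradient_descent`, `step_size` and `steps`.

```python
# Minimal sketch (assumption): gradient descent on an error function E_D(w).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # hypothetical training data, y = 2x + 1


def error(w):
    """Hypothetical E_D(w) >= 0: summed squared error of the model w[0]*x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data)


def numerical_gradient(f, w, h=1e-6):
    """Approximate each partial derivative of f at w by a central difference."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def gradient_descent(f, w, step_size=0.05, steps=500):
    """Repeatedly move w a small step in the direction of the negative gradient."""
    for _ in range(steps):
        grad = numerical_gradient(f, w)
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w


w_start = [0.0, 0.0]
w_final = gradient_descent(error, w_start)
print(error(w_start), error(w_final), w_final)
```

The step size matters: if it is too large, the error can grow instead of shrink, and if it is too small, many more steps are needed. The value used here was chosen by hand for this particular data set.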