Markerless Tracking/Select

This page addresses the question of how to select the correct data-set in the context of Markerless Tracking.

The Problem
After generating a large number of virtual data-sets it is necessary to decide which one best matches the captured input data. The match will most likely not be perfect, because we only generate a discrete number of data-sets and the data generator itself is imperfect. Furthermore, optimization algorithms need good hints about where promising new samples should be generated. The comparison must therefore yield a smooth, low-frequency similarity function whose optimum lies at the perfect match.

Comparing Images
When do two images look mostly alike? Obviously when they are identical: their pixel colors then match exactly at every pixel position. For most images, a small offset or a missing part in one image will still leave a large number of equal or nearly equal pixels. A simple and effective measure is therefore the mean distance of all pixel pairs in color space:

$$ \mbox{distance} = \frac{ \sum_{i,j}^{image\ size} \left | \mbox{rendering}(i,j) - \mbox{capturing}(i,j) \right | }{       \#\mbox{pixels} } $$ (example source code)

This formula is not well suited to comparing unrealistic images such as a one-pixel black line on a white background, but in the camera-tracking context we are dealing with areas of color.
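A minimal sketch of this mean color-space distance in Python with NumPy (the function name and array layout are our own choices, not from the original source code):

```python
import numpy as np

def mean_color_distance(rendering: np.ndarray, capturing: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding pixels in color space.

    Both images are expected as arrays of shape (height, width, channels)
    with identical dimensions.
    """
    diff = rendering.astype(np.float64) - capturing.astype(np.float64)
    # Per-pixel Euclidean distance in color space, averaged over all pixels.
    return float(np.linalg.norm(diff, axis=-1).mean())
```

Identical images yield a distance of zero, and small offsets between mostly similar images change the value only gradually, which is exactly the smooth behavior the optimization needs.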

Ignoring Unknown Pixels
In practical setups it is often not possible, or not desirable, to generate the whole real environment. Frequently it is sufficient to track an object against an arbitrary background. For this case we use the following trick in the image comparison:

Only the object is rendered, and all other pixels are marked invalid (with a special color or an alpha value of zero). The comparison is then carried out only between the valid generated pixels and the corresponding pixels of the real image.
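The masked comparison can be sketched as follows, here using an alpha value of zero to mark invalid pixels (the function name and RGBA layout are assumptions for illustration):

```python
import numpy as np

def masked_color_distance(rendering_rgba: np.ndarray, capturing_rgb: np.ndarray) -> float:
    """Mean color distance over valid rendered pixels only.

    Pixels with alpha == 0 in the rendering are treated as unknown
    background and excluded from the comparison.
    """
    valid = rendering_rgba[..., 3] > 0
    if not valid.any():
        # No rendered pixels at all: nothing can be compared.
        return float("inf")
    diff = rendering_rgba[..., :3].astype(np.float64) - capturing_rgb.astype(np.float64)
    # Average the per-pixel color distances over the valid pixels only.
    return float(np.linalg.norm(diff, axis=-1)[valid].mean())
```

Because the background pixels never enter the sum, an arbitrary real background does not distort the similarity value of the rendered object.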

Other Algorithms
Most other popular image-processing techniques (e.g. edge detectors, SIFT features, ...) are not a good choice here, because they are designed to be invariant to certain variables (rotation, color, scaling, ...). This invariance can easily produce misleading similarity evidence. It might be possible to read the rotation and scaling information out of the SIFT matches and feed it back into the next round of data generation, but this would be tricky to integrate with an optimization algorithm.