

Doctor's Theses (authored and supervised):

N. Artner:
"Tracking Related Multiple Targets in Videos";
Supervisor, Reviewer: W. Kropatsch, H. Bunke; Fakultät für Informatik der Technischen Universität Wien; Institut für Computergrafik und Algorithmen; PRIP 186/3, 2013; oral examination: 2013-12-06.



English abstract:
This cumulative thesis presents research in the field of tracking. Tracking is one of the most
thoroughly researched problems in computer vision. The aim of tracking is to follow an object
of interest (target) in a video. In this thesis, I focus on a special problem: tracking related
multiple targets. Two important questions in tracking are "What is the target?" and "Where is
the target?" The core contributions of this thesis answer these two questions with the help of
graph-based representations and methods.
The first core contribution is a fully automatic initialization for target models (What?), based
on the principle that things which move together belong together. The input of the approach is
a video showing the targets in motion. In this video a set of salient points is tracked to extract
the necessary motion information in the form of trajectories. A triangulated graph is built based
on the initial positions of the tracked points. Then, the triangulated graph is deformed based on
the motion encoded in the trajectories. This deformation of the triangulation over time is the
input of a hierarchical grouping process, which is realized by an irregular dual graph pyramid.
At the top level of the resulting pyramid, the rigid entities (e.g. the parts of a human body)
are identified. Finally, the motion of these rigid entities is analyzed to find possible points of
articulation connecting them (e.g. the joint between the upper and lower arm of a human).
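The idea that things which move together belong together can be illustrated with a small sketch. It is not the method of the thesis: the irregular dual graph pyramid is replaced here by a flat, single-level grouping with union-find, and the triangulation edges are assumed to be given. The names `edge_length_variation`, `group_rigid_points` and the threshold `tol` are hypothetical and chosen only for this illustration.

```python
import math

def edge_length_variation(traj_a, traj_b):
    """Relative spread of the distance between two tracked points over all frames."""
    lengths = [math.dist(p, q) for p, q in zip(traj_a, traj_b)]
    mean = sum(lengths) / len(lengths)
    return (max(lengths) - min(lengths)) / mean if mean > 0 else 0.0

def group_rigid_points(trajectories, edges, tol=0.05):
    """Merge points joined by edges whose length stays nearly constant.

    trajectories: one list of (x, y) positions per tracked point, one entry per frame.
    edges: (i, j) index pairs, assumed to come from a triangulation of the
           initial point positions (computed elsewhere).
    tol: assumed rigidity threshold on the relative length variation.
    """
    parent = list(range(len(trajectories)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if edge_length_variation(trajectories[i], trajectories[j]) <= tol:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(trajectories)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy input: points 0 and 1 translate together, points 2 and 3 stay still.
trajectories = [
    [(0, 0), (1, 0), (2, 0)],
    [(1, 0), (2, 0), (3, 0)],
    [(5, 0), (5, 0), (5, 0)],
    [(5, 1), (5, 1), (5, 1)],
]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
groups = group_rigid_points(trajectories, edges)
print(groups)
```

In this toy input the edges (0, 1) and (2, 3) keep their length, while the edges between the two pairs change length over time, so the points fall into two groups. A hierarchical scheme such as the pyramid described above would merge step by step instead of applying one global threshold.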
The second core contribution is a novel approach for finding temporal correspondences of
multiple related targets (Where?). This thesis proposes to represent the targets by a graph model,
where each target is represented by a vertex and their relationships are encoded by edges. The
traditional way to find the temporal correspondences of a graph model is graph matching.
In contrast, this thesis proposes a novel approach, which finds the correspondence of
each vertex (target) by combining the appearance cue of a simple tracker with the structural cue
deduced from a graph model. These two cues are combined in an iterative process inspired by
the well-known Mean Shift algorithm. The outcome is a set of correspondences for all vertices and
edges in the graph, which locally maximize the similarity in appearance and locally minimize
the deviation from the structure encoded in the model.
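How an appearance cue and a structural cue might be combined iteratively can be sketched as follows. This is only a loose illustration under strong assumptions and not the procedure of the thesis: the appearance cue is reduced to one fixed position proposal per target, the structural cue is the mean of the positions predicted by neighboring vertices through fixed model offsets, and both are blended with a constant weight `alpha`. The function `refine_positions` and all of its parameters are hypothetical.

```python
def refine_positions(appearance, offsets, alpha=0.5, max_iter=100, eps=1e-6):
    """Iteratively blend appearance proposals with structural predictions.

    appearance: {vertex: (x, y)} position proposed by a simple per-target tracker.
    offsets: {(i, j): (dx, dy)} model offset, meaning pos[i] should be near
             pos[j] + (dx, dy).
    alpha: assumed weight of the structural cue, between 0 and 1.
    """
    pos = dict(appearance)
    for _ in range(max_iter):
        new_pos = {}
        for v, (ax, ay) in appearance.items():
            # Structural cue: positions for v predicted by its graph neighbors.
            preds = [(pos[j][0] + dx, pos[j][1] + dy)
                     for (i, j), (dx, dy) in offsets.items() if i == v]
            if preds:
                sx = sum(p[0] for p in preds) / len(preds)
                sy = sum(p[1] for p in preds) / len(preds)
                new_pos[v] = ((1 - alpha) * ax + alpha * sx,
                              (1 - alpha) * ay + alpha * sy)
            else:
                new_pos[v] = (ax, ay)
        # Largest shift of any vertex in this iteration.
        shift = max(abs(new_pos[v][0] - pos[v][0]) + abs(new_pos[v][1] - pos[v][1])
                    for v in pos)
        pos = new_pos
        if shift < eps:
            break
    return pos

# Toy input: the model says vertex 1 lies 10 units to the right of vertex 0,
# but the appearance proposal for vertex 1 has drifted to x = 14.
appearance = {0: (0.0, 0.0), 1: (14.0, 0.0)}
offsets = {(0, 1): (-10.0, 0.0), (1, 0): (10.0, 0.0)}
refined = refine_positions(appearance, offsets)
gap = refined[1][0] - refined[0][0]
print(refined, gap)
```

With these toy values the gap between the two vertices shrinks from 14 toward the model distance of 10 without reaching it, because the appearance proposals keep pulling in the other direction. The weight `alpha` controls this trade-off.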
The main goal of this thesis is to show the potential of graph-based representations
and methods in tracking. This goal has been achieved through these two core contributions.

Created from the Publication Database of the Vienna University of Technology.