Doctor's Theses (authored and supervised):

R. Sorschag:
"Intelligent Video Annotation and Retrieval Techniques";
Supervisor, Reviewer: H. Eidenberger, A. Scherp; Softwaretechnik und Interaktive Systeme, 2012.



English abstract:
Videos are an integral part of current information technologies and the web. The demand for
efficient retrieval rises with the increasing number of videos, so better annotation tools are
needed, as today's retrieval systems rely mainly on manually generated metadata. The situation
is even more critical for user-generated videos, where rough and inaccurate annotations are
common practice. Attempts to employ content-based analysis for video annotation and retrieval
already exist, but they are still in their infancy compared to the retrieval of web documents.
In this work, we address the use of object recognition techniques to annotate what is shown
where in videos. These annotations are suitable for retrieving specific video scenes in response
to object-related text queries, although the manual generation of such metadata would be
impractical and expensive. In addition, a sophisticated presentation of the retrieval results
indicates the relevance of the retrieved scenes at first glance. The presented semi-automatic
annotation approach is easy and comfortable to use, and it builds on a novel framework with
the following outstanding features. First, it can be easily integrated into existing video
environments. Second, it is not based on a fixed analysis chain but on an extensive recognition
infrastructure that can be used with all kinds of visual features, matching, and machine learning
techniques. New recognition approaches can be integrated into this infrastructure at low
development cost, and the recognition approaches in use can be reconfigured even on a running
system. Thus, the framework may also benefit from future advances in computer vision. Third,
we present an automatic selection approach that supports the use of different recognition
strategies for the annotation of different objects. Moreover, visual analysis can be performed
efficiently in distributed, multi-processor environments, and the resulting video annotations
and low-level features can be stored in a compact form.
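
To make the pluggable design concrete, the following minimal Python sketch illustrates a
recognition infrastructure of this kind. All names and signatures are hypothetical
illustrations of the described architecture, not the framework's actual API:

    # Hypothetical sketch of a plugin-style recognition infrastructure:
    # strategies pair a visual feature extractor with a matching or
    # machine-learning technique, and the object-to-strategy mapping
    # can be changed on a running system.
    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class RecognitionStrategy:
        """One interchangeable analysis chain: feature extraction plus matching."""
        name: str
        extract: Callable[[object], object]                  # video frame -> features
        match: Callable[[object], List[Tuple[str, float]]]   # features -> (label, score) hits

    class RecognitionInfrastructure:
        def __init__(self) -> None:
            self._strategies: Dict[str, RecognitionStrategy] = {}
            self._assignment: Dict[str, str] = {}  # object label -> strategy name

        def register(self, strategy: RecognitionStrategy) -> None:
            # New recognition approaches plug in without changing the
            # rest of the system.
            self._strategies[strategy.name] = strategy

        def assign(self, object_label: str, strategy_name: str) -> None:
            # Different objects may be annotated with different strategies;
            # this mapping may be reconfigured even at runtime.
            self._assignment[object_label] = strategy_name

        def annotate(self, frame: object, object_label: str) -> List[Tuple[str, float]]:
            strategy = self._strategies[self._assignment[object_label]]
            return strategy.match(strategy.extract(frame))

In such a design, per-object strategy selection reduces to updating a mapping, which is why
new approaches can be added, and existing ones exchanged, at low development cost.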
We demonstrate the proposed annotation approach in an extensive case study with promising
results. A video object annotation prototype as well as the generated scene classification
ground truth are freely available to foster reproducible research. Additional contributions of
this work concern the generation of motion-based and segmentation-based features and their use
for specific annotation tasks, such as the detection of action scenes in professional and
user-generated video. Furthermore, we participated in the instance search and semantic indexing
tasks of the TRECVID challenge in three consecutive years: 2010, 2011, and 2012.

Created from the Publication Database of the Vienna University of Technology.