Publication Entry

Diploma and Master Theses (authored and supervised):

F. Groh:
"Semi-Automatic Video Annotation Tool for Generation of Ground Truth Traffic Datasets";
Supervisor: M. Gelautz; E193 Institut für Visual Computing und Human-Centered Technology, 2020; final examination: 2020-08-17.

English abstract:

In the context of this diploma thesis, a semi-automatic annotation tool ("CVL Annotator") for generating bounding box ground truth data in videos was designed, implemented and evaluated. The goal is the annotation of night-time traffic scenes, which are only sparsely represented in publicly available reference datasets. Efficient semi-automatic annotation is an important basis for developing machine learning methods, which require large amounts of ground truth data for both training and testing network architectures. A literature review conducted at the beginning of the thesis shows that there is a particular lack of challenging night-time traffic videos with highly dynamic lighting conditions, including headlight reflections and halos from oncoming traffic. Furthermore, it is shown that existing annotation tools mostly rely solely on linear interpolation to support manual annotation. In contrast, the newly developed "CVL Annotator" is equipped with several state-of-the-art tracking algorithms. The trackers were selected on the basis of a quantitative analysis of the algorithms on an existing synthetic dataset. The user interface was designed to minimize user interaction and to present all information relevant to the user at a glance. A preliminary user study compared the new tool with an already published annotation tool ("Scalabel"), measuring in particular the time and the number of clicks required to create ground truth annotations of video traffic scenes. Additionally, the accuracy of the resulting annotations was compared. CVL Annotator achieved improvements in the time and click analysis as well as in the accuracy study.
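For illustration, the following is a minimal sketch of the linear-interpolation baseline that the abstract attributes to existing tools, together with an intersection-over-union (IoU) score, which is one common way to measure bounding box accuracy against a reference. The (x, y, width, height) box format, the function names `interpolate_boxes` and `iou`, and the use of IoU as the accuracy measure are all assumptions for this sketch; the thesis itself may use different conventions or metrics.

```python
from typing import Dict, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_boxes(start: Box, end: Box, start_frame: int, end_frame: int) -> Dict[int, Box]:
    """Linearly interpolate a box between two manually annotated keyframes.

    Returns one box per frame, from start_frame to end_frame inclusive.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")
    span = end_frame - start_frame
    boxes: Dict[int, Box] = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = tuple(a + t * (b - a) for a, b in zip(start, end))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two hypothetical keyframes four frames apart; the car moves 40 px to the right.
    track = interpolate_boxes((100, 50, 40, 20), (140, 50, 40, 20), start_frame=0, end_frame=4)
    print(track[2])  # (120.0, 50.0, 40.0, 20.0)
    # Compare the interpolated box with a hypothetical reference box.
    print(round(iou(track[2], (125, 50, 40, 20)), 3))  # 0.778
```

Interpolation only produces straight-line motion between keyframes, so an object that accelerates, turns, or changes apparent size non-linearly forces the annotator to add more keyframes. A tracker-based tool such as the one described above would instead propose a box in every frame from image content, with the user correcting only where the tracker drifts.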

Keywords:

video annotation; object tracking; ground truth; user interface

Electronic version of the publication:

https://publik.tuwien.ac.at/files/publik_290027.pdf

Related Projects:

Project Head Margrit Gelautz:
CarVisionLight

Created from the Publication Database of the Vienna University of Technology.