Publication Entry

Diploma and Master Theses (authored and supervised):

D. Fritz:
"Intuitive und markerlose Interaktion in einer mobilen Virtual Reality Anwendung auf Basis von RGBD-Daten";
Supervisor: H. Kaufmann, A. Mossel; Institut für Softwaretechnik und Interaktive Systeme, 2014; final examination: 2014-11.

English abstract:

The diploma thesis discusses touchless interaction techniques on handheld devices
to intuitively manipulate a virtual 3D scene that is presented to the user on the
handheld display. The robustness and performance of different solutions and combinations
for detecting the user's hand and fingertips without markers are examined.
The first methods rely solely on the built-in RGB camera, while later ones combine
the built-in camera with an additional depth sensor. Hand position and finger
gestures are used to select and move objects in the virtual scene. Furthermore,
the user's head position is tracked to adapt the perspective of the virtual scene in
order to create a 3D impression on the device display.
The approaches use RGB data for hand segmentation and gesture detection, and use the
hand size or the maximum grayscale value of the hand for relative depth estimation.
In addition, RGBD data is used for improved hand segmentation and absolute depth
estimation. To detect two different finger gestures or the palm of the hand, Haar-like
feature-based cascaded classifiers were trained. If the classifier is used to recognize
the palm, the finger gestures are determined by the number of detected fingertips.
To this end, different image processing operations are applied for hand segmentation,
and the hand contour is used for fingertip detection. An already trained Haar-like feature
cascade classifier is used to detect the user's face and obtain its 3D position
with relative depth estimation based on the size of the face.
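
The size-based relative depth estimation described above can be illustrated with a
minimal sketch. It assumes a simple pinhole camera model, in which the apparent width
of an object is inversely proportional to its distance. The function name and the
calibration width are hypothetical and are not taken from the thesis.

```python
def relative_depth(observed_width_px: float, reference_width_px: float) -> float:
    """Estimate depth relative to a calibration pose (1.0 = calibration distance).

    Assumption (not from the thesis): pinhole model, so apparent width w = f * W / Z
    and therefore Z / Z_ref = w_ref / w for an object of fixed physical width W.
    """
    if observed_width_px <= 0 or reference_width_px <= 0:
        raise ValueError("widths must be positive")
    return reference_width_px / observed_width_px


# Hypothetical example: the face (or palm) bounding box was 200 px wide during
# calibration and is now 100 px wide, so it is about twice as far away.
print(relative_depth(100.0, 200.0))
```

The result is a unitless ratio, which is why such an estimate is only relative: without
knowing the physical size of the hand or face, no metric distance can be derived.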
Within the diploma thesis, an Android application is developed using OpenCV, Unity3D
and OpenNI. The hardware prototype rigidly connects the handheld device
with the depth sensor (Microsoft Kinect) to enable correct calibration and mapping
of the received RGBD data. The performance of the approaches for gesture recognition
was systematically evaluated by comparing their accuracy under varying
illumination and background. Based on this study, useful guidelines for developers
were derived to choose the appropriate technique for their mobile interaction task.
Furthermore, an experimental study was conducted using the detected finger gestures
to perform the two canonical 3D interaction tasks, selection and positioning,
and to demonstrate the different characteristics of the depth estimation methods. Overall,
the best result was obtained using RGBD data for finger gesture detection and
absolute depth estimation, at the expense of latency.
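
How depth data can support both hand segmentation and absolute depth estimation may be
sketched as follows. This is an illustrative simplification, not the method of the
thesis: it assumes that the hand is the object nearest to the sensor, that depth values
are given in millimetres, and that a value of 0 marks a pixel without a valid reading.
The function name and the band width are hypothetical.

```python
from statistics import median


def segment_nearest(depth_mm, band_mm=150, invalid=0):
    """Return (mask, hand_depth) for the nearest object in a depth map.

    depth_mm is a list of rows of depth values. Pixels within band_mm of the
    nearest valid reading are treated as the hand (an assumption); hand_depth
    is the median depth of those pixels, or None if no pixel is valid.
    """
    valid = [d for row in depth_mm for d in row if d != invalid]
    if not valid:
        return [[False] * len(row) for row in depth_mm], None
    limit = min(valid) + band_mm
    mask = [[d != invalid and d <= limit for d in row] for row in depth_mm]
    hand = [d for d in valid if d <= limit]
    return mask, median(hand)


# Hypothetical 2x3 depth map: a hand at about 0.6 m in front of a background.
depth = [[0, 600, 610],
         [1200, 605, 1300]]
mask, hand_depth = segment_nearest(depth)
print(mask, hand_depth)
```

Unlike a size-based estimate, the median of the measured values is a metric distance,
at the cost of acquiring and processing an additional data stream.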

German abstract (translated into English):

The diploma thesis deals with touchless interaction techniques on mobile devices
for manipulating a virtual 3D scene. Different solutions and combinations for
recognizing finger gestures without aids are examined for robustness and
performance, first without additional hardware and later with a depth camera as
an extension. The perspective of the virtual scene is controlled by the position
of the head in order to create a 3D effect. Objects in the scene can be selected
and manipulated through hand position and finger gestures.
For head detection, an already trained Haar cascade classifier is used, and the
3D position is estimated relatively from the detected size of the head. Self-trained
Haar cascade classifiers are used to enable, on the basis of RGB data, either direct
recognition of two different gestures or general palm detection, and to determine
their 2D position. If the classifier has detected the palm, the two gestures are
determined from the number of fingertips. To this end, image processing functions
are used to segment the hand, extract its contour, and examine the contour for
fingertips. For 3D interaction, the missing depth values are estimated relatively
using different approaches (size of the detected palm, maximum gray value of the
hand). To improve on 2D tracking with relative depth estimation, Microsoft's Kinect
depth camera is used to extend the RGB data with absolute depth data (RGBD). The
RGBD data is intended to improve hand segmentation and 3D interaction.
In the course of this diploma thesis, an Android application is developed using
OpenCV, Unity3D and OpenNI. The prototype consists of a frame that ensures a rigid
connection between the cameras of the tablet and the Kinect and makes calibration
and mapping possible. The performance of the implemented approaches to finger
gesture recognition was systematically evaluated under different lighting and
background conditions. Based on this study, recommendations were developed to help
developers choose the appropriate method for their mobile 3D interaction. In
addition, an experimental study was conducted to evaluate the 3D interaction and
to show the different characteristics of the depth estimation methods. For this
purpose, the two interaction tasks of selection and positioning were performed
with the detected finger gestures. Overall, the best result was achieved using
RGBD data for finger gesture recognition and absolute depth estimation, albeit at
the expense of latency.

Keywords:

3D Interaction, Handheld Mixed Reality, Markerless Interaction, Natural User Interfaces

Electronic version of the publication:

http://publik.tuwien.ac.at/files/PubDat_230880.pdf

Created from the Publication Database of the Vienna University of Technology.