Diploma and Master Theses (authored and supervised):

S. Herrmann:
"Object detection with Microsoft HoloLens 2";
Supervisors: I. Giannopoulos, M. Kattenbeck; Department of Geodesy and Geoinformation, Research Division Geoinformation, 2021; final examination: 2021-11-30.



English abstract:
Object detection is a central task in computer vision that aims to identify and locate objects in a scene. After the era of hand-crafted features and limited accuracy, AlexNet (2012) drew attention to neural networks and deep learning. Accuracy increased, inference time decreased significantly, and 2D image-based object detection networks began to boom. Today a large number of 2D object detection algorithms are available, but they share one main drawback: they only provide two-dimensional bounding boxes. This is a problem because many modern applications operate in the three-dimensional world and therefore require accurate 3D information. 3D object detection algorithms based on deep neural networks have evolved mainly over the last decade. This thesis compares the accuracy and inference time of 2D and 3D object detection algorithms, evaluating whether the additional information provided by 3D algorithms comes at the cost of lower accuracy and/or slower inference.
To enable such an evaluation, a comparable 2D/3D dataset is created. Head-mounted Augmented Reality glasses are used for data acquisition because they provide both a 2D and a 3D measurement device and are easy to use. The data acquisition setup, the selected object categories, and the environmental and lighting conditions are chosen carefully to allow a detailed analysis of the key properties of 2D and 3D object detection algorithms. The acquired data is then cleaned and the objects are annotated. The comparison is based on one representative object detection algorithm per domain: YOLOv3 is selected as the 2D algorithm and VoteNet as the 3D algorithm. Multiple sets of hyperparameters are tested, and the best models are compared.
The evaluation shows that 2D and 3D accuracy are very similar, with AP2D-50 = 0.94 and AP3D-50 = 0.96, so the 3D result is even slightly better. The main downside of the 3D algorithm compared to the 2D algorithm is inference time: although both achieve real-time performance, the 3D algorithm needs 19.04 ms versus 1.55 ms for the 2D algorithm, i.e. it is roughly an order of magnitude (about 12 times) slower in this experiment. A detailed analysis per category and condition shows that the accuracy of the 3D algorithm depends mainly on the number and density of object points, while the 2D algorithm benefits most from large object size.
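The AP2D-50 and AP3D-50 figures conventionally denote average precision at an intersection-over-union (IoU) threshold of 0.5. The Python sketch below illustrates how such a metric can be computed for one category, and shows that the only difference between the 2D and 3D case is whether box overlap is measured as an area or a volume. It assumes axis-aligned boxes and PASCAL VOC-style greedy matching with all-point interpolation; the exact evaluation protocol of the thesis is not stated here and may differ (VoteNet evaluations, for instance, often use oriented 3D boxes). The function names `iou` and `average_precision` are hypothetical, chosen for illustration only.

```python
"""Sketch of AP at an IoU threshold of 0.5 (the "-50" in AP2D-50 / AP3D-50).

Assumptions (not from the thesis): boxes are axis-aligned and given as
(min corner..., max corner...); detections of one category are matched
greedily by descending confidence; AP uses all-point interpolation.
"""
from typing import List, Sequence, Tuple

Box = Tuple[float, ...]  # 2D: (x1, y1, x2, y2); 3D: (x1, y1, z1, x2, y2, z2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (2D area or 3D volume)."""
    d = len(a) // 2  # number of spatial dimensions
    inter, vol_a, vol_b = 1.0, 1.0, 1.0
    for k in range(d):
        lo = max(a[k], b[k])
        hi = min(a[k + d], b[k + d])
        inter *= max(0.0, hi - lo)  # overlap extent along axis k
        vol_a *= a[k + d] - a[k]
        vol_b *= b[k + d] - b[k]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Tuple[float, Box]],
                      ground_truth: Sequence[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for one category; detections are (confidence, box) pairs."""
    order = sorted(detections, key=lambda det: det[0], reverse=True)
    matched = [False] * len(ground_truth)
    tp: List[int] = []
    for _, box in order:
        # find the ground-truth box with the highest overlap
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold and not matched[best_j]:
            matched[best_j] = True
            tp.append(1)  # true positive
        else:
            tp.append(0)  # false positive: too little overlap or duplicate
    # precision and recall after each ranked detection
    precisions: List[float] = []
    recalls: List[float] = []
    cum_tp = 0
    for i, t in enumerate(tp, start=1):
        cum_tp += t
        precisions.append(cum_tp / i)
        recalls.append(cum_tp / len(ground_truth) if ground_truth else 0.0)
    # all-point interpolation: make precision non-increasing from the right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # area under the interpolated precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gt_2d = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets_2d = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60)),
               (0.7, (20, 20, 29, 31))]
    print(round(average_precision(dets_2d, gt_2d), 3))  # 0.833
    gt_3d = [(0, 0, 0, 2, 2, 2)]
    dets_3d = [(0.95, (0, 0, 0, 2, 2, 1))]  # IoU is exactly 0.5
    print(average_precision(dets_3d, gt_3d))  # 1.0
```

In the 2D usage example, one detection misses both objects, so precision drops after the second rank and AP falls below 1; in the 3D example, a box covering half the ground-truth volume is still counted as correct because the threshold is inclusive.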

Keywords:
Augmented Reality, HoloLens 2, Object detection

Created from the Publication Database of the Vienna University of Technology.