Semantic Segmentation

Semantic Segmentation

Introduction

Shape is all you need...

Detecting obstacles (see Object Detection) is the first step in avoiding them. However, knowing the location and the rough size of an object is often not sufficient. Rather, we require an exact knowledge of its shape and boundaries. For example, if a robot described an archway with a bounding box, it would not be able to find the passable area. A segmentation would be able to distinguish between passable and impassable regions.

Image segmentation is defined as the task of partitioning an image into distinct groups of pixels. Often, we want these groups of pixels to be "meaningful" for being able to represent concepts in our world. The term Semantic Segmentation requires that these groups represent humanly interpretable objects and regions, e.g. furniture, floor, walls, people, cars, doors or other relevant objects for Path Planning algorithms to navigate unstructured environments.

We segment obstacles in industrial environments to enable path planning.
We segment obstacles in industrial environments to enable path planning.

We at GESTALT robotics train and deploy segmentation models to enable robots to interpret their environment. The models can distinguish a variety of objects and materials. Feel free to contact us in order to discuss your use case. Why not send us an email at info@gestalt-robotics.com or give us a call at +49 30 616 515 60.


Tech Outline

Deep Learning and Semantic Segmentation

Traditional Methods vs Deep Learning

Traditional approaches attempt to solve the segmentation task by grouping pixels of similar color intensities or identifying areas surrounded by edges. In addition, the shape could be incorporated. In contrast, the semantics or meaning of the target is not considered. While such approaches were sufficient for detecting static objects e.g. in medical images where it can be guaranteed that bones are viewed from a fixed perspective, images from autonomous robots pose a challenge due to a high variety of object appearances. Often, the targets are captured from different perspectives.

Recently, deep learning has overcome these limitations (Object detection). Thus, it was just a question of time until Deep Learning based algorithms started to beat traditional methods in Semantic Segmentation. Similar to object detection, the algorithms basically apply a trained deep classifier on each pixel region to determine its semantics.

Class Imbalance

Even more present than in classification problems, class imbalance is a huge challenge. Class imbalance means that there are more data samples of a particular object than of another. For instance, a training dataset could contain twice the amount of dog images in comparison to cat images. In consequence, the trained algorithm becomes biased, more often choosing the class label "dog". As each pixel in an image is a data sample in segmentation, smaller objects can lead to massive imbalance often by factor 10 or 100! We apply a variety of additional techniques in order to ensure that the model does not only choose the major label for the whole image.

In this exemplary case, pixels representing humans (highlighted in turquoise) only make up a tiny fraction of the whole image.
In this exemplary case, pixels representing humans (highlighted in turquoise) only make up a tiny fraction of the whole image.

Applications & Use Cases

Autonomous Driving & Automated guided vehicles (AGVs)

Semantic Segmentation is a vital prerequisite for the avoidance of obstacles and pedestrians. Segmentations combined with 3D images produce detailed object boundaries in 3D. Subsequent Path Planning algorithms can then generate a motion trajectory to avoid collisions.

Autonomous robots use semantic segmentation to localize, to find and to grab objects of interest.
Autonomous robots use semantic segmentation to localize, to find and to grab objects of interest.

Building Infrastructure Modelling (BIM)

In Building Infrastructure Modelling, we create a virtual 3D model of a building including its semantic regions. Such model can then be used to quickly calculate statistics, such as areas, number of doors or windows. 3D reconstruction systems provide highly detailed geometry. However, after scanning, tedious post-processing steps are required to "cut out" the objects of interest. Semantic segmentation helps automating this task.

This video demonstrates automated segmentation of doors in a RGB-D video. Notice that although shape and visible area of the doors vary, we are able to determine its boundaries.


Conclusion

Semantic Segmentation is a vital step in interpreting the environment. In addition to pure geometry, a segmentation adds a humanly understandable representation of the data. Its application areas lie specifically in cases where the knowledge of exact boundaries is mandatory. Get in touch with us to discuss your use case and how it can benefit from semantic segmentation and further processing. You can contact us under info@gestalt-robotics.com or give us a call at +49 30 616 515 60 – we would love to hear from you.