- Overview of U-Net
- Features of U-Net
- Application method of U-Net
- For those who want to learn more
In the previous article, we discussed segmentation in general.
In this article, I will explain U-Net, which was a major breakthrough in the field of segmentation and is still widely used today.
U-Net is a model for semantic segmentation developed by Olaf Ronneberger et al. for biomedical image analysis.
Figure 1 shows the structure of U-Net. As you can see in Figure 1, it is named U-Net because the network is laid out in the shape of the letter “U”. U-Net consists of an encoder and a decoder.
The U-Net encoder applies convolution and downsampling to the input image several times and extracts the features of the image. For the encoder, the structure of models that have shown good results in image classification, such as ResNet, can be reused almost directly.
The underlying network reused in this way is called the “backbone”. The idea is not limited to U-Net; it is common in object detection and segmentation in general.
When training data is scarce, the accuracy of U-Net can often be improved by using a backbone pre-trained on an image classification dataset.
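The encoder's repeated "convolve, then shrink" pattern can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: it assumes the layout of the original U-Net (two unpadded 3x3 convolutions with ReLU followed by 2x2 max pooling), and the function names `conv2d_valid`, `max_pool2x2`, and `encoder_stage` are hypothetical.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """Unpadded 2D convolution.
    x: (C_in, H, W), kernels: (C_out, C_in, k, k) -> (C_out, H-k+1, W-k+1)."""
    c_out, _, k, _ = kernels.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(kernels, patch, axes=3)
    return out

def max_pool2x2(x):
    """2x2 max pooling with stride 2 (H and W are assumed to be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def encoder_stage(x, k1, k2):
    """Two convolutions with ReLU. Returns the features (kept for the decoder)
    and their pooled version (passed on to the next encoder stage)."""
    f = np.maximum(conv2d_valid(x, k1), 0.0)
    f = np.maximum(conv2d_valid(f, k2), 0.0)
    return f, max_pool2x2(f)

rng = np.random.default_rng(0)
image = rng.standard_normal((1, 28, 28))      # toy single-channel input
k1 = rng.standard_normal((8, 1, 3, 3)) * 0.1  # random weights, untrained
k2 = rng.standard_normal((8, 8, 3, 3)) * 0.1
features, pooled = encoder_stage(image, k1, k2)
print(features.shape, pooled.shape)  # (8, 24, 24) (8, 12, 12)
```

Each stage halves the spatial size while the number of channels typically grows, which is why the deepest feature maps are small but rich in semantic information.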
The U-Net decoder receives the features extracted by the encoder, enlarges them with transposed convolution (often called deconvolution, although it is not a true mathematical inverse of convolution), and outputs a per-pixel probability map, usually of the same size as the input image.
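To make "probability map" concrete, here is a small sketch of how raw decoder scores could be turned into per-pixel class probabilities with a softmax over the class axis. The function name `probability_map` and the toy shapes are assumptions for illustration only.

```python
import numpy as np

def probability_map(logits):
    """logits: (num_classes, H, W) -> per-pixel class probabilities."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 4, 4))  # 3 classes on a 4x4 image
probs = probability_map(logits)
labels = probs.argmax(axis=0)            # predicted class for every pixel
print(probs.shape, labels.shape)         # (3, 4, 4) (4, 4)
```

At every pixel the probabilities sum to 1, and taking the most probable class per pixel gives the segmentation mask.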
Enlarging a feature map, for example by deconvolution, is called upsampling. Upsampling converts a small feature map into a larger one, but upsampling alone does not recover the position information of objects well, because fine spatial detail has already been lost during downsampling.
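Upsampling by transposed convolution can be sketched as below. This is a naive NumPy version under stated assumptions: a 2x2 kernel with stride 2, a kernel layout of `(C_in, C_out, k, k)`, and a hypothetical function name `transposed_conv2d`. Real frameworks implement the same idea far more efficiently.

```python
import numpy as np

def transposed_conv2d(x, kernels, stride=2):
    """x: (C_in, H, W), kernels: (C_in, C_out, k, k).
    Every input pixel stamps a scaled copy of the kernel onto the output."""
    c_in, h, w = x.shape
    _, c_out, k, _ = kernels.shape
    out = np.zeros((c_out, (h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Sum over input channels: (C_in,) x (C_in, C_out, k, k) -> (C_out, k, k)
            stamp = np.tensordot(x[:, i, j], kernels, axes=1)
            out[:, i * stride:i * stride + k, j * stride:j * stride + k] += stamp
    return out

small = np.array([[[1.0, 2.0],
                   [3.0, 4.0]]])       # (1, 2, 2) feature map
ones_kernel = np.ones((1, 1, 2, 2))    # copies each pixel into a 2x2 block
large = transposed_conv2d(small, ones_kernel)
print(large[0])
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```

With an all-ones kernel the operation reduces to simple pixel duplication, which shows why upsampling by itself cannot restore detail that the small feature map no longer contains.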
Therefore, U-Net connects the feature map of the encoder to the feature map of the decoder at each level of the network. The gray arrows in Figure 1 represent these connections.
As a result, the information in the large, high-resolution feature maps on the encoder side is passed to the decoder side, making it easier to recover the position information of objects during upsampling.
The most important feature of U-Net is the concatenation of the encoder's feature maps with the decoder's feature maps, as mentioned earlier. The feature maps generated by the encoder are copied (Copy), cropped (Crop) to match the size of the decoder's feature maps, and concatenated (Concatenate) with them.
Such feature map connections are commonly called “skip connections”. The purpose of skip connections in U-Net is to pass the information in the large feature maps on the encoder side to the decoder side. This can improve pixel-by-pixel classification accuracy.
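The copy, crop, and concatenate steps of a skip connection can be sketched in a few lines of NumPy. The helper names `center_crop` and `skip_connect` and the example shapes are assumptions chosen for illustration; cropping is only needed when the encoder map is larger than the decoder map, as happens with unpadded convolutions.

```python
import numpy as np

def center_crop(feature, target_h, target_w):
    """Crop a (C, H, W) feature map around its center to (C, target_h, target_w)."""
    _, h, w = feature.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return feature[:, top:top + target_h, left:left + target_w]

def skip_connect(encoder_feature, decoder_feature):
    """Crop the encoder map to the decoder's size, then join along the channel axis."""
    _, h, w = decoder_feature.shape
    cropped = center_crop(encoder_feature, h, w)
    return np.concatenate([cropped, decoder_feature], axis=0)

encoder_feature = np.zeros((64, 64, 64))  # larger map saved from the encoder
decoder_feature = np.ones((64, 56, 56))   # upsampled map in the decoder
merged = skip_connect(encoder_feature, decoder_feature)
print(merged.shape)  # (128, 56, 56)
```

Note that the channels are stacked rather than added, so the convolutions that follow can learn how to combine the fine encoder detail with the coarser decoder features.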
Since U-Net was announced, various derivative methods have been proposed. Here are some of them.
4-1. U-Net++
U-Net++ is a variant of U-Net with dense connections between the encoder and the decoder.
Figure 2 shows the structure of U-Net++. The green part in Figure 2 is the decoder. By making the connections between the encoder and decoder dense in this way, the feature map information held by the encoder can be conveyed to the decoder more effectively.
U-Net++ is also called Nested U-Net because smaller U-Nets are nested inside the overall U-Net.
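The nested, densely connected layout can be sketched as a grid of nodes, where each node concatenates all earlier nodes on its own level with an upsampled node from the level below. The code is a rough sketch based on my reading of the U-Net++ design, not its actual implementation: `block` is a placeholder (a random 1x1 convolution) standing in for a real convolution block, `up` uses nearest-neighbour upsampling instead of a learned layer, and all names are hypothetical.

```python
import numpy as np

def down(x):
    """2x2 max pooling on a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def up(x):
    """Nearest-neighbour 2x upsampling (stand-in for a learned up-convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def block(x, out_channels, rng):
    """Placeholder for a convolution block: random 1x1 convolution plus ReLU."""
    weights = rng.standard_normal((out_channels, x.shape[0])) * 0.1
    return np.maximum(np.tensordot(weights, x, axes=1), 0.0)

def nested_unet_nodes(image, depth=3, channels=4, seed=0):
    rng = np.random.default_rng(seed)
    # Column j = 0 is the ordinary encoder path.
    nodes = {(0, 0): block(image, channels, rng)}
    for i in range(1, depth):
        nodes[(i, 0)] = block(down(nodes[(i - 1, 0)]), channels, rng)
    # Columns j > 0: dense skip connections plus one upsampled input from below.
    for j in range(1, depth):
        for i in range(depth - j):
            same_level = [nodes[(i, k)] for k in range(j)]
            from_below = up(nodes[(i + 1, j - 1)])
            merged = np.concatenate(same_level + [from_below], axis=0)
            nodes[(i, j)] = block(merged, channels, rng)
    return nodes

nodes = nested_unet_nodes(np.ones((1, 16, 16)))
print(sorted(nodes))        # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
print(nodes[(0, 2)].shape)  # (4, 16, 16)
```

In this sketch, the nodes `(0, 0)`, `(1, 0)`, and `(0, 1)` already form a small U-Net of depth 2 inside the larger one, which is where the "nested" in the name comes from.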
4-2. R2U-Net
R2U-Net is a model that introduces two structures into U-Net: the residual structure and the recurrent structure, the latter being common in time series analysis. The residual structure is the basic building block of ResNet. The adoption of these two structures is the origin of the name (Recurrent Residual U-Net; R2U-Net).
Figure 3 shows the structure of R2U-Net. The looping arrows in Figure 3 represent the recurrent structures. The overall structure is similar to U-Net, but some of the convolution and deconvolution layers are recurrent.
Making a convolution recurrent allows the same convolution to be applied several times without increasing the number of parameters.
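The idea of reusing one convolution several times, combined with a residual shortcut, can be sketched as follows. This is a simplified illustration under my own assumptions (one shared 3x3 kernel set, zero padding, ReLU, and the hypothetical names `conv3x3_same`, `recurrent_conv`, and `recurrent_residual_block`); the exact formulation in R2U-Net may differ.

```python
import numpy as np

def conv3x3_same(x, kernels):
    """Zero-padded 3x3 convolution. x: (C, H, W), kernels: (C_out, C, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernels.shape[0], h, w))
    for i in range(h):
        for j in range(w):
            out[:, i, j] = np.tensordot(kernels, padded[:, i:i + 3, j:j + 3], axes=3)
    return out

def recurrent_conv(x, kernels, steps=2):
    """Apply the SAME kernels repeatedly, feeding the input back in at each step."""
    state = np.maximum(conv3x3_same(x, kernels), 0.0)
    for _ in range(steps):
        state = np.maximum(conv3x3_same(x + state, kernels), 0.0)
    return state

def recurrent_residual_block(x, kernels, steps=2):
    """Recurrent convolution plus a ResNet-style identity shortcut."""
    return x + recurrent_conv(x, kernels, steps)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
kernels = rng.standard_normal((4, 4, 3, 3)) * 0.05
out = recurrent_residual_block(x, kernels, steps=3)
print(out.shape, kernels.size)  # (4, 8, 8) 144
```

The number of weights (`kernels.size`) stays the same whether `steps` is 1 or 10; only the amount of computation changes.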
4-3. 3D U-Net 
3D U-Net is a three-dimensional version of U-Net. It can perform segmentation on 3D (volumetric) images such as MRI scans.
Figure 4 shows the structure of 3D U-Net. The overall structure is almost the same as U-Net, but it supports 3D images by using 3D convolution.
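The only conceptual change from 2D is that the kernel also slides along the depth axis. A naive NumPy sketch of an unpadded 3D convolution is shown below; the function name `conv3d_valid` and the toy volume size are assumptions for illustration.

```python
import numpy as np

def conv3d_valid(volume, kernels):
    """Unpadded 3D convolution.
    volume: (C_in, D, H, W), kernels: (C_out, C_in, k, k, k)."""
    c_out, _, k, _, _ = kernels.shape
    d, h, w = (s - k + 1 for s in volume.shape[1:])
    out = np.zeros((c_out, d, h, w))
    for z in range(d):
        for y in range(h):
            for x in range(w):
                patch = volume[:, z:z + k, y:y + k, x:x + k]  # (C_in, k, k, k)
                out[:, z, y, x] = np.tensordot(kernels, patch, axes=4)
    return out

volume = np.ones((1, 8, 8, 8))       # toy single-channel 3D image
kernels = np.ones((2, 1, 3, 3, 3))   # two 3x3x3 filters
features = conv3d_valid(volume, kernels)
print(features.shape)                # (2, 6, 6, 6)
print(features[0, 0, 0, 0])          # 27.0
```

Each output value now aggregates a 3x3x3 neighbourhood (27 voxels per input channel), so the network can use context from adjacent slices rather than treating every slice independently.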
In this article, we introduced U-Net, which is widely used in the field of segmentation. In the Deep Learning Basic Course offered by Skill Up AI, you can learn about various deep learning methods, covering both the theory and implementation using NumPy. Please consider taking the course.