The purpose of this research is to generate colored images from coloring books using deep learning; the task of creating new data from input data in this way is called a generation task. Research on generation tasks has been very active in recent years, producing results in fields such as images, speech, and natural language.
In this article, I will discuss generation tasks, using image generation models as an example.
- Image generation by GAN
The generative adversarial network (GAN) is one of the main drivers behind the recent surge of research on generation tasks.
For an overview of GANs, their variants, and how to use them, please see the following article.
A GAN is a deep learning model for generation tasks, proposed by Goodfellow et al.
The GAN structure is widely used in recent research on generation tasks. The image field is no exception: there are various GAN-based image generation models, such as pix2pix, which performs general-purpose image-to-image translation; StackGAN, which generates images from text; and CartoonGAN, which converts photos into an anime style.
First, take a look at the images in Figure 4. All of these pictures were in fact generated by a GAN called StyleGAN. The structure of StyleGAN, which generates images of this astonishing resolution and realism, is as follows.
Below, I will walk through the characteristic parts of StyleGAN.
First, StyleGAN takes an approach called progressive growing to generate high-resolution images. In progressive growing, training starts at a low resolution, and layers corresponding to higher resolutions are gradually added to the model as training progresses, which makes it possible to generate high-resolution images. In Fig. 6, training starts at 4×4, then an 8×8 layer is added, and so on, until a 1024×1024 image is finally generated.
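The resolution schedule described above can be sketched in a few lines. This is only an illustration of the doubling schedule, not StyleGAN's training code; the start and final resolutions follow the 4×4 → 1024×1024 example from Fig. 6.

```python
def progressive_schedule(start=4, final=1024):
    """Resolution schedule for progressive growing: training begins at
    start x start, and each step doubles the resolution (i.e. adds one
    higher-resolution layer) until final x final is reached."""
    schedule = []
    res = start
    while res <= final:
        schedule.append(res)
        res *= 2
    return schedule

print(progressive_schedule())  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Each entry corresponds to one training phase; in the actual method the new layer is faded in gradually rather than switched on at once.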
In addition, StyleGAN uses a normalization method called Adaptive Instance Normalization (AdaIN). As shown in Figure 5, StyleGAN injects the vector w into each layer through AdaIN. This w, called the intermediate latent representation, is a non-linear transformation of the latent vector z that determines the style. It is through this AdaIN processing that the style of the generated image is controlled.
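A minimal NumPy sketch of the AdaIN operation may help: each channel of the feature maps is normalized with its own mean and standard deviation, then rescaled and shifted by style parameters. In the actual model these parameters are produced from w by a learned affine transform; here they are simply passed in directly as an assumption.

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive Instance Normalization (sketch).
    x: feature maps of shape (C, H, W).
    style_scale, style_bias: per-channel vectors of shape (C,), which in
    StyleGAN would come from an affine transform of the latent w."""
    mean = x.mean(axis=(1, 2), keepdims=True)   # per-channel mean
    std = x.std(axis=(1, 2), keepdims=True)     # per-channel std
    normalized = (x - mean) / (std + eps)       # instance normalization
    # re-style: the channel statistics now match the style parameters
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]
```

After this operation, each channel's mean equals the style bias and its standard deviation equals the style scale, which is exactly how the style overrides the incoming statistics.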
Figure 7 shows the results of generating with two vectors w. In the top row, the vector used for generation is switched at the low-resolution stage from the w that generates image A (hereinafter w_a) to the w that generates image B (hereinafter w_b). Similarly, the middle row switches from w_a to w_b at the middle-resolution stage, and the bottom row switches at the high-resolution stage.
This result shows that the influence each vector has on the generated image depends on the resolution at which the switch between the two vectors occurs. In addition, although its effect on the generated image is smaller than that of AdaIN, StyleGAN also injects random noise into each layer.
In Fig. 8, it can be confirmed that this random noise affects fine details of the generated image, such as hair.
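The noise injection can be sketched as follows: a single-channel noise image is broadcast across all feature-map channels, scaled per channel, and added. Treating the per-channel scale as a plain input vector is a simplification; in the actual model it is a learned parameter.

```python
import numpy as np

def add_noise(x, channel_scale, rng):
    """Per-layer noise injection (sketch).
    x: feature maps of shape (C, H, W).
    channel_scale: learned per-channel scaling factors, shape (C,).
    One (H, W) noise image is shared by all channels."""
    noise = rng.standard_normal((1, x.shape[1], x.shape[2]))
    return x + channel_scale[:, None, None] * noise
```

Because the perturbation is spatial and independent per pixel, it produces exactly the kind of local, stochastic variation (strands of hair, pores) seen in Fig. 8, while the global style remains governed by w.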
StyleGAN performed very well in image generation. At the same time, however, it had problems: noise artifacts called droplets appeared in some outputs (Fig. 9), and some features of the generated images came out unnatural (Fig. 10). StyleGAN2 improved on StyleGAN and solved these problems.
First, StyleGAN2 solves the droplet problem by modifying the AdaIN structure. In StyleGAN, AdaIN normalized using the mean and standard deviation of the actual feature maps. Suspecting this to be the cause of the droplets, the authors of StyleGAN2 instead assume a distribution for the data and normalize using only an estimated standard deviation, achieving droplet-free image generation as shown in Figure 11.
Next, to solve the problem of some features being generated in an unnatural state, StyleGAN2 does not use the progressive growing structure. Instead, it improves the expressiveness of the model by incorporating skip connections, as in residual networks (see also this article about residual networks).
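The skip (residual) structure mentioned above is simple to state: a block's output is its input plus a learned transformation of that input, so information and gradients can bypass the layer. This is a generic ResNet-style sketch, not StyleGAN2's exact block, and `layer_fn` stands in for the convolutional layers.

```python
import numpy as np

def residual_block(x, layer_fn):
    """Skip (residual) connection: output = input + transformation(input).
    If layer_fn learns to output zeros, the block reduces to the identity,
    which makes very deep stacks of such blocks easy to optimize."""
    return x + layer_fn(x)
```

Replacing progressive growing with this kind of skip structure lets the whole network train at once at full depth, avoiding the phase-dependent artifacts that growing introduced.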
By eliminating progressive growing, StyleGAN2 can generate images in which features such as eyes and teeth are rendered consistently (Fig. 12).
In this article, we introduced StyleGAN and StyleGAN2 as examples of generation tasks. As mentioned at the beginning, research on generation tasks has become very active in the last few years, and results have been announced not only in image generation, as introduced here, but also in fields such as speech and natural language. I hope this article sparks your interest in generation tasks across various fields, not just images.
Skill Up AI is currently offering a GAN (generative adversarial network) course. In this course, you can systematically learn about various GAN derivatives, centered on StyleGAN. A free trial that lets you watch part of the course is also available, so please consider it. If you want to learn GANs starting from the basics of deep learning, please consider the deep learning basic course that can be used in the field.