Image Processing / Deep Learning

[SRGAN Paper Review] Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

sumiin 2022. 7. 27. 09:58

1. Introduction

  • The highly challenging task of estimating a high-resolution (HR) image from its low-resolution (LR) counterpart is referred to as super-resolution (SR).
  • The optimization target of supervised SR algorithms is commonly the minimization of the mean squared error (MSE) between the recovered HR image and the ground truth.

1.1. Related work

 

1.1.1 Image super-resolution

  • The paper focuses on single image super-resolution (SISR) and does not further discuss approaches that recover HR images from multiple images.

1.1.2 Design of convolutional neural networks

  • Deeper network architectures have also been shown to increase performance for SISR
  • A powerful design choice that eases the training of deep CNNs is the recently introduced concept of residual blocks and skip-connections.
  • In the context of SISR it was also shown that learning upscaling filters is beneficial in terms of accuracy and speed
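The learned-upscaling idea above (sub-pixel convolution, used in SRGAN's generator) amounts to learning the upscaling filters as ordinary convolutions in LR space and then rearranging channels into spatial resolution. A minimal numpy sketch of the rearrangement step, following PixelShuffle semantics (shapes and names here are my own, not from the paper):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r).

    This is the final step of a sub-pixel convolution layer: the
    network does all its convolutions at LR resolution, and this
    cheap reshuffle produces the upscaled output.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    # split the channel dim into (C, r, r), then interleave into space
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 4 channels with r=2 collapse into 1 channel at twice the resolution
lr = np.arange(4 * 3 * 3).reshape(4, 3, 3)
hr = pixel_shuffle(lr, 2)
print(hr.shape)  # (1, 6, 6)
```

Each 2x2 output block draws one pixel from each of the four input channels, which is why training the filters in LR space is both faster and (per the cited work) more accurate than upscaling first.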

1.1.3 Loss functions

  • Pixel-wise loss functions such as MSE struggle to handle the uncertainty inherent in recovering lost high-frequency details such as texture
  • Minimizing MSE encourages finding pixel-wise averages of plausible solutions, which are typically overly smooth and thus have poor perceptual quality.
  • Related work instead minimizes the squared error in the feature spaces of VGG19 [49] and scattering networks.
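The averaging argument can be made concrete with a toy example (the edge arrays below are illustrative, not from the paper): if an LR patch is equally well explained by a sharp edge in two slightly different positions, the reconstruction with lowest expected MSE is their pixel-wise mean, which is blurry and matches neither plausible solution:

```python
import numpy as np

# Two equally plausible HR explanations of the same LR patch:
# a sharp edge one pixel to the left, or one pixel to the right.
edge_left = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
edge_right = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# The reconstruction minimizing expected MSE over both explanations
# is their pixel-wise mean: a half-intensity ramp [0, 0, 0.5, 1, 1].
mse_optimal = (edge_left + edge_right) / 2

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# The blurred average scores better on expected MSE than committing
# to either sharp edge, even though both sharp edges look natural.
expected_mse_blur = (mse(mse_optimal, edge_left) + mse(mse_optimal, edge_right)) / 2
expected_mse_sharp = (mse(edge_left, edge_left) + mse(edge_left, edge_right)) / 2
assert expected_mse_blur < expected_mse_sharp
```

This is exactly the failure mode the perceptual loss is designed to avoid.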

1.2. Contribution

  • GANs provide a powerful framework for generating plausible-looking natural images with high perceptual quality

2. Method

  • In SISR the aim is to estimate a high-resolution, super-resolved image I^SR from a low-resolution input image I^LR.
  • Our ultimate goal is to train a generating function G that estimates for a given LR input image its corresponding HR counterpart.
  • In this work we specifically design a perceptual loss as a weighted combination of several loss components that model distinct desirable characteristics of the recovered SR image.

2.1. Adversarial network architecture

  • The generator can learn to create solutions that are highly similar to real images and are thus difficult for D to classify.
  • To discriminate real HR images from generated SR samples we train a discriminator network.
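The generator stacks B identical residual blocks (Conv-BN-PReLU-Conv-BN plus an identity skip). A minimal single-channel numpy sketch of the skip structure, with batch norm omitted, ReLU standing in for PReLU, and random weights standing in for learned filters:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, w):
    """'same'-padded 3x3 convolution over a single-channel map
    (a stand-in for the conv layers inside the paper's blocks)."""
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    # Identity skip-connection: the block only needs to learn a
    # residual, which eases optimization of very deep SISR generators.
    return x + conv3x3(np.maximum(conv3x3(x, w1), 0), w2)

x = rng.standard_normal((8, 8))
w1, w2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
y = residual_block(x, w1, w2)
print(y.shape)  # (8, 8)
```

Note that when the learned residual is zero the block reduces to the identity, which is what makes stacking many such blocks trainable.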

2.2. Perceptual loss function

  • The definition of our perceptual loss function l^SR is critical for the performance of our generator network.
  • We formulate the perceptual loss as the weighted sum of a content loss l^SR_X and an adversarial loss component:
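Written out, the weighted sum the paper uses (its Eq. 3) down-weights the adversarial term by a factor of 10^-3:

```latex
l^{SR} = \underbrace{l^{SR}_{X}}_{\text{content loss}}
       + \underbrace{10^{-3}\, l^{SR}_{Gen}}_{\text{adversarial loss}}
```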

2.2.1 Content loss

  • However, while achieving particularly high PSNR, solutions of MSE optimization problems often lack high-frequency content, which results in perceptually unsatisfying solutions with overly smooth textures.
  • Instead of relying on pixel-wise losses, we build on the idea of a loss function that is closer to perceptual similarity.
  • We define the VGG loss based on the ReLU activation layers of the pre-trained 19-layer VGG network.
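A minimal sketch of the content-loss computation (the paper's Eq. 5): squared differences between feature maps of the reference HR image and the generated SR image, normalized by the feature map size. Here `phi` is a hypothetical stand-in for a pre-trained VGG19 feature extractor, replaced by a fixed random projection so the example stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for phi_{i,j}: the feature map obtained by the
# j-th convolution before the i-th pooling layer of a pre-trained VGG19.
# A fixed random projection keeps the sketch runnable without weights.
W = rng.standard_normal((16, 64))

def phi(img):
    return img.reshape(-1, 64) @ W.T  # toy "feature map"

def vgg_loss(hr, sr):
    """Squared feature differences, normalized by the feature map size
    (mirroring the W_{i,j} H_{i,j} normalization in the paper)."""
    f_hr, f_sr = phi(hr), phi(sr)
    return float(np.sum((f_hr - f_sr) ** 2) / f_hr.size)

hr = rng.standard_normal((32, 32))
sr = hr + 0.1 * rng.standard_normal((32, 32))
print(vgg_loss(hr, hr))  # 0.0 -- identical images give zero content loss
assert vgg_loss(hr, sr) > 0
```

In the actual model, `phi` would be the frozen VGG19 activations, so the loss compares images in a space that correlates with perceived similarity rather than raw pixels.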

 

2.2.2 Adversarial loss

  • We add the generative component of our GAN to the perceptual loss.
  • This encourages our network to favor solutions that reside on the manifold of natural images, by trying to fool the discriminator network. 
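The generative loss term sums -log D(G(I^LR)) over training samples; the paper minimizes -log D(G(.)) rather than log(1 - D(G(.))) for better gradient behaviour early in training. A sketch with hypothetical discriminator outputs:

```python
import numpy as np

def adversarial_loss(d_on_generated):
    """l^SR_Gen = sum_n -log D(G(I^LR_n)), where D(.) is the
    discriminator's probability that a generated sample is real."""
    d = np.asarray(d_on_generated, dtype=float)
    return float(np.sum(-np.log(d)))

# Hypothetical discriminator probabilities on generated samples:
confident_fool = adversarial_loss([0.9, 0.95, 0.9])  # D fooled -> small loss
easily_caught = adversarial_loss([0.1, 0.05, 0.1])   # D detects fakes -> large loss
assert confident_fool < easily_caught
```

Minimizing this term pushes the generator toward outputs the discriminator accepts as real, i.e. toward the natural-image manifold.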

3. Experiments

 

3.3. Mean opinion score (MOS) testing

  • We performed a MOS test to quantify the ability of different approaches to reconstruct perceptually convincing images.

 

4. Discussion and future work

  • The focus of this work was the perceptual quality of super-resolved images rather than computational efficiency
  • The development of content loss functions that describe image spatial content but are more invariant to changes in pixel space will further improve photo-realistic image SR results.

5. Conclusion

  • We have confirmed that SRGAN reconstructions for large upscaling factors (4×) are, by a considerable margin, more photo-realistic than reconstructions obtained with state-of-the-art reference methods.

 
