Image Processing / Deep Learning

[RFA Paper Review] Residual Feature Aggregation Network for Image Super-Resolution

sumiin 2022. 8. 3. 16:54

Abstract

  • existing methods neglect to fully utilize the hierarchical features on the residual branches.
  • To address this issue, we propose a novel residual feature aggregation (RFA) framework for more efficient feature extraction.
  • The RFA framework groups several residual modules together and directly forwards the features on each local residual branch by adding skip connections.
  • To maximize the power of the RFA framework, we further propose an enhanced spatial attention (ESA) block to make the residual features more focused on critical spatial contents.

1. Introduction

  • Most existing CNN-based models do not make full use of the information from the intermediate layers.
  • In particular, residual learning is widely used in CNN-based models to extract the residual information of input features, yet almost all existing SR models use residual learning only as a strategy to ease the training difficulty.
  • Usually, an SR model is built by stacking a number of residual modules, where the residual features are fused with the identity features before propagating to the next module. As a result, later residual blocks can only see the complex fused features.
  • To address these problems, we propose a residual feature aggregation (RFA) framework, which aggregates the local residual features for more powerful feature representation. 

(a) shows a common network design where multiple residual modules are stacked together to build a deep network.

These highly representative residual features are used very locally, which limits the representational power of the network.

 

(b) shows the proposed RFA framework, which reorganizes the stacked residual modules: the last residual module is extended to cover the first three residual modules to ease the training difficulty.

 

 

  • It is necessary to enhance the spatial distribution of residual features with a spatial attention mechanism so that the performance of our RFA framework can be further improved.
  • However, existing spatial attention mechanisms in image SR are either less powerful or computationally intensive.
  • To solve this issue, we propose a lightweight and efficient enhanced spatial attention (ESA) block.
  • The ESA block enables a large receptive field by the joint use of a strided convolution and a max-pooling layer with a large window size.
  • To verify the effectiveness of the proposed methods, we build a very deep network RFANet by combining the RFA framework with the ESA block.

 

In summary, the main contributions of this paper are as follows:

  • We propose a general residual feature aggregation (RFA) framework for more accurate image SR.
  •  We propose an enhanced spatial attention (ESA) block to adaptively rescale features according to the spatial context. 
  • We propose a residual feature aggregation network (RFANet) which is constructed by incorporating the proposed RFA framework with the powerful ESA block.

 

 

2. Related Work

 

Attention-based Networks

  • Attention can be interpreted as a way to bias the allocation of available resources towards the most informative parts of an input signal.
  • Recently, some attention-based models have also been proposed to further improve SR performance.

3. Methodology

 

 Basic Network Architecture for Image SR

A basic image SR network usually consists of three parts: the head part, the trunk part, and the reconstruction part.

 

 

The head part is responsible for initial feature extraction with only one convolutional layer:

F0 = H(ILR),

where H stands for the shallow feature extraction function of the head part.

Then the extracted feature F0 is sent to the trunk part for deep feature learning.

 

 

The trunk part is made up of T base modules (BM), which can be formulated as

Ft = Bt(Ft−1), t = 1, …, T,

where Bt denotes the t-th base module function.

Finally, the extracted deep feature FT is upscaled through the reconstruction part:

ISR = R(FT) = G(ILR),

where ISR is the super-resolved image, R denotes the reconstruction function, and G denotes the function of the whole SR network.

The key module of the reconstruction part is the upscale module, where an appropriate number of sub-pixel convolutions are applied.
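As a concrete illustration of this three-part design, here is a minimal PyTorch sketch (not the authors' implementation; the channel width, the number of base modules, and the ×2 scale are assumptions chosen for the example):

```python
import torch.nn as nn

class BasicSRNet(nn.Module):
    """Head -> trunk -> reconstruction, as formulated above."""
    def __init__(self, n_feats=64, n_modules=4, scale=2):
        super().__init__()
        # Head: a single conv extracts the shallow features F0 = H(ILR)
        self.head = nn.Conv2d(3, n_feats, 3, padding=1)
        # Trunk: T base modules (placeholder conv-ReLU modules here)
        self.trunk = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(n_feats, n_feats, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(n_modules)
        ])
        # Reconstruction: sub-pixel convolution (PixelShuffle) + output conv
        self.tail = nn.Sequential(
            nn.Conv2d(n_feats, n_feats * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(n_feats, 3, 3, padding=1),
        )

    def forward(self, x):
        f0 = self.head(x)     # F0 = H(ILR)
        ft = self.trunk(f0)   # FT = BT(...B1(F0)...)
        return self.tail(ft)  # ISR = R(FT)
```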

 

 

Residual Feature Aggregation Framework

Residual learning has demonstrated its significance for the image classification problem.

Recently, residual learning has also been introduced into image SR to further boost performance.

 

Each residual module consists of two branches: the residual branch (i.e. the residual block) and the identity branch.

 

 

In the task of image SR, the residual block can produce some useful hierarchical features focusing on different aspects of the original LR image.

We propose a residual feature aggregation (RFA) framework to make better use of the local residual features.

The residual features of the first three blocks are sent directly to the end of the RFA module and then concatenated with the output of the last residual block.

Finally, a 1×1 convolution is applied to fuse these features before the element-wise addition with the identity feature.
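A minimal PyTorch sketch of one RFA module under this description (the residual block internals are simplified and the channel width is an assumption; taking the module input as the final identity follows my reading of the paper's figure):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Simplified residual branch: returns residual features only."""
    def __init__(self, n_feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

class RFAModule(nn.Module):
    """Aggregates the residual features of four residual blocks."""
    def __init__(self, n_feats=64):
        super().__init__()
        self.blocks = nn.ModuleList([ResBlock(n_feats) for _ in range(4)])
        self.fuse = nn.Conv2d(4 * n_feats, n_feats, 1)  # 1x1 fusion conv

    def forward(self, x):
        feat, residuals = x, []
        for block in self.blocks[:-1]:
            r = block(feat)      # residual features of blocks 1-3
            residuals.append(r)  # skip connection to the module end
            feat = feat + r      # local identity addition
        residuals.append(self.blocks[-1](feat))  # output of the last block
        fused = self.fuse(torch.cat(residuals, dim=1))
        return x + fused         # element-wise addition with the identity
```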

 

Compared with the way of simply stacking multiple residual modules, our RFA framework enables nonlocal use of the residual features.

The useful hierarchical information that preceding residual blocks contain can be propagated to the end of the RFA module without any loss or interference, thus leading to a more discriminative feature representation.

 

 

Enhanced Spatial Attention Block

In order to maximize the effectiveness of our RFA framework, it is best used in conjunction with a spatial attention mechanism, since we need the residual features to be focused on spatial contents of key importance.

 

The ESA mechanism works at the end of the residual block (Fig.4(Left)) to force the features to be more focused on the regions of interest. We can get a more representative feature when aggregating these highlighted features together.

 

In the design of an attention block, several elements have to be carefully considered.

First, the attention block must be lightweight enough since it will be inserted into every residual module of the network. Second, a large receptive field is required for the attention block to work well for the task of image SR.

 

The proposed ESA mechanism starts with a 1 × 1 convolutional layer to reduce channel dimensions, so that the whole block can be extremely lightweight.

Then, to enlarge the receptive field, we use one strided convolution (with stride 2) followed by a max-pooling layer.
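A minimal sketch of such an ESA block follows (the 7×7 pooling window, the bilinear upsampling back to the input size, and the final sigmoid gating follow my reading of the paper's figure and should be treated as assumptions; the inner skip connection and extra conv group of the full design are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESABlock(nn.Module):
    """Lightweight spatial attention with an enlarged receptive field."""
    def __init__(self, n_feats=64, reduction=4):
        super().__init__()
        mid = n_feats // reduction
        self.reduce = nn.Conv2d(n_feats, mid, 1)                 # shrink channels
        self.down = nn.Conv2d(mid, mid, 3, stride=2, padding=0)  # strided conv
        self.body = nn.Conv2d(mid, mid, 3, padding=1)
        self.expand = nn.Conv2d(mid, n_feats, 1)                 # restore channels

    def forward(self, x):
        a = self.reduce(x)
        a = self.down(a)                              # stride-2 convolution
        a = F.max_pool2d(a, kernel_size=7, stride=3)  # large-window pooling
        a = self.body(a)
        # upsample the coarse map back to the input resolution
        a = F.interpolate(a, size=x.shape[2:], mode='bilinear',
                          align_corners=False)
        mask = torch.sigmoid(self.expand(a))
        return x * mask  # rescale features by the spatial attention mask
```

Assuming these window sizes, a location after the stride-2 3×3 convolution and the 7×7 max-pooling already covers a 15×15 window of the block input (3 + (7−1)×2 = 15), which is what gives the attention mask its large receptive field at a small cost.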

 

The combination of strided convolution and max-pooling is widely used in image classification to quickly reduce the spatial dimensions at the beginning of the network.

 

Setting aside the computational cost, a potentially better way to implement the spatial attention block would be to use the Non-Local block.

 

 

Implementation Details

We apply the RFA framework with the ESA block to build our final SR network (RFANet).

 

4. Experiments

 

Combination with Residual Block

In this section, we investigate the combination of our RFA framework with the basic residual block used in EDSR.

Different from the original residual block used in image classification, EDSR removes the Batch Normalization layers and achieves substantial improvements.

Our RFA model adopts 30 RFA modules to keep the number of residual blocks the same as EDSR-Baseline for a fair comparison; we refer to this model as "RFA-EDSR".

We attribute this considerable improvement to the effective design of our RFA framework where the residual feature in each residual block can be better utilized by the network.

These comparisons demonstrate that the proposed RFA framework is essential to very deep networks for image SR.

 

Combination with Dense Block

The motivation behind the dense block is also to combine hierarchical cues available along the network depth to obtain richer feature representations.

It is reasonable to apply the RFA framework in conjunction with the dense block to further improve the performance.

We refer to the dense block baseline model as "Dense-Baseline" and to the RFA framework with dense blocks as "RFA-Dense".

 

 

Combination with Attention Block

By using attention mechanisms, the performance of image SR has improved significantly.

By using the channel attention block alone, the PSNR already reaches 32.56 dB, which demonstrates the excellent performance of the channel attention mechanism.

The RFA framework is best used with a spatial attention mechanism.

The proposed RFA framework can further boost the performance of the spatial attention mechanism by a large margin.

 

 

Effects of Residual Feature Aggregation (RFA)

We now illustrate how our residual feature aggregation design affects the output features in different stages of the network.

(1) The aggregation layers spread their weights over all the residual blocks, which indicates that all the residual features are directly used to produce the output features of the RFA module.

(2) The variance of the weight norms in later modules is larger than in earlier modules. This indicates that the network gradually learns to distinguish the residual features and assign more weight to the features of critical importance (a way to inspect these norms is sketched below).

(3) At the beginning, the last block contributes more than the other three blocks. As the depth increases, the other three blocks also play an important role in feature learning, indicating the necessity of residual feature aggregation.
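As a hypothetical illustration of how such an inspection could be done, the helper below splits the 1×1 fusion weights of the RFAModule sketch above by source block and compares their L2 norms (fusion_weight_norms is an illustrative helper, not the authors' tooling):

```python
import torch

def fusion_weight_norms(rfa_module, n_feats=64):
    """Per-block L2 norms of the 1x1 fusion conv weights in an RFAModule."""
    w = rfa_module.fuse.weight               # shape: (n_feats, 4 * n_feats, 1, 1)
    chunks = torch.split(w, n_feats, dim=1)  # one chunk per residual block
    return [c.norm().item() for c in chunks]
```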

 

Effects of Enhanced Spatial Attention

 

(1) The attention mechanism has the effect of modulating the activation values.

(2) Feature maps after the attention mechanism tend to contain more negative values, showing a stronger effect of suppressing the smooth area of the input image, which further leads to a more accurate residual image.

 

Conclusions

The RFA framework effectively groups the residual blocks together, where the features of local residual blocks are sent directly to the end of the RFA framework so that these useful hierarchical features are fully utilized.

To maximize the power of the proposed RFA framework, we further design an enhanced spatial attention (ESA) block to make the residual features more focused on spatial contents of key importance.

To compare with state-of-the-art methods, we propose the RFANet by applying the RFA framework in conjunction with the ESA block.
