Convolutional Neural Networks (CNN) and Their Variants

Depthwise Convolution
Each input channel is convolved separately with its own kernel, so there is no interaction between channels.

Expressed as a group convolution: input_channel = output_channel = group (one group per channel)
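
As a rough illustration of the per-channel behaviour described above, here is a minimal pure-Python sketch. The helper names `conv2d_single` and `depthwise_conv2d` are made up for this note, and the code assumes stride 1, no padding, and no bias.

```python
def conv2d_single(channel, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [
            sum(channel[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def depthwise_conv2d(x, kernels):
    """x: list of C channels (each H x W); kernels: one 2D kernel per channel."""
    if len(x) != len(kernels):
        raise ValueError("depthwise convolution needs exactly one kernel per channel")
    # Output channel c depends only on input channel c: no cross-channel mixing.
    return [conv2d_single(ch, k) for ch, k in zip(x, kernels)]


x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],  # channel 1
]
kernels = [
    [[1, 0], [0, 1]],  # applied to channel 0 only
    [[1, 1], [1, 1]],  # applied to channel 1 only
]
y = depthwise_conv2d(x, kernels)
print(y)  # [[[6, 8], [12, 14]], [[2, 2], [2, 2]]]
```

The number of output channels equals the number of input channels, which matches the input_channel = output_channel = group condition.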

Pointwise Convolution

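No spatial convolution is performed within a channel; instead, each output value is a weighted combination of the input channels at the same position, so the channels interact.

kernel_size=1 (a 1x1 kernel)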
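
A small sketch of the idea, assuming no bias term: with a 1x1 kernel, every output channel is just a per-pixel weighted sum of the input channels. The function name `pointwise_conv2d` and the sample weights are illustrative only.

```python
def pointwise_conv2d(x, weights):
    """x: C_in channels (each H x W); weights: C_out rows of C_in scalars (the 1x1 kernels)."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            # Mix all input channels at pixel (i, j); no neighbouring pixels are used.
            [sum(wt * x[c][i][j] for c, wt in enumerate(row)) for j in range(w)]
            for i in range(h)
        ]
        for row in weights
    ]


x = [
    [[1, 2], [3, 4]],      # channel 0
    [[10, 20], [30, 40]],  # channel 1
]
weights = [
    [1, 1],   # output channel 0 = ch0 + ch1
    [2, 0],   # output channel 1 = 2 * ch0
    [0, -1],  # output channel 2 = -ch1
]
y = pointwise_conv2d(x, weights)
print(y)  # [[[11, 22], [33, 44]], [[2, 4], [6, 8]], [[-10, -20], [-30, -40]]]
```

The spatial size is unchanged, while the channel count goes from 2 to 3, which is why pointwise convolution is often used to change the number of channels.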

Depthwise Separable Convolution

A depthwise convolution followed by a pointwise convolution: the depthwise step filters each channel spatially, and the pointwise step mixes the channels.
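
The main appeal of this two-step factorization is the reduced weight count. The arithmetic below is a sketch that ignores bias terms; the function names and the 64 -> 128 channel, 3x3 example are chosen only for illustration.

```python
def standard_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise (k x k per channel) plus pointwise (1 x 1) pair."""
    depthwise = c_in * k * k   # one k x k kernel per input channel
    pointwise = c_in * c_out   # 1 x 1 kernels mixing the channels
    return depthwise + pointwise


c_in, c_out, k = 64, 128, 3
std = standard_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std)  # 73728
print(sep)  # 8768
print(round(sep / std, 4))  # 0.1189, which equals 1/c_out + 1/k**2
```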

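Group Convolution

The input and output channels are split into groups, and each group is convolved independently, so channels interact only within their own group. Depthwise convolution is the special case where the number of groups equals the number of channels.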
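
In a group convolution, each group of output channels sees only its own slice of the input channels, which divides the weight count by the number of groups. The sketch below shows that arithmetic (bias ignored); `grouped_conv_params` is a hypothetical helper written for this note.

```python
def grouped_conv_params(c_in, c_out, k, groups=1):
    """Weights in a k x k convolution whose channels are split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    return groups * (c_in // groups) * (c_out // groups) * k * k


print(grouped_conv_params(64, 128, 3, groups=1))   # 73728 (standard convolution)
print(grouped_conv_params(64, 128, 3, groups=4))   # 18432
print(grouped_conv_params(64, 64, 3, groups=64))   # 576 (depthwise: one group per channel)
```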

Spatially Separable Convolution

Spatially separable convolution applies a one-dimensional convolution along the vertical direction and another along the horizontal direction of the input feature map to obtain the output feature map. This factors a k x k two-dimensional kernel into a k x 1 kernel and a 1 x k kernel, reducing the weights per kernel from k^2 to 2k and the computation accordingly. The factorization is exact only when the two-dimensional kernel can be written as the product of a column vector and a row vector. It is also well suited to extracting “line”-like features.
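
To make the decomposition concrete, the sketch below uses the Sobel horizontal-gradient kernel, which is the product of the column (1, 2, 1) and the row (-1, 0, 1). It checks that two one-dimensional passes give the same result as the single 3x3 pass. The helper `conv2d_valid` and the sample image are illustrative; stride 1 and no padding are assumed.

```python
def conv2d_valid(img, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(img[0]) - kw + 1)
        ]
        for i in range(len(img) - kh + 1)
    ]


col_kernel = [[1], [2], [1]]  # 3 x 1, vertical pass
row_kernel = [[-1, 0, 1]]     # 1 x 3, horizontal pass
# The equivalent 3 x 3 kernel is the outer product of the two 1D kernels.
full_kernel = [[c[0] * r for r in row_kernel[0]] for c in col_kernel]
print(full_kernel)  # [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 17],
]
direct = conv2d_valid(img, full_kernel)
two_pass = conv2d_valid(conv2d_valid(img, col_kernel), row_kernel)
print(direct == two_pass)  # True
```

The two-pass version stores 3 + 3 = 6 weights instead of 9, and the saving grows with kernel size.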

Dilated Convolution (Atrous Convolution)

The number of parameters stays the same, but the receptive field becomes larger, because the kernel elements are spaced apart by the dilation rate.
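
The relationship can be sketched with a little arithmetic: a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions along each axis, while still holding k x k weights. The function name below is illustrative.

```python
def effective_kernel_size(k, dilation):
    """Span (along one axis) covered by a size-k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


k = 3
for d in (1, 2, 4):
    # The weight count stays k * k regardless of the dilation rate.
    print(d, effective_kernel_size(k, d), k * k)
# 1 3 9
# 2 5 9
# 4 9 9
```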

Transposed Convolution

Upsamples the input (enlarges its height and width) to reconstruct spatial information. Commonly used in semantic segmentation, and also in other upsampling tasks such as image super-resolution.
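
How much the height and width grow depends on the layer's hyperparameters. The sketch below uses the output-size formula documented for PyTorch's ConvTranspose2d, applied to one spatial dimension; the function name and the example settings are assumptions chosen for illustration.

```python
def transposed_conv_output_size(n, kernel, stride=1, padding=0,
                                output_padding=0, dilation=1):
    """Output length along one axis of a transposed convolution."""
    return ((n - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# Two common settings that double the spatial size.
print(transposed_conv_output_size(16, kernel=4, stride=2, padding=1))  # 32
print(transposed_conv_output_size(16, kernel=2, stride=2))             # 32
```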

1D Convolution

Processes sequential data (time series, text, etc.) by sliding the kernel along a single axis.
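
A minimal sketch of a one-dimensional convolution over a sequence, assuming stride 1 and no padding. As in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation. The function name and sample data are illustrative.

```python
def conv1d_valid(seq, kernel):
    """Slide the kernel along the sequence and take a weighted sum at each position."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


# A difference kernel responds where the sequence changes value.
step = [0, 0, 0, 5, 5, 5]
print(conv1d_valid(step, [-1, 1]))  # [0, 0, 5, 0, 0]

# A kernel of ones gives a sliding-window sum.
print(conv1d_valid([3, 6, 9, 12], [1, 1, 1]))  # [18, 27]
```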

3D Convolution

Processes three-dimensional data, such as height + width + depth (CT scans) or height + width + time (videos).
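
In the three-dimensional case the kernel slides along all three axes, so the usual output-size formula applies once per axis. The sketch below assumes a hypothetical 16-frame, 112 x 112 video clip and a 3 x 3 x 3 kernel; the function names are illustrative.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis of a standard convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def conv3d_output_shape(shape, kernel, stride=1, padding=0):
    """Apply the one-axis formula to (depth or time, height, width)."""
    return tuple(conv_output_size(n, kernel, stride, padding) for n in shape)


clip = (16, 112, 112)  # (frames, height, width)
print(conv3d_output_shape(clip, kernel=3, stride=1, padding=1))  # (16, 112, 112)
print(conv3d_output_shape(clip, kernel=3, stride=2, padding=1))  # (8, 56, 56)

# A 3 x 3 x 3 kernel holds 27 weights per input channel, versus 9 for a 3 x 3 kernel.
print(3 ** 3, 3 ** 2)  # 27 9
```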