3.5 DenseNet

Densely Connected Convolutional Networks

Created Date: 2025-05-18

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with \(L\) layers have \(L\) connections - one between each layer and its subsequent layer - our network has \(L(L+1)/2\) direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
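As a concrete illustration of this connectivity pattern, the sketch below wires up a dense block in PyTorch: each layer receives the channel-wise concatenation of the block input and all preceding outputs, so a block with \(L\) layers contains \(L(L+1)/2\) direct connections. The class name, the default growth rate of 12, and the use of plain 3x3 convolutions are illustrative choices for this sketch, not the paper's exact layer definition (the composite transformation is described in Section 3.5.3).

```python
import torch
import torch.nn as nn


class DenseBlock(nn.Module):
    """Sketch of dense connectivity: layer l receives the channel-wise
    concatenation of the block input x_0 and all preceding outputs."""

    def __init__(self, num_layers, in_channels, growth_rate=12):
        super().__init__()
        self.layers = nn.ModuleList(
            # layer l sees in_channels + l * growth_rate input channels
            nn.Conv2d(in_channels + l * growth_rate, growth_rate,
                      kernel_size=3, padding=1)
            for l in range(num_layers)
        )

    def forward(self, x0):
        features = [x0]  # x_0
        for layer in self.layers:
            x_l = layer(torch.cat(features, dim=1))  # H_l([x_0, ..., x_{l-1}])
            features.append(x_l)  # x_l is reused by every later layer
        return torch.cat(features, dim=1)


# Example: a 4-layer block on a 16-channel input yields 16 + 4 * 12 channels.
block = DenseBlock(num_layers=4, in_channels=16)
out = block(torch.randn(1, 16, 32, 32))  # shape: (1, 64, 32, 32)
```

Note that every earlier feature-map is passed forward unchanged; the convolutions only produce the new feature-maps that get appended, which is what keeps the parameter count low.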

DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance.

3.5.1 Introduction

Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago, improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently.

The original LeNet5 consisted of 5 layers, VGG featured 19, and more recently Highway Networks and Residual Networks (ResNets) have surpassed the 100-layer barrier.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and "wash out" by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets and Highway Networks bypass signal from one layer to the next via identity connections.

Stochastic depth shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets repeatedly combine several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network.

Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

3.5.2 Related Work

3.5.3 DenseNets

Consider a single image \(x_0\) that is passed through a convolutional network. The network comprises \(L\) layers, each of which implements a non-linear transformation \(H_l(\cdot)\), where \(l\) indexes the layer. \(H_l(\cdot)\) can be a composite function of operations such as Batch Normalization (BN), rectified linear units (ReLU), Pooling, or Convolution (Conv). We denote the output of the \(l^{th}\) layer as \(x_l\).
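To make \(H_l(\cdot)\) concrete, the snippet below sketches one such composite function, assuming a BN → ReLU → 3x3 Conv ordering; the helper name and the `growth_rate` argument (the number of output feature-maps) are illustrative rather than taken from the text above.

```python
import torch.nn as nn


def composite_H(in_channels, growth_rate):
    """One possible composite H_l: Batch Normalization, ReLU,
    then a 3x3 convolution producing `growth_rate` feature-maps."""
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, growth_rate,
                  kernel_size=3, padding=1, bias=False),
    )
```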

ResNets

Traditional convolutional feed-forward networks connect the output of the \(l^{th}\) layer as input to the \((l+1)^{th}\) layer, which gives rise to the layer transition \(x_l = H_l(x_{l-1})\). ResNets add a skip-connection that bypasses the non-linear transformation with an identity function: \(x_l = H_l(x_{l-1}) + x_{l-1}\). An advantage of ResNets is that the gradient can flow directly through the identity function from later layers to earlier ones; however, the identity function and the output of \(H_l\) are combined by summation, which may impede the information flow in the network.
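The two layer transitions can be contrasted in a few lines. This is only a sketch: the function names are made up for illustration, and `H_l` stands for any composite transformation (such as the one sketched in Section 3.5.3) whose input width matches what it is given.

```python
import torch


def resnet_layer(H_l, x_prev):
    # ResNet transition: x_l = H_l(x_{l-1}) + x_{l-1}
    # (transformed features combined with the identity skip by summation)
    return H_l(x_prev) + x_prev


def densenet_layer(H_l, features):
    # DenseNet transition: x_l = H_l([x_0, x_1, ..., x_{l-1}])
    # (all earlier feature-maps concatenated along the channel axis)
    return H_l(torch.cat(features, dim=1))
```

The key difference is that summation merges information into a single tensor, whereas concatenation keeps every earlier feature-map explicitly available to all later layers.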

3.5.4 Experiments

3.5.5 Discussion

3.5.6 Conclusion