Deep Learning With PyTorch
This eBook focuses not only on explaining the theory but also, and especially, on engineering practice. Through a large number of hands-on examples, particularly on how to train, optimize, and deploy models, readers will learn to use PyTorch to complete a wide range of deep learning tasks.

You need basic knowledge of Python to follow this course; you can gauge your level by reading the file download.py. If you are new to Python, it is recommended to start with the textbook Introduction to Python Programming.

All source code, including this website, lives in the GitHub repository artinte/deep-learning. You can also follow my YouTube channel and support it with likes and subscriptions; the more support, the faster the updates!
Preface
Learning machine learning is difficult, especially if you are coming from another field. I saw the trend toward artificial intelligence about six years ago (2019) and wanted to move into the AI industry, but I did not have a good entry point.
Since 2024 I have been learning deep learning systematically. At first I wrote in Google Docs, then a Machine Learning Series, and now Deep Learning with PyTorch. Along the way I have found that what matters most is having a goal and persisting with it; gradually you discover the fun of learning.
Deep Learning with PyTorch draws on a large number of excellent articles, including published papers, and organizes them into an e-book. Blue links point to the relevant references. I would like to thank all the authors for their selfless contributions, and I wish you peace and happiness!
Chapter 1 - 6: Fundamentals of Deep Learning
In these chapters, we'll establish a solid foundation in deep learning by exploring the fundamental concepts of various neural network architectures. We'll cover:
Dense Neural Networks (DNNs): The building blocks of deep learning.
Convolutional Neural Networks (CNNs): Essential for image processing and computer vision.
Transformer Models: Crucial for natural language processing (NLP) and sequence-to-sequence tasks.
Diffusion Models: A newer class of generative models used for tasks like image generation.
Chapter 7 - 9: Practical Applications of Deep Learning
This section focuses on applying the concepts from the first six chapters to real-world problems. We'll explore advanced techniques and their applications in various domains, including:
Text and Audio Processing
Image and Video Analysis
We'll also introduce advanced techniques like Variational Autoencoders (VAEs), which are powerful generative models.
Chapter 10: Reinforcement Learning
This chapter provides an introduction to Reinforcement Learning (RL), a critical component of modern AI. Our primary references will be the textbook "Reinforcement Learning" and official PyTorch tutorials.
We will specifically examine how RL is utilized in developing Large Language Models (LLMs), for example, through techniques like Reinforcement Learning from Human Feedback (RLHF) to enhance model performance.
Chapter 11 - 14: Advanced Topics and Optimization
These chapters are dedicated to practical, advanced topics essential for working with large-scale deep learning models. We will focus on:
Extending PyTorch: Customizing the framework for specific needs.
Model Deployment: Running models on different hardware devices.
Optimization Techniques: Improving model efficiency and performance.
Distributed Training: Methods for training models that are too large for a single device.
These topics are crucial because modern deep learning models are often too large to be handled with basic methods.
Chapter 15: Graph Neural Networks (GNNs)
The final chapter introduces Graph Neural Networks (GNNs). Due to their complex structure, we'll focus on the basics and explore how to implement them using PyG (PyTorch Geometric) libraries.
My goal is to find a well-paid AI-related job, or perhaps to become a YouTuber. I have not succeeded yet :) , so for now (2025.08) I may need a bit more of Forrest Gump's spirit. Here is my profile, in the hope that it brings me more time to write these tutorials and more room to grow.
01 Tensor and Gradient Basics
This chapter introduces the core concepts of deep learning, tensors and gradients, and lays the foundation for the chapters that follow.

1.1 Install PyTorch
pip3 install torch torchvision torchaudio
Select your preferences on pytorch.org and run the resulting command to install PyTorch locally; the command above is the default pip installation.
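After installation, a quick check like the one below (a minimal sketch; whether CUDA is available depends on your hardware and the build you installed) confirms that PyTorch is importable:
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only if a CUDA build and a GPU are present
print(torch.rand(2, 3))           # a small random tensor as a smoke test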
1.2 Introduction to Tensors
A torch.Tensor is a multi-dimensional matrix containing elements of a single data type.
Introduction to PyTorch Tensors
Indexing on ndarrays — NumPy v2.2 Manual
Tensor Views - PyTorch 2.7 Documentation
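As a minimal sketch of the ideas covered by these references (creation, indexing, and views), consider:
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])  # 2x3 tensor of float32
print(x.shape, x.dtype)                         # torch.Size([2, 3]) torch.float32

print(x[0, 1])     # indexing a single element -> tensor(2.)
print(x[:, 1:])    # slicing keeps dimensions -> a 2x2 tensor

v = x.view(3, 2)   # a view shares the same underlying storage
v[0, 0] = 100.
print(x[0, 0])     # tensor(100.) because the view aliases x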
1.3 Data Representation
Explains common data categories used in machine learning and data science, focusing on how they are represented as tensors (multi-dimensional arrays).
MNIST Handwritten Digit Database
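As an illustration (a sketch assuming torchvision is installed and the dataset is downloaded to ./data), MNIST images arrive as 28x28 grayscale tensors:
import torchvision
from torchvision import transforms

# Download MNIST and convert each image to a float tensor with values in [0, 1]
mnist = torchvision.datasets.MNIST(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)

image, label = mnist[0]
print(image.shape)  # torch.Size([1, 28, 28]) -> channel x height x width
print(label)        # the digit class, e.g. 5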
1.4 Principles of Deep Learning
Introduces what deep learning is, its relationship to neural networks, and the various components of a neural network and how they work.
Artificial Intelligence, Machine Learning, and Deep Learning - Deep Learning with Python
1.5 Calculus
Calculus is designed for the typical two- or three-semester general calculus course, incorporating innovative features to enhance student learning. The book guides students through the core concepts of calculus and helps them understand how those concepts apply to their lives and the world around them. Due to the comprehensive nature of the material, we are offering the book in three volumes for flexibility and efficiency.
1.6 Gradient Descent
Chain rule interpretation, real-valued circuits, patterns in gradient flow.
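A minimal sketch of gradient descent with autograd, minimizing the simple quadratic f(w) = (w - 3)^2:
import torch

w = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for step in range(50):
    loss = (w - 3.0) ** 2      # forward pass
    loss.backward()            # compute d(loss)/dw via the chain rule
    with torch.no_grad():
        w -= lr * w.grad       # gradient descent update
    w.grad.zero_()             # clear the gradient for the next step

print(w.item())                # approaches 3.0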
1.7 Neural Network from Scratch
A simple explanation of how they work and how to implement one from scratch in Python.
Machine Learning for Beginners: An Introduction to Neural Networks
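As a taste of what "from scratch" means here, a single sigmoid neuron can be written in a few lines of plain NumPy (a minimal sketch; the referenced article builds this up into a full network):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron: a weighted sum of the inputs plus a bias, passed through sigmoid
weights = np.array([0.5, -1.0])
bias = 0.1
inputs = np.array([2.0, 3.0])

output = sigmoid(np.dot(weights, inputs) + bias)
print(output)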
02 Fully Connected Network
Fully connected neural networks (FCNNs) are a type of artificial neural network in which every neuron in one layer is connected to every neuron in the next layer.

2.1 Linear Algebra
This sixth edition of Professor Strang's most popular book, Introduction to Linear Algebra, introduces the ideas of independent columns and the rank and column space of a matrix early on for a more active start. Then the book moves directly to the classical topics of linear equations, fundamental subspaces, least squares, eigenvalues and singular values – in each case expressing the key idea as a matrix factorization. The final chapters of this edition treat optimization and learning from data: the most active application of linear algebra today.
Introduction to Linear Algebra, Sixth Edition
2.2 Points Classification
In this post we will implement a simple 3-layer neural network from scratch.
Implementing a Neural Network from Scratch in Python
2.3 PyTorch Basics
Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models. This tutorial introduces you to a complete ML workflow implemented in PyTorch, with links to learn more about each of these concepts.
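A minimal end-to-end sketch of that workflow (model, loss, optimizer, training loop, saving) on random data, just to show the moving parts:
import torch
from torch import nn

# A tiny fully connected model
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)             # a batch of 8 samples with 4 features
y = torch.randint(0, 3, (8,))     # random class labels in {0, 1, 2}

for epoch in range(5):
    pred = model(x)               # forward pass
    loss = loss_fn(pred, y)       # compare predictions with labels
    optimizer.zero_grad()
    loss.backward()               # backpropagation
    optimizer.step()              # update the parameters
    print(epoch, loss.item())

torch.save(model.state_dict(), "model.pt")  # save the trained weights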
2.4 Activation Function
The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear.
A Beginner’s Guide to the Rectified Linear Unit (ReLU)
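For example, ReLU simply zeroes out negative inputs, which is what makes it nonlinear:
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(torch.relu(x))   # negatives become 0, positive values pass through unchanged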
2.5 Loss Function
A loss function is a crucial component in machine learning that quantifies the difference between a model's predicted output and the actual target values.
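A small illustration with mean squared error, one of the simplest loss functions:
import torch
from torch import nn

pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

mse = nn.MSELoss()
print(mse(pred, target))   # mean of squared differences: (0.25 + 0.25 + 0.0) / 3, about 0.1667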
2.6 Optimizer
An optimizer in machine learning, particularly in deep learning, is a function or algorithm that adjusts the model's parameters (like weights and biases) to minimize the loss function, thereby improving the model's performance.
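The usage pattern is the same for most PyTorch optimizers; a small sketch with SGD:
import torch
from torch import nn

layer = nn.Linear(2, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

loss = layer(torch.randn(4, 2)).pow(2).mean()
optimizer.zero_grad()   # clear old gradients
loss.backward()         # fill .grad on each parameter
optimizer.step()        # parameters <- parameters - lr * grad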
03 Convolutional Network
A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization.

3.1 CNN from Scratch
CNNs, Part 1: An Introduction to Convolutional Neural Networks
CNNs, Part 2: Training a Convolutional Neural Network
3.2 AlexNet
We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes.
ImageNet Classification with Deep Convolutional Neural Networks
3.3 ResNet
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Deep Residual Learning for Image Recognition
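The core idea is a skip connection: the block learns a residual F(x) and outputs F(x) + x. A minimal sketch of such a block (assuming matching input and output channels, so no projection shortcut is needed):
import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # add the input back in (the skip connection)

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])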
3.4 U-Net
In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
U-Net: Convolutional Networks for Biomedical Image Segmentation
3.5 DenseNet
In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections.
04 Recurrent Network
Recurrent neural networks (RNNs) are a class of artificial neural networks designed for processing sequential data, such as text, speech, and time series, where the order of elements is important.

4.1 RNN from Scratch
A simple walkthrough of what RNNs are, how they work, and how to build one from scratch in Python.
An Introduction to Recurrent Neural Networks for Beginners
4.2 Word Embeddings
How to represent words as dense vectors (embeddings) so that similar words have similar representations — useful for NLP tasks.
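In PyTorch this is an nn.Embedding lookup table that maps integer token ids to learned dense vectors; a minimal sketch:
import torch
from torch import nn

vocab_size, embed_dim = 1000, 8
embedding = nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([4, 7, 7, 42])   # a short "sentence" of token ids
vectors = embedding(token_ids)            # one dense vector per token
print(vectors.shape)                      # torch.Size([4, 8])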
4.3 Word2Vec
word2vec is not a singular algorithm, rather, it is a family of model architectures and optimizations that can be used to learn word embeddings from large datasets.
4.4 Text Generation With RNN
Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs
Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano
Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients
Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU and LSTM RNN with Python and Theano
Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back flow. We briefly review Hochreiter's 1991 analysis of this problem, then address it by introducing a novel, efficient gradient-based method called "Long Short-Term Memory" (LSTM).
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU).
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
4.5 Neural Machine Translation
In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Neural Machine Translation by Jointly Learning to Align and Translate
4.6 Attention-based NMT
This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time.
Effective Approaches to Attention-based Neural Machine Translation
05 Transformer
The transformer is a deep learning architecture that was developed by researchers at Google and is based on the multi-head attention mechanism, which was proposed in the 2017 paper Attention Is All You Need.

This chapter is the most important one in this tutorial. We will start by learning what the attention mechanism is, then read the paper "Attention Is All You Need", and work through some practical examples. Finally, we will look at BERT and ViT, two variants of the Transformer.
The Transformer is a relatively large deep learning architecture with extensive applications, and there are numerous optimizations built on top of it, so it is difficult to cover thoroughly in a single article. Learning this architecture requires patience, and hands-on practice is extremely helpful for understanding how the data is computed.
5.1 Attention Mechanism
Mathematically speaking, an attention mechanism computes attention weights that reflect the relative importance of each part of an input sequence to the task at hand.
We will learn what the attention mechanism is, understand how to compute it using query, key, and value, and look into how PyTorch implements it—laying a solid foundation for the subsequent learning of Transformer.
What is an attention mechanism?
Attention Mechanisms and Transformers
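As a minimal sketch of scaled dot-product attention, the formula softmax(QK^T / sqrt(d)) V from the references above, applied to random tensors:
import math
import torch

def scaled_dot_product_attention(q, k, v):
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)   # similarity of queries and keys
    weights = torch.softmax(scores, dim=-1)           # attention weights sum to 1 per query
    return weights @ v                                # weighted sum of the values

q = torch.randn(2, 5, 16)   # (batch, query positions, dim)
k = torch.randn(2, 7, 16)   # (batch, key positions, dim)
v = torch.randn(2, 7, 16)
print(scaled_dot_product_attention(q, k, v).shape)    # torch.Size([2, 5, 16])
Recent PyTorch versions also ship torch.nn.functional.scaled_dot_product_attention, which performs the same computation with optimized kernels.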
5.2 Attention Is All You Need
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Regardless of whether you fully understand it or not, go through the paper first to get a general impression. Later, when you work on specific example implementations and look back at this paper, you will gain more insights.
5.3 nn.Transformer
When studying this chapter you will read the paper "Attention Is All You Need," which describes an English-to-German translation example that can be implemented on a single computer.
For a beginner, completing such an example is not easy, even with the help of AI. During my experiments I found that the tokenizer lacked a start token and that torch.nn.Transformer takes many parameters; I was reading so much code that I got a bit confused.
After repeated analysis, I split the relatively large example into several files, each focusing on its own task. Once the modules were divided this way, the code became very concise and easy to understand.
Attention Mechanisms and Transformers
torch.nn.Transformer - PyTorch
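As a small sketch of how the built-in module is called (shapes only, on random data; in a real translation example the inputs come from embedded token sequences):
import torch
from torch import nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=3, num_decoder_layers=3,
                       batch_first=True)

src = torch.randn(2, 10, 512)   # (batch, source length, d_model)
tgt = torch.randn(2, 7, 512)    # (batch, target length, d_model)

out = model(src, tgt)
print(out.shape)                # torch.Size([2, 7, 512])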
5.4 Transformer from Scratch
The Transformer from "Attention is All You Need" has been on a lot of people’s minds over the last year. Besides producing major improvements in translation quality, it provides a new architecture for many other NLP tasks. The paper itself is very clearly written, but the conventional wisdom has been that it is quite difficult to implement correctly.
This chapter will implement the Transformer architecture from scratch, module by module, to give you a clear look at the model's details. To avoid any data-related distractions, we'll only use numbers. Our goal is for the model to perform a copy task: if we input the sequence 0, 1, 2, ..., 9, we expect the model to output the exact same sequence. This may seem strange, as a simple function could do this with no effort. However, the remarkable part is that we can achieve this operation after passing the data through a massive and complex network. Isn't that incredible?
The main implementation is based on the article below. The article is very well-written, but it's quite long and may be difficult to understand. This guide will take a more accessible approach, with each module's output explained in detail to clarify its inner workings.
5.5 nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
5.6 BERT
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
5.7 Vision Transformer
In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
06 Diffusion Model
This chapter provides a comprehensive overview of the theoretical foundations and practical applications of diffusion models, broken down into seven sub-sections.
6.1 Probability Theory
This sub-section emphasizes the diverse applications of probability theory across various fields like business, healthcare, sciences, sociology, political science, and computing. It links to resources on introductory statistics and explains fundamental concepts such as standard deviation and variance, which are crucial for understanding data distributions.
The text focuses on diverse applications from a variety of fields and societal contexts, including business, healthcare, sciences, sociology, political science, computing, and several others.
Introductory Statistics 2e - OpenStax
Standard Deviation and Variance
6.2 Gaussian Processes
This part delves into Gaussian Processes, a powerful tool in machine learning for modeling functions and making predictions. The linked resource, "Dive into Deep Learning," suggests a deeper exploration of this topic within the context of deep learning.
Gaussian Processes - Dive into Deep Learning
6.3 Mathematical Foundation
This section focuses on the mathematical underpinnings of diffusion generative models. It highlights the core theoretical concepts necessary to understand how these models function at a fundamental level.
Mathematical Foundation of Diffusion Generative Models
6.4 Diffusion from Scratch
This sub-section aims to provide a practical understanding of diffusion models by explaining Stable Diffusion from a foundational perspective, allowing users to grasp its mechanisms from the ground up.
Understanding Stable Diffusion from "Scratch"
6.5 Estimating Gradients
We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching.
Generative Modeling by Estimating Gradients of the Data Distribution
6.6 Diffusion Probability Model
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.
Denoising Diffusion Probabilistic Models
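The forward (noising) process of a DDPM has a closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) * eps with eps drawn from a standard Gaussian. A minimal sketch of that step on a random stand-in image, using the linear noise schedule from the paper:
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative product alpha_bar_t

x0 = torch.randn(1, 3, 32, 32)               # a stand-in for a training image
t = 500
eps = torch.randn_like(x0)

x_t = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * eps
print(x_t.shape)   # same shape as x0, but progressively noisier as t grows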
6.7 Latent Diffusion
To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders.
High-Resolution Image Synthesis with Latent Diffusion Models
07 Text
This chapter outlines four tutorials related to artificial intelligence and natural language processing.

7.1 Translate text with Transformer
This tutorial demonstrates how to create and train a sequence-to-sequence Transformer model to translate Portuguese into English.
Neural machine translation with a Transformer and Keras
7.2 Easy OCR
This section introduces EasyOCR, a ready-to-use Optical Character Recognition (OCR) tool. It highlights EasyOCR's broad language support, covering over 80 languages, making it versatile for extracting text from images.
EasyOCR: Ready-to-use OCR with 80+ supported languages
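Basic usage looks roughly like this (a sketch assuming easyocr is installed via pip and an image file image.png exists; the detection and recognition models are downloaded on first use):
import easyocr

reader = easyocr.Reader(['en'])          # load the English models
results = reader.readtext('image.png')   # list of (bounding box, text, confidence)

for bbox, text, confidence in results:
    print(text, confidence)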
7.3 Language Modeling
This part discusses advancements in large language models (LLMs), specifically mentioning Llama by Meta and DeepSeek-V3.
Llama: The most intelligent, scalable, and convenient generation of Llama is here: natively multimodal, mixture-of-experts models, advanced reasoning, and industry-leading context windows. Build your greatest ideas and seamlessly deploy in minutes with Llama API and Llama Stack.
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
Industry Leading, Open-Source AI | Llama by Meta
7.4 Chatbots
A chatbot is a computer program that simulates human conversation with an end user. This final section points to a PyTorch tutorial for learning how to develop one.
08 Audio
Fundamentals of Music Processing (FMP)
8.1 Speech Feature Extraction
Sound is a mechanical wave that transmits energy through the vibration of a medium, such as air, water, or solids. Understanding its fundamental properties is crucial for converting it into a format that deep learning models can effectively process.
torchaudio.transforms.MelSpectrogram
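A sketch of turning a waveform into a mel spectrogram with torchaudio (the parameter values here are common choices, not the only ones):
import torch
import torchaudio

waveform = torch.randn(1, 16000)   # 1 second of fake audio at 16 kHz

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80,
)
spec = mel(waveform)
print(spec.shape)   # (channel, n_mels, time frames)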
8.2 Automatic Speech Recognition
whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision
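Transcription with the open-source whisper package takes only a few lines (a sketch assuming the openai-whisper package is installed and an audio.mp3 file exists; model weights are downloaded on first use):
import whisper

model = whisper.load_model("base")      # smaller models are faster, larger ones more accurate
result = model.transcribe("audio.mp3")
print(result["text"])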
8.3 Text-to-Speech
An Open Source text-to-speech system built by inverting Whisper.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
8.4 Music Transcription
Automatic Music Transcription (AMT) is the task of extracting symbolic representations of music from raw audio.
Music Transcription with Transformers
8.5 Music Synthesis
09 Image and Video
9.1 Object Detection
TorchVision Object Detection Finetuning Tutorial
9.2 Transfer Learning
Transfer learning is a machine learning technique where a model, trained on one task, is reused as a starting point for a different but related task.
Transfer Learning for Computer Vision Tutorial
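A typical sketch: start from an ImageNet-pretrained ResNet-18, freeze the backbone, and replace the final classification layer for a new task with, say, 10 classes:
import torch
from torch import nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained on ImageNet

for param in model.parameters():       # freeze the pretrained backbone
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 10)   # new head, trained from scratch

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])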
9.3 FGSM Attack
Adversarial Example Generation
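The core of FGSM is a single step in the direction of the sign of the input gradient; a minimal sketch (the model and image below are placeholders):
import torch
from torch import nn

def fgsm_attack(model, image, label, epsilon):
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Perturb the input in the direction that increases the loss
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()   # keep pixel values valid

# Example with a toy model and a random "image"
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
adv = fgsm_attack(model, x, y, epsilon=0.1)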
9.4 Spatial Transformer
Spatial Transformer Networks Tutorial
9.5 DeepFaceLab
DeepFaceLab is the leading software for creating deepfakes.
DeepFaceLab: Integrated, flexible and extensible face-swapping framework
9.6 DeepFaceLive
9.7 Segment Anything
segment-anything: provides code for running inference with the SegmentAnything Model (SAM)
9.8 Intro to Autoencoders
An autoencoder is a special type of neural network that is trained to copy its input to its output.
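A minimal sketch of a fully connected autoencoder that compresses 28x28 inputs to a small latent code and reconstructs them:
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)                          # compress to the latent code
        return self.decoder(z).view(-1, 1, 28, 28)   # reconstruct the input shape

model = Autoencoder()
x = torch.rand(16, 1, 28, 28)
loss = nn.functional.mse_loss(model(x), x)   # reconstruction loss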
10 Reinforcement Learning
Reinforcement Learning: An Introduction
Implementation of Reinforcement Learning Algorithms
David Silver's Reinforcement Learning
10.1 Introduction to RL
Chapter 1: Introduction - Reinforcement Learning
10.2 Markov Decision Processes
Chapter 3: Markov Decision Processes - Reinforcement Learning
10.3 Dynamic Programming
Chapter 4: Dynamic Programming - Reinforcement Learning
10.4 DQN
This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v1 task from Gymnasium.
Reinforcement Learning (DQN) Tutorial
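The two central ingredients are a Q-network that maps a state to one value per action and an epsilon-greedy policy; a minimal sketch on CartPole (assuming gymnasium is installed):
import random
import torch
from torch import nn
import gymnasium as gym

env = gym.make("CartPole-v1")
n_obs = env.observation_space.shape[0]   # 4 state variables
n_actions = env.action_space.n           # 2 actions: push left or right

q_net = nn.Sequential(nn.Linear(n_obs, 128), nn.ReLU(), nn.Linear(128, n_actions))

def select_action(state, epsilon):
    if random.random() < epsilon:        # explore with probability epsilon
        return env.action_space.sample()
    with torch.no_grad():                # exploit: pick the best predicted Q-value
        return q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

state, _ = env.reset()
action = select_action(state, epsilon=0.1)
next_state, reward, terminated, truncated, _ = env.step(action)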
10.5 PPO
This tutorial demonstrates how to use PyTorch and torchrl to train a parametric policy network to solve the Inverted Pendulum task from the OpenAI-Gym/Farama-Gymnasium control library.
Reinforcement Learning (PPO) with TorchRL Tutorial
10.6 Function Approximation
11 Extending PyTorch
This chapter provides insights into extending PyTorch's capabilities. It covers custom operations, frontend APIs, and advanced topics like C++ extensions and dispatcher usage.
11.1 Custom Python Operators
11.2 Custom C++ and CUDA Operators
11.3 Double Backward
Double Backward with Custom Functions
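A custom operation is defined by subclassing torch.autograd.Function; because the backward below is itself written with differentiable torch operations on saved tensors, it also supports double backward. A minimal sketch for f(x) = x^3:
import torch

class Cube(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 3 * x ** 2   # d(x^3)/dx, built from differentiable ops

x = torch.tensor(2.0, requires_grad=True)
y = Cube.apply(x)
(grad,) = torch.autograd.grad(y, x, create_graph=True)   # first derivative: 12
(grad2,) = torch.autograd.grad(grad, x)                  # second derivative: 12
print(grad.item(), grad2.item())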
11.4 Fusing Conv and Batch Norm
12 Deploying Models
12.1 ONNX
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
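Exporting a PyTorch model to ONNX is typically a single call (a minimal sketch; the model and input shape below are placeholders):
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
dummy_input = torch.randn(1, 4)   # example input that fixes the graph's shapes

torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["input"], output_names=["output"],
)
# The resulting model.onnx can be loaded by ONNX Runtime or other compatible runtimes.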
12.2 ExecuTorch
ExecuTorch is PyTorch’s solution to training and inference on the Edge.
Getting Started with ExecuTorch
12.3 LiteRT
LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI.
12.4 TensorFlow.js
TensorFlow.js is a library for machine learning in JavaScript
13 Model Optimization
This chapter covers four key techniques used to improve the efficiency and performance of machine learning models.
13.1 LoRA
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
LoRA: Low-Rank Adaptation of Large Language Models
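The idea can be written as a thin wrapper around a frozen linear layer: the output becomes W x + (alpha / r) * B A x, where only the low-rank matrices A and B are trained. A minimal sketch (not the reference implementation):
import torch
from torch import nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # frozen path plus the trainable low-rank update
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)   # torch.Size([2, 512])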
13.2 Pruning
In this tutorial, we will learn how to use torch.nn.utils.prune to sparsify your neural networks, and how to extend it to implement your own custom pruning technique.
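A small example of what that looks like (magnitude pruning of 30% of a layer's weights):
import torch
from torch import nn
from torch.nn.utils import prune

layer = nn.Linear(10, 10)
prune.l1_unstructured(layer, name="weight", amount=0.3)   # zero the 30% smallest-magnitude weights

# Pruning adds a weight_mask buffer; the effective weight is weight_orig * weight_mask
print((layer.weight == 0).float().mean())   # roughly 0.3

prune.remove(layer, "weight")   # make the pruning permanent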
13.3 Quantization
We’ll lay a (quick) foundation of quantization in deep learning, and then take a look at how each technique looks like in practice.
Practical Quantization in PyTorch
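Post-training dynamic quantization is the easiest variant to try; a minimal sketch that converts the Linear layers of a model to int8:
import torch
from torch import nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # weights stored as int8, activations quantized on the fly
)

x = torch.randn(1, 256)
print(quantized(x).shape)   # same interface, smaller and often faster on CPU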
13.4 Distillation
Knowledge distillation is a technique that enables knowledge transfer from large, computationally expensive models to smaller ones without losing validity. This allows for deployment on less powerful hardware, making evaluation faster and more efficient.
Knowledge Distillation Tutorial
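The heart of knowledge distillation is a loss that pushes the student's softened predictions toward the teacher's; a minimal sketch of that loss term (the temperature T is a hyperparameter, commonly between 2 and 5):
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual cross-entropy with the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 10)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y))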
14 Distributed Training
Distributed training is a model training paradigm that spreads the training workload across multiple worker nodes, therefore significantly improving the speed of training and model accuracy. While distributed training can be used for any type of ML model training, it is most beneficial for large models and compute-demanding tasks such as deep learning.
14.1 Distributed Data Parallel
DistributedDataParallel (DDP) is a powerful module in PyTorch that allows you to parallelize your model across multiple machines, making it perfect for large-scale deep learning applications.
Getting Started with Distributed Data Parallel
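The basic pattern is: initialize a process group, move the model to this process's device, and wrap it in DDP; the wrapped model is then trained with the usual loop. A minimal sketch (assuming one CUDA GPU per process and a launch via torchrun, which sets RANK, LOCAL_RANK, and WORLD_SIZE):
import os
import torch
from torch import nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])   # gradients are synchronized automatically

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss = ddp_model(torch.randn(20, 10).cuda(local_rank)).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # launch with: torchrun --nproc_per_node=NUM_GPUS this_script.py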
14.2 Fully Sharded Data Parallel
PyTorch FSDP2 provides a fully sharded data parallelism (FSDP) implementation targeting performant eager-mode while using per-parameter sharding for improved usability.
Getting Started with Fully Sharded Data Parallel (FSDP2)
14.3 Tensor Parallel
This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.
Large Scale Transformer model training with Tensor Parallel (TP)
Tensor Parallelism - torch.distributed.tensor.parallel
14.4 Device Mesh
DeviceMesh is a higher level abstraction that manages ProcessGroup. It allows users to effortlessly create inter-node and intra-node process groups without worrying about how to set up ranks correctly for different sub process groups.
Getting Started with DeviceMesh
14.5 Remote Procedure Call
This tutorial uses two simple examples to demonstrate how to build distributed training with the torch.distributed.rpc package.
15 Graph Neural Network
PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.
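A minimal sketch of PyG's building blocks (assuming torch_geometric is installed): a graph is a Data object holding node features and an edge index, and layers such as GCNConv consume both:
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# A tiny graph: 3 nodes with 4 features each, and 2 directed edges (0->1, 1->2)
x = torch.randn(3, 4)
edge_index = torch.tensor([[0, 1],
                           [1, 2]], dtype=torch.long)
data = Data(x=x, edge_index=edge_index)

conv = GCNConv(in_channels=4, out_channels=8)
out = conv(data.x, data.edge_index)   # message passing over the edges
print(out.shape)                      # torch.Size([3, 8])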
15.1 Graph Foundation
15.2 Core Ideas
A Gentle Introduction to Graph Neural Networks
15.3 Design of GNN
Design of Graph Neural Networks