Deep Learning With PyTorch
This eBook focuses not only on theoretical explanations but also, and especially, on engineering practice. By working through a large number of practical cases, in particular how to train, optimize, and deploy models, readers will learn how to use PyTorch to complete a wide range of deep learning tasks.

You need basic knowledge of Python to study this course. You can gauge your Python level by reading the file download.py. If you don't know Python, it is recommended that you read the textbook Introduction to Python Programming.

All source code, including this website, lives in the GitHub repository artinte/deep-learning. You can follow my YouTube channel and support it with likes and subscriptions; the more support, the faster the updates!
Preface
Learning machine learning is difficult, especially if you are coming to it from another field. I saw the trend toward artificial intelligence about six years ago and wanted to move into the AI industry, but I didn't have a good entry point.
Since 2014, I have been preparing to learn deep learning systematically. At first I took notes in Google Docs, then wrote a Machine Learning Series, and now Deep Learning with PyTorch. Along the way I have found that what matters most is having a goal and persisting; gradually, you will discover the fun of learning.
Deep Learning with PyTorch draws on a large number of excellent articles, including published papers, and organizes them into an e-book. The blue links point to the relevant references. I would like to thank all of the authors for their selfless dedication, and wish you peace and happiness!
My goal is to find a well-paid, relaxed job, perhaps as a YouTuber, so that I have more time to write tutorials and more room to grow.
01 Tensor and Gradient Basics
This chapter introduces the core concepts of deep learning, tensors and gradients, and lays the foundation for the chapters that follow.

1.1 Install PyTorch
Select your preferences on pytorch.org and run the command it generates to install PyTorch locally, for example:

pip3 install torch torchvision torchaudio
1.2 Introduction to Tensors
A torch.Tensor is a multi-dimensional matrix containing elements of a single data type.
Introduction to PyTorch Tensors
Indexing on ndarrays — NumPy v2.2 Manual
Tensor Views - PyTorch 2.7 Documentation
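As a quick taste, here is a minimal sketch that creates a tensor, inspects its data type and shape, indexes into it, and shows that view() returns a tensor sharing the same underlying storage. The values are arbitrary.

import torch

# Create a 2x3 tensor of 32-bit floats
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(x.dtype)   # torch.float32
print(x.shape)   # torch.Size([2, 3])

# Basic indexing: first row, then last column of every row
print(x[0])      # tensor([1., 2., 3.])
print(x[:, -1])  # tensor([3., 6.])

# view() returns a tensor that shares storage with the original
y = x.view(3, 2)
y[0, 0] = 100.0
print(x[0, 0])   # tensor(100.) - the original tensor changed too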
1.3 Data Representation
MNIST Handwritten Digit Database
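For example, torchvision can download MNIST and wrap it in a DataLoader; in this minimal sketch the root directory and batch size are illustrative choices, not requirements.

import torch
from torchvision import datasets, transforms

# Download MNIST and convert each 28x28 image to a [1, 28, 28] float tensor in [0, 1]
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels[:10])   # the first ten digit labels in the batch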
1.4 Principles of Deep Learning
Artificial Intelligence, Machine Learning, and Deep Learning - Deep Learning with Python
1.5 Calculus
Calculus is designed for the typical two- or three-semester general calculus course, incorporating innovative features to enhance student learning. The book guides students through the core concepts of calculus and helps them understand how those concepts apply to their lives and the world around them. Due to the comprehensive nature of the material, we are offering the book in three volumes for flexibility and efficiency.
1.6 Gradient Descent
1.7 Neural Network from Scratch
Machine Learning for Beginners: An Introduction to Neural Networks
02 Fully Connected Network
Fully connected neural networks (FCNNs) are a type of artificial neural network in which every node, or neuron, in one layer is connected to every neuron in the next layer.
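As a preview of what this chapter builds up to, here is a minimal sketch of a fully connected network for 28x28 images built with nn.Sequential; the layer sizes are arbitrary choices for illustration.

import torch
from torch import nn

# A small fully connected network: every unit in one layer feeds every unit in the next
model = nn.Sequential(
    nn.Flatten(),            # [N, 1, 28, 28] -> [N, 784]
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),      # 10 output classes
)

x = torch.randn(32, 1, 28, 28)   # a batch of fake images
logits = model(x)
print(logits.shape)              # torch.Size([32, 10])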

2.1 Linear Algebra
Introduction to Linear Algebra, Sixth Edition
2.2 Points Classification
Implementing a Neural Network from Scratch in Python
2.3 PyTorch Basics
2.4 Activation Function
The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear.
A Beginner’s Guide to the Rectified Linear Unit (ReLU)
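A short sketch of ReLU, which simply zeroes out negative inputs; without such a nonlinearity, stacked linear layers would collapse into a single linear map.

import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

relu = nn.ReLU()
print(relu(x))        # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])
print(torch.relu(x))  # functional form, same result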
2.5 Loss Function
2.6 Optimizer
03 Convolutional Network
A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization.
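The basic building block in PyTorch is nn.Conv2d, which slides a small learnable kernel across the input; the channel counts and kernel size below are illustrative.

import torch
from torch import nn

# 3 input channels (RGB), 16 learned filters, 3x3 kernel, padding keeps the spatial size
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
y = conv(x)
print(y.shape)                  # torch.Size([1, 16, 32, 32])
print(conv.weight.shape)        # torch.Size([16, 3, 3, 3]) - the learnable filters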
3.1 CNN from Scratch
CNNs, Part 1: An Introduction to Convolutional Neural Networks
CNNs, Part 2: Training a Convolutional Neural Network
3.2 AlexNet
We trained a large, deep convolutional neural network to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into the 1000 different classes.
ImageNet Classification with Deep Convolutional Neural Networks
3.3 ResNet
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Deep Residual Learning for Image Recognition
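A minimal sketch of the idea: the block learns a residual F(x) and adds the input back, so the layers only have to model the change. This is a simplified version for illustration, not the exact block from the paper (which also uses batch normalization).

import torch
from torch import nn

class ResidualBlock(nn.Module):
    # Simplified residual block: output = ReLU(F(x) + x)
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.relu(self.conv1(x))
        residual = self.conv2(residual)
        return self.relu(residual + x)   # skip connection adds the input back

block = ResidualBlock(16)
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)   # torch.Size([1, 16, 32, 32])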
3.4 U-Net
In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
U-Net: Convolutional Networks for Biomedical Image Segmentation
3.5 DenseNet
In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections.
04 Recurrent Network
Recurrent neural networks (RNNs) are a class of artificial neural networks designed for processing sequential data, such as text, speech, and time series, where the order of elements is important.
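In PyTorch, nn.RNN processes a sequence one step at a time while carrying a hidden state; the sizes in this sketch are illustrative.

import torch
from torch import nn

# 8 features per time step, hidden state of size 16, batch dimension first
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)   # batch of 4 sequences, 10 time steps each
output, h_n = rnn(x)
print(output.shape)         # torch.Size([4, 10, 16]) - hidden state at every step
print(h_n.shape)            # torch.Size([1, 4, 16])  - final hidden state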

4.1 RNN from Scratch
An Introduction to Recurrent Neural Networks for Beginners
4.2 Word Embeddings
4.3 Word2Vec
4.4 LSTM and GRU
Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs
Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano
Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients
Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU and LSTM RNN with Python and Theano
Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back flow. We briefly review Hochreiter's 1991 analysis of this problem, then address it by introducing a novel, efficient gradient-based method called "Long Short-Term Memory" (LSTM).
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU).
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
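In PyTorch, LSTM and GRU layers are drop-in replacements for nn.RNN; a quick sketch with illustrative sizes:

import torch
from torch import nn

x = torch.randn(4, 10, 8)                # batch of 4 sequences, 10 steps, 8 features

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
out, (h_n, c_n) = lstm(x)                # an LSTM also carries a cell state c_n
print(out.shape, h_n.shape, c_n.shape)   # [4, 10, 16] [1, 4, 16] [1, 4, 16]

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
out, h_n = gru(x)                        # a GRU has gates but no separate cell state
print(out.shape, h_n.shape)              # [4, 10, 16] [1, 4, 16]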
4.5 Neural Machine Translation
Neural Machine Translation by Jointly Learning to Align and Translate
4.6 Attention-based NMT
Effective Approaches to Attention-based Neural Machine Translation
05 Transformer
The transformer is a deep learning architecture that was developed by researchers at Google and is based on the multi-head attention mechanism, which was proposed in the 2017 paper Attention Is All You Need.

5.1 Attention Mechanism
What is an attention mechanism?
Attention Mechanisms and Transformers
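As a concrete reference point, the sketch below implements single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; the tensor sizes are arbitrary.

import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: [batch, seq_len, d_k]
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # [batch, seq_len, seq_len]
    weights = torch.softmax(scores, dim=-1)              # attention weights sum to 1
    return weights @ v                                    # weighted sum of the values

q = k = v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)   # torch.Size([2, 5, 64])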
5.2 Attention Is All You Need
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
5.3 nn.Transformer
Attention Mechanisms and Transformers
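PyTorch bundles the full encoder-decoder stack in nn.Transformer; in this sketch the layer counts are illustrative and the random embeddings stand in for real token embeddings.

import torch
from torch import nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(4, 20, 512)   # source sequence embeddings
tgt = torch.randn(4, 15, 512)   # target sequence embeddings
out = model(src, tgt)
print(out.shape)                # torch.Size([4, 15, 512])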
5.4 Transformer from Scratch
5.5 nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs
5.6 BERT
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
5.7 Vision Transformer
In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
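To make "sequences of image patches" concrete, this sketch cuts a 224x224 image into 16x16 patches and projects each one to an embedding; the patch size and embedding dimension follow the paper's base setting, but the code itself is only an illustration of patch embedding.

import torch
from torch import nn

img = torch.randn(1, 3, 224, 224)
patch, dim = 16, 768

# Unfold height and width into non-overlapping 16x16 patches
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)   # [1, 3, 14, 14, 16, 16]
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, 3 * patch * patch)

embed = nn.Linear(3 * patch * patch, dim)
tokens = embed(patches)   # a sequence of 196 patch tokens
print(tokens.shape)       # torch.Size([1, 196, 768])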
06 Diffusion Model
6.1 Probability Theory
The text focuses on diverse applications from a variety of fields and societal contexts, including business, healthcare, sciences, sociology, political science, computing, and several others.
Introductory Statistics 2e - OpenStax
Standard Deviation and Variance
6.2 Gaussian Processes
Gaussian Processes - Dive into Deep Learning
6.3 Mathematical Foundation
Mathematical Foundation of Diffusion Generative Models
6.4 Diffusion from Scratch
Understanding Stable Diffusion from "Scratch"
6.5 Estimating Gradients
We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching.
Generative Modeling by Estimating Gradients of the Data Distribution
6.6 Diffusion Probability Model
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.
Denoising Diffusion Probabilistic Models
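The forward (noising) process of a DDPM can be sampled in closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. A minimal sketch, using a linear beta schedule purely for illustration:

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # noise schedule (illustrative)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta_t)

def q_sample(x0, t, noise):
    # Sample x_t ~ q(x_t | x_0) in closed form
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.randn(8, 3, 32, 32)    # a batch of "images"
t = torch.randint(0, T, (8,))     # a random timestep for each sample
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)
print(x_t.shape)                  # torch.Size([8, 3, 32, 32])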
6.7 Latent Diffusion
To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders.
High-Resolution Image Synthesis with Latent Diffusion Models
07 Text
7.1 Translate text with Transformer
Neural machine translation with a Transformer and Keras
7.2 Easy OCR
EasyOCR: Ready-to-use OCR with 80+ supported languages
7.3 Language Modeling
Industry Leading, Open-Source AI | Llama by Meta
7.4 Chatbots
08 Audio
8.1 Speech Feature Extraction
torchaudio.transforms.MelSpectrogram
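A typical first step is converting a waveform into a log-mel spectrogram; the parameter values in this sketch are common choices for 16 kHz speech, not requirements.

import torch
import torchaudio

transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

waveform = torch.randn(1, 16000)   # one second of fake audio at 16 kHz
mel = transform(waveform)          # [channel, n_mels, time_frames]
log_mel = torch.log(mel + 1e-6)    # log compression is common for speech features
print(mel.shape)                   # torch.Size([1, 80, 101])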
8.2 Automatic Speech Recognition
whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision
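The openai/whisper package exposes a very small API; a sketch of transcribing a file, where the checkpoint name and audio path are placeholders:

import whisper

# Load one of the pretrained checkpoints ("tiny", "base", "small", ...)
model = whisper.load_model("base")

# Transcribe an audio file; the path here is just a placeholder
result = model.transcribe("audio.mp3")
print(result["text"])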
8.3 Text-to-Speech
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
8.4 Speech Separation
8.5 Voice Synthesis
09 Image and Video
9.1 Object Detection
TorchVision Object Detection Finetuning Tutorial
9.2 Transfer Learning
Transfer Learning for Computer Vision Tutorial
9.3 FGSM Attack
Adversarial Example Generation
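FGSM perturbs the input in the direction of the sign of the input gradient of the loss: x_adv = x + epsilon * sign(grad_x L). A minimal sketch; the model and epsilon are assumed to be given.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    # x: input batch, y: true labels, epsilon: perturbation size
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to a valid image range
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()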
9.4 Spatial Transformer
Spatial Transformer Networks Tutorial
9.5 DeepFaceLab
9.6 DeepFaceLive
9.7 Segment Anything
segment-anything: provides code for running inference with the SegmentAnything Model (SAM)
9.8 Intro to Autoencoders
10 Reinforcement Learning
Implementation of Reinforcement Learning Algorithms
10.1 Introduction to RL Problems
Reinforcement Learning: An Introduction (2nd Edition)
10.2 Dynamic Programming
10.3 DQN
This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v1 task from Gymnasium.
Reinforcement Learning (DQN) Tutorial
10.4 PPO
This tutorial demonstrates how to use PyTorch and torchrl to train a parametric policy network to solve the Inverted Pendulum task from the OpenAI-Gym/Farama-Gymnasium control library.
Reinforcement Learning (PPO) with TorchRL Tutorial
10.5 Function Approximation
11 Extending PyTorch
11.1 Custom Operators
11.2 Custom Functions
11.3 C++ and CUDA Extensions
11.4 Extending TorchScript
11.5 Dispatcher
12 Deploying Models
12.1 ONNX
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
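Exporting a PyTorch model to ONNX takes a single call to torch.onnx.export; the small model and file name in this sketch are illustrative.

import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

dummy_input = torch.randn(1, 784)   # an example input that fixes the graph's shapes
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["logits"])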
12.2 TorchScript
12.3 ExecuTorch
Getting Started with ExecuTorch
12.4 TensorFlow Lite
12.5 TensorFlow.js
13 Model Optimization
13.1 LoRA
LoRA: Low-Rank Adaptation of Large Language Models
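The idea is to freeze the pretrained weight W and learn a low-rank update BA, so the effective weight becomes W + (alpha/r) * BA with far fewer trainable parameters. A simplified sketch of a LoRA-style linear layer, not the reference implementation from the paper:

import torch
from torch import nn

class LoRALinear(nn.Module):
    # Frozen pretrained linear layer plus a trainable low-rank update B @ A
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)           # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train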
13.2 Pruning
13.3 Quantization
We’ll lay a (quick) foundation of quantization in deep learning, and then take a look at what each technique looks like in practice.
Practical Quantization in PyTorch
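The lightest-weight entry point is post-training dynamic quantization, which converts the weights of Linear layers to int8 while activations stay in floating point; a minimal sketch:

import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Quantize the weights of all Linear layers to int8
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 784)
print(quantized(x).shape)   # torch.Size([1, 10])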
13.4 Distillation
Knowledge distillation is a technique that enables knowledge transfer from large, computationally expensive models to smaller ones without losing validity. This allows for deployment on less powerful hardware, making evaluation faster and more efficient.
Knowledge Distillation Tutorial
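The usual training signal combines ordinary cross-entropy on the hard labels with a KL term that pushes the student's softened outputs toward the teacher's; the temperature and weighting in this sketch are illustrative.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

s = torch.randn(8, 10)            # student logits
t = torch.randn(8, 10)            # teacher logits
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y))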
14 Distributed Training
14.1 Distributed Overview
Distributed and Parallel Training Tutorials
14.2 Distributed Data Parallel
Distributed Data Parallel in PyTorch - Video Tutorials
14.3 Fully Sharded Data Parallel
Getting Started with Fully Sharded Data Parallel (FSDP2)
14.4 Tensor Parallel
Tensor Parallelism - torch.distributed.tensor.parallel
14.5 Device Mesh
Getting Started with DeviceMesh
14.6 Remote Procedure Call
Combining Distributed DataParallel with Distributed RPC Framework