PyTorch for Python
What is PyTorch?
- Open-Source Machine Learning Framework: PyTorch is a popular open-source library primarily used for deep learning applications but also offers versatility in general machine learning areas.
- Based on Torch: It's a Python adaptation of the Lua-based Torch scientific computing framework.
- Core Applications:
- Computer Vision (image/video processing)
- Natural Language Processing (text analysis, translation)
- Reinforcement Learning
- Building and training deep neural networks
Key Features
- Tensor Computations with GPU Acceleration: PyTorch heavily utilizes tensors (multidimensional arrays) for efficient numerical computations and can leverage the power of GPUs for faster processing.
- Dynamic Computational Graphs: In contrast to frameworks like TensorFlow (prior to v2), PyTorch builds computational graphs on the fly, which makes debugging and experimentation during model development far more flexible (a short sketch follows this list).
- Pythonic and User-Friendly: PyTorch seamlessly integrates with the Python ecosystem and feels natural to Python developers, making it easy to learn and adopt.
- Extensive Community and Ecosystem: PyTorch boasts a large, active community and a wide array of pre-trained models, tutorials, and tools.
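A quick sketch of what "define-by-run" looks like in practice: the graph is built while ordinary Python executes, so control flow can depend on the actual tensor values (the values and threshold below are purely illustrative):
import torch
x = torch.randn(3, requires_grad=True)
# The branch taken can differ from one forward pass to the next,
# and autograd records whichever operations actually ran.
if x.sum() > 0:
    y = (x * 2).sum()
else:
    y = (x ** 2).sum()
y.backward()  # gradients flow through the branch that executed
print(x.grad)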
Why is PyTorch Popular?
- Researcher-Friendly: Its dynamic nature and focus on flexibility make it highly favored in research environments where rapid prototyping and experimentation are crucial.
- Production-Ready: While popular in research, PyTorch is equally capable of production-level deployment in applications like self-driving cars and natural language processing systems.
- A Strong Competitor to TensorFlow: It's one of the primary alternatives to Google's TensorFlow framework, with each offering distinct advantages.
Getting Started with PyTorch
- Installation:
pip install torch torchvision
(Often you'll also install torchvision for computer vision tools.)
- Basics: Learn about tensors, neural network modules, automatic differentiation, and model training loops.
- Resources:
- Official website: https://pytorch.org/
- Tutorials and documentation: https://pytorch.org/tutorials/
Installation Guide
PyTorch offers flexible installation options based on your hardware and requirements:
CPU-Only Installation
For CPU-only environments (testing, development without GPU):
pip install torch torchvision torchaudio
GPU Installation with CUDA
For NVIDIA GPUs, install CUDA-enabled PyTorch. Check your CUDA version first:
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
AMD GPU with ROCm
For AMD GPUs using ROCm:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
Conda Installation
Using conda for environment management:
# CPU version
conda install pytorch torchvision torchaudio cpuonly -c pytorch
# CUDA version
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Verify Installation
import torch
# Check PyTorch version
print(f"PyTorch version: {torch.__version__}")
# Check CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Number of GPUs: {torch.cuda.device_count()}")
Core Concepts Deep Dive
Tensors
Tensors are the fundamental building blocks in PyTorch - multidimensional arrays similar to NumPy arrays but with GPU acceleration capabilities.
import torch
# Creating tensors
tensor_1d = torch.tensor([1, 2, 3, 4])
tensor_2d = torch.tensor([[1, 2], [3, 4]])
tensor_zeros = torch.zeros(3, 3)
tensor_ones = torch.ones(2, 4)
tensor_random = torch.randn(3, 3) # Normal distribution
# Tensor from NumPy
import numpy as np
numpy_array = np.array([1, 2, 3])
tensor_from_numpy = torch.from_numpy(numpy_array)
# Tensor properties
print(f"Shape: {tensor_2d.shape}")
print(f"Data type: {tensor_2d.dtype}")
print(f"Device: {tensor_2d.device}")
# Tensor operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
# Element-wise operations
addition = a + b
multiplication = a * b
dot_product = torch.dot(a, b)
# Matrix operations
matrix_a = torch.randn(3, 4)
matrix_b = torch.randn(4, 5)
matrix_mult = torch.matmul(matrix_a, matrix_b)
Automatic Differentiation (Autograd)
PyTorch's autograd system automatically computes gradients for backpropagation:
# Enable gradient tracking
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = torch.tensor([6.0, 4.0], requires_grad=True)
# Define computation
z = x * y
loss = z.sum()
# Compute gradients automatically
loss.backward()
# Access gradients
print(f"Gradient of x: {x.grad}") # dL/dx
print(f"Gradient of y: {y.grad}") # dL/dy
# Gradient accumulation
x.grad.zero_() # Clear gradients before next iteration
Computational Graphs
PyTorch builds dynamic computational graphs that record operations:
import torch
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)
# Operations create computational graph
z = x ** 2 + y ** 3
z.backward()
print(f"dz/dx = {x.grad}") # 2*x = 6.0
print(f"dz/dy = {y.grad}") # 3*y^2 = 48.0
Building Neural Networks
PyTorch provides nn.Module as the base class for all neural networks:
Basic Neural Network Architecture
import torch
import torch.nn as nn
import torch.nn.functional as F
class SimpleNet(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(SimpleNet, self).__init__()
# Define layers
self.fc1 = nn.Linear(input_size, hidden_size)
self.fc2 = nn.Linear(hidden_size, hidden_size)
self.fc3 = nn.Linear(hidden_size, output_size)
self.dropout = nn.Dropout(0.2)
def forward(self, x):
# Define forward pass
x = F.relu(self.fc1(x))
x = self.dropout(x)
x = F.relu(self.fc2(x))
x = self.dropout(x)
x = self.fc3(x)
return x
# Instantiate model
model = SimpleNet(input_size=784, hidden_size=128, output_size=10)
print(model)
# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")
Convolutional Neural Network (CNN)
class CNN(nn.Module):
def __init__(self, num_classes=10):
super(CNN, self).__init__()
# Convolutional layers
self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
# Fully connected layers
self.fc1 = nn.Linear(64 * 7 * 7, 128)
self.fc2 = nn.Linear(128, num_classes)
def forward(self, x):
# Conv block 1
x = self.pool(F.relu(self.conv1(x)))
# Conv block 2
x = self.pool(F.relu(self.conv2(x)))
# Flatten
x = x.view(-1, 64 * 7 * 7)
# Fully connected
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
cnn_model = CNN(num_classes=10)
Common Activation Functions
# ReLU (most common)
relu = nn.ReLU()
# LeakyReLU
leaky_relu = nn.LeakyReLU(negative_slope=0.01)
# Sigmoid
sigmoid = nn.Sigmoid()
# Tanh
tanh = nn.Tanh()
# Softmax (for classification output)
softmax = nn.Softmax(dim=1)
# GELU (used in transformers)
gelu = nn.GELU()
Training Workflow
Complete training pipeline with data loading, loss computation, and optimization:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# 1. Prepare data
X_train = torch.randn(1000, 20) # 1000 samples, 20 features
y_train = torch.randint(0, 2, (1000,)) # Binary classification
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# 2. Initialize model
model = SimpleNet(input_size=20, hidden_size=64, output_size=2)
# 3. Define loss function
criterion = nn.CrossEntropyLoss()
# 4. Define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
# 5. Training loop
num_epochs = 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
for epoch in range(num_epochs):
model.train() # Set to training mode
running_loss = 0.0
correct = 0
total = 0
for batch_idx, (data, target) in enumerate(train_loader):
# Move to device
data, target = data.to(device), target.to(device)
# Zero gradients
optimizer.zero_grad()
# Forward pass
outputs = model(data)
loss = criterion(outputs, target)
# Backward pass
loss.backward()
# Update weights
optimizer.step()
# Statistics
running_loss += loss.item()
_, predicted = outputs.max(1)
total += target.size(0)
correct += predicted.eq(target).sum().item()
# Epoch statistics
epoch_loss = running_loss / len(train_loader)
accuracy = 100. * correct / total
print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%')
# 6. Evaluation mode
model.eval()
with torch.no_grad():
# Evaluation code here
pass
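A minimal evaluation loop to fill in that placeholder, assuming a val_loader built the same way as train_loader:
model.eval()
correct, total = 0, 0
with torch.no_grad():  # no gradients needed during evaluation
    for data, target in val_loader:  # val_loader is assumed to exist
        data, target = data.to(device), target.to(device)
        outputs = model(data)
        _, predicted = outputs.max(1)
        total += target.size(0)
        correct += predicted.eq(target).sum().item()
print(f'Validation accuracy: {100. * correct / total:.2f}%')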
Learning Rate Scheduling
# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
# In training loop
for epoch in range(num_epochs):
# ... training code ...
scheduler.step() # Update learning rate
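StepLR decays the learning rate on a fixed schedule; a common alternative is ReduceLROnPlateau, which only lowers the rate when a monitored metric (here an assumed validation loss) stops improving:
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=3)
for epoch in range(num_epochs):
    # ... training code ...
    val_loss = compute_validation_loss()  # hypothetical helper returning a float
    scheduler.step(val_loss)  # pass the monitored metric instead of calling step() alone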
PyTorch vs TensorFlow
| Feature | PyTorch | TensorFlow |
|---|---|---|
| Computational Graph | Dynamic (define-by-run) | Static (TF 1.x), Dynamic (TF 2.x with Eager) |
| Debugging | Easier with Python debugger | More complex in TF 1.x, improved in 2.x |
| Production Deployment | TorchServe, ONNX | TensorFlow Serving, TFLite, TF.js |
| Learning Curve | More Pythonic, easier for beginners | Steeper initially, improved with TF 2.x |
| Community | Strong in research, academia | Strong in industry, production |
| Mobile Deployment | PyTorch Mobile | TensorFlow Lite (more mature) |
| Visualization | TensorBoard (via integration) | TensorBoard (native) |
| API Design | More flexible, explicit control | More high-level options (Keras) |
| Performance | Excellent for research workflows | Optimized for production scale |
| Ecosystem | Growing rapidly | More mature, extensive |
When to Use PyTorch
- Research and experimentation
- Rapid prototyping
- When you need dynamic computational graphs
- Academic projects and papers
- Computer vision and NLP research
When to Use TensorFlow
- Production deployment at scale
- Mobile and edge device deployment
- When you need TensorFlow Extended (TFX) for MLOps
- JavaScript/web deployment (TF.js)
- Established production pipelines
GPU Acceleration
Moving Tensors to GPU
# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
# Move tensors to GPU
tensor_cpu = torch.randn(1000, 1000)
tensor_gpu = tensor_cpu.to(device)
# Alternative methods
tensor_gpu = tensor_cpu.cuda() # Explicit CUDA
tensor_gpu = torch.randn(1000, 1000, device='cuda') # Create directly on GPU
# Move back to CPU
tensor_cpu = tensor_gpu.cpu()
# Check tensor device
print(f"Tensor is on: {tensor_gpu.device}")
GPU Memory Management
# Clear GPU cache
torch.cuda.empty_cache()
# Get memory usage
print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"Reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
# Set specific GPU
torch.cuda.set_device(0) # Use GPU 0
# Context manager for specific GPU
with torch.cuda.device(1):
# Operations on GPU 1
pass
Multi-GPU Training
# DataParallel (simple but older method)
if torch.cuda.device_count() > 1:
print(f"Using {torch.cuda.device_count()} GPUs")
model = nn.DataParallel(model)
model.to(device)
# DistributedDataParallel (recommended for multi-GPU)
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
# Initialize process group
dist.init_process_group(backend='nccl')
# Wrap model
model = DDP(model, device_ids=[local_rank])
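The DDP snippet above leaves local_rank undefined; when launching with torchrun, a common pattern is to read it from the environment and pin each process to its own GPU (a sketch, assuming model is defined as above):
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
# torchrun sets LOCAL_RANK (plus RANK and WORLD_SIZE) for each spawned process
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)
dist.init_process_group(backend='nccl')
model = model.to(local_rank)  # model assumed defined earlier
model = DDP(model, device_ids=[local_rank])
# Typical launch command (one process per GPU):
#   torchrun --nproc_per_node=4 train.py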
Common Use Cases
Computer Vision - Image Classification
import torchvision
import torchvision.transforms as transforms
# Data preprocessing
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
# Load pretrained model
model = torchvision.models.resnet50(pretrained=True)
# Modify for custom number of classes
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)
# Fine-tuning: freeze early layers
for param in model.parameters():
param.requires_grad = False
# Unfreeze final layer
for param in model.fc.parameters():
param.requires_grad = True
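With only the final layer unfrozen, the optimizer can be limited to the parameters that still require gradients. Note that pretrained=True is the older torchvision argument; recent releases (roughly 0.13+) prefer the weights= enum, as hinted in the comment below:
import torch.optim as optim
# Optimize only the trainable parameters (here, model.fc)
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
# Newer torchvision equivalent of pretrained=True:
# model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)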
Natural Language Processing - Text Classification
class TextClassifier(nn.Module):
def __init__(self, vocab_size, embed_dim, num_classes):
super(TextClassifier, self).__init__()
self.embedding = nn.Embedding(vocab_size, embed_dim)
self.lstm = nn.LSTM(embed_dim, 128, batch_first=True, bidirectional=True)
self.fc = nn.Linear(256, num_classes)
self.dropout = nn.Dropout(0.3)
def forward(self, x):
# x shape: (batch_size, sequence_length)
embedded = self.embedding(x)
lstm_out, (hidden, cell) = self.lstm(embedded)
# Use final hidden state
hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
output = self.dropout(hidden)
output = self.fc(output)
return output
# Initialize model
vocab_size = 10000
model = TextClassifier(vocab_size=vocab_size, embed_dim=100, num_classes=5)
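A quick shape check with a dummy batch of token IDs (batch size 8 and sequence length 50 are arbitrary; model and vocab_size come from the snippet above):
dummy_batch = torch.randint(0, vocab_size, (8, 50))  # (batch_size, sequence_length)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([8, 5]) -> one score per class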
Reinforcement Learning - Simple Agent
class DQN(nn.Module):
def __init__(self, state_size, action_size):
super(DQN, self).__init__()
self.fc1 = nn.Linear(state_size, 128)
self.fc2 = nn.Linear(128, 128)
self.fc3 = nn.Linear(128, action_size)
def forward(self, state):
x = F.relu(self.fc1(state))
x = F.relu(self.fc2(x))
q_values = self.fc3(x)
return q_values
# Agent training step
state = torch.FloatTensor(current_state)
q_values = model(state)
action = q_values.argmax().item()
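To round out the action-selection snippet, here is a hedged sketch of epsilon-greedy exploration; model and state are taken from the lines above, and action_size is redefined as a plain variable for illustration:
import random
action_size = 4  # number of discrete actions (illustrative)
epsilon = 0.1  # exploration rate (illustrative)
# Epsilon-greedy action selection
if random.random() < epsilon:
    action = random.randrange(action_size)  # explore: random action
else:
    with torch.no_grad():
        action = model(state).argmax().item()  # exploit: best known action
# Training would regress Q(state, action) towards the one-step target
#   reward + gamma * max_a' Q(next_state, a')
# using transitions sampled from a replay buffer (not shown here).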
Best Practices
Model Saving and Loading
# Save entire model
torch.save(model, 'model_complete.pth')
# Load entire model
model = torch.load('model_complete.pth')
# Save model state dict (recommended)
torch.save(model.state_dict(), 'model_weights.pth')
# Load model state dict
model = SimpleNet(input_size=20, hidden_size=64, output_size=2)
model.load_state_dict(torch.load('model_weights.pth'))
# Save training checkpoint
checkpoint = {
'epoch': epoch,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'loss': loss,
}
torch.save(checkpoint, 'checkpoint.pth')
# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
Debugging Techniques
# Check for NaN values
assert not torch.isnan(loss).any(), "Loss contains NaN"
# Gradient clipping (prevent exploding gradients)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Register hooks for debugging
def print_grad(grad):
print(f"Gradient: {grad}")
x = torch.tensor([1.0], requires_grad=True)
x.register_hook(print_grad)
# Anomaly detection
torch.autograd.set_detect_anomaly(True)
# Check model architecture
from torchsummary import summary
summary(model, input_size=(1, 28, 28))
Profiling Performance
import torch.profiler as profiler
with profiler.profile(
activities=[profiler.ProfilerActivity.CPU, profiler.ProfilerActivity.CUDA],
record_shapes=True,
profile_memory=True,
with_stack=True
) as prof:
# Your training code here
model(input_tensor)
# Print results
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
# Export for Chrome trace viewer
prof.export_chrome_trace("trace.json")
PyTorch Ecosystem
TorchVision - Computer Vision
import torchvision
# Pretrained models
resnet = torchvision.models.resnet50(pretrained=True)
vgg = torchvision.models.vgg16(pretrained=True)
efficientnet = torchvision.models.efficientnet_b0(pretrained=True)
# Datasets
train_dataset = torchvision.datasets.CIFAR10(
root='./data',
train=True,
download=True,
transform=transform
)
# Transforms
transforms = torchvision.transforms.Compose([
torchvision.transforms.RandomHorizontalFlip(),
torchvision.transforms.RandomRotation(10),
torchvision.transforms.ToTensor(),
])
TorchText - Natural Language Processing
# Note: torchtext has undergone significant API changes
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.utils import get_tokenizer
tokenizer = get_tokenizer('basic_english')
# Build vocabulary
def yield_tokens(data_iter):
for text in data_iter:
yield tokenizer(text)
vocab = build_vocab_from_iterator(yield_tokens(texts), specials=["<unk>", "<pad>"])
vocab.set_default_index(vocab["<unk>"])
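With the vocabulary built, the usual next step is turning raw text into tensors of token IDs (a minimal sketch; tokenizer and vocab come from the lines above):
import torch
def text_to_tensor(text):
    # tokenize, look up IDs in the vocabulary, return a LongTensor
    return torch.tensor(vocab(tokenizer(text)), dtype=torch.long)
sample = text_to_tensor('PyTorch makes text preprocessing straightforward')
print(sample)  # tensor of token IDs; exact values depend on the vocabulary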
TorchAudio - Audio Processing
import torchaudio
# Load audio
waveform, sample_rate = torchaudio.load("audio.wav")
# Transforms
spectrogram = torchaudio.transforms.Spectrogram()
mel_spectrogram = torchaudio.transforms.MelSpectrogram(sample_rate=16000)
# Apply transform
spec = mel_spectrogram(waveform)
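Since the MelSpectrogram above is configured for 16 kHz, audio recorded at another rate is usually resampled first (a sketch assuming the loaded file's rate differs from 16 kHz):
resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
waveform_16k = resampler(waveform)
spec = mel_spectrogram(waveform_16k)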
PyTorch Lightning - Simplified Training
import pytorch_lightning as pl
class LitModel(pl.LightningModule):
def __init__(self):
super().__init__()
self.model = SimpleNet(20, 64, 2)
def training_step(self, batch, batch_idx):
x, y = batch
y_hat = self.model(x)
loss = F.cross_entropy(y_hat, y)
self.log('train_loss', loss)
return loss
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.001)
# Train with Lightning
lit_model = LitModel()
trainer = pl.Trainer(max_epochs=10, accelerator='gpu')
trainer.fit(lit_model, train_loader)
Performance Optimization
JIT Compilation with TorchScript
# Trace-based scripting
model = SimpleNet(20, 64, 2)
model.eval()
example_input = torch.randn(1, 20)
traced_model = torch.jit.trace(model, example_input)
# Save traced model
traced_model.save("model_traced.pt")
# Script-based compilation (supports control flow)
scripted_model = torch.jit.script(model)
scripted_model.save("model_scripted.pt")
# Load and use
loaded_model = torch.jit.load("model_traced.pt")
output = loaded_model(example_input)
Mixed Precision Training
from torch.cuda.amp import autocast, GradScaler
# Initialize gradient scaler
scaler = GradScaler()
for epoch in range(num_epochs):
for data, target in train_loader:
optimizer.zero_grad()
# Automatic mixed precision
with autocast():
output = model(data)
loss = criterion(output, target)
# Scaled backward pass
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
DataLoader Optimization
# Optimize data loading
train_loader = DataLoader(
dataset,
batch_size=64,
shuffle=True,
num_workers=4, # Parallel data loading
pin_memory=True, # Faster GPU transfer
persistent_workers=True, # Keep workers alive
prefetch_factor=2 # Prefetch batches
)
# Custom collate function for variable-length sequences
def collate_fn(batch):
sequences, labels = zip(*batch)
# Pad sequences
padded_sequences = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
return padded_sequences, torch.tensor(labels)
loader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
Model Optimization Techniques
# Gradient accumulation for larger effective batch size
accumulation_steps = 4
for i, (data, target) in enumerate(train_loader):
output = model(data)
loss = criterion(output, target)
loss = loss / accumulation_steps
loss.backward()
if (i + 1) % accumulation_steps == 0:
optimizer.step()
optimizer.zero_grad()
# Gradient checkpointing for memory efficiency
from torch.utils.checkpoint import checkpoint
class CheckpointedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Example sub-modules; any expensive layers can be checkpointed
        self.layer1 = nn.Linear(512, 512)
        self.layer2 = nn.Linear(512, 512)
    def forward(self, x):
        # Trade compute for memory: activations are recomputed during backward
        x = checkpoint(self.layer1, x)
        x = checkpoint(self.layer2, x)
        return x
Troubleshooting
Common Errors and Solutions
CUDA Out of Memory
# Solution 1: Reduce batch size
batch_size = 16 # Instead of 64
# Solution 2: Gradient accumulation
# (shown in optimization section)
# Solution 3: Clear cache
torch.cuda.empty_cache()
# Solution 4: Use gradient checkpointing
# (shown in optimization section)
RuntimeError: Expected all tensors to be on the same device
# Problem: Tensors on different devices
# Solution: Ensure all tensors are on same device
model = model.to(device)
data = data.to(device)
target = target.to(device)
Gradient becomes NaN
# Solution 1: Gradient clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Solution 2: Lower learning rate
optimizer = optim.Adam(model.parameters(), lr=0.0001)
# Solution 3: Check for invalid operations
assert not torch.isnan(loss).any()
DataLoader Worker Crashes
# Solution: Reduce num_workers or set to 0
train_loader = DataLoader(dataset, batch_size=32, num_workers=0)
# On Windows, use proper main guard
if __name__ == '__main__':
# DataLoader code here
pass
Model Not Learning
# Check 1: Verify gradients are flowing
for name, param in model.named_parameters():
if param.grad is not None:
print(f"{name}: {param.grad.abs().mean()}")
# Check 2: Ensure model is in training mode
model.train()
# Check 3: Verify learning rate
print(f"Learning rate: {optimizer.param_groups[0]['lr']}")
# Check 4: Check data and labels
print(f"Data range: {data.min()} to {data.max()}")
print(f"Unique labels: {target.unique()}")
Slow Training Speed
# Profiling to find bottlenecks
import time
start = time.time()
for i, (data, target) in enumerate(train_loader):
if i == 0:
print(f"First batch loading time: {time.time() - start:.2f}s")
# Training code
pass
# Solutions:
# - Increase num_workers in DataLoader
# - Use pin_memory=True
# - Move data preprocessing to GPU if possible
# - Use mixed precision training
Related Topics
- Top 10 Python Libraries for Data Engineering - Discover other powerful Python libraries
- Python NLP Libraries - Specialized libraries for natural language processing with PyTorch
- Natural Language Processing (NLP) - Understanding NLP fundamentals