
07 April 2025

Transfer Learning with PyTorch: Classifying Ants and Bees Using ResNet18

Transfer learning is a powerful technique in deep learning that allows us to leverage pre-trained models to solve new tasks with limited data. In this blog post, we’ll walk through a practical example of transfer learning using PyTorch. We’ll adapt a pre-trained ResNet18 model to classify images of ants and bees from the Hymenoptera dataset, downloaded from Kaggle. By the end, you’ll understand how to set up the dataset, apply data transformations, train the model, and visualize predictions, all in a single script.

What is Transfer Learning?

Transfer learning involves taking a model trained on a large, general dataset (like ImageNet) and adapting it to a specific task. Instead of training a neural network from scratch, which requires massive data and computing resources, we start with a pre-trained model and tweak it for our needs. This approach is especially useful when working with small datasets, as it reduces training time and the need for extensive labeled data.

In this example, we’ll use ResNet18, pre-trained on ImageNet, and fine-tune it to distinguish between ants and bees—a binary classification task.
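
To make the size of that adaptation concrete, here is a minimal sketch in plain Python (no PyTorch required) that counts the weights and biases of a fully connected layer. It assumes the final layer of ResNet18 takes 512 input features, which is what `fc.in_features` reports for the torchvision model; `linear_param_count` is a helper name made up for this illustration.

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected (nn.Linear-style) layer."""
    return in_features * out_features + out_features

# Assumption: ResNet18's final layer has 512 input features.
imagenet_head = linear_param_count(512, 1000)  # original 1000-class head
ants_bees_head = linear_param_count(512, 2)    # replacement 2-class head

print(f"ImageNet head parameters:  {imagenet_head}")   # 513000
print(f"Ants/bees head parameters: {ants_bees_head}")  # 1026
```

When the rest of the network is frozen, these 1,026 values are the only ones the optimizer has to update.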

Code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

# Kaggle API setup
# Never publish real credentials; alternatively, store them in ~/.kaggle/kaggle.json
os.environ["KAGGLE_USERNAME"] = "your_kaggle_username"  # Replace with your Kaggle username
os.environ["KAGGLE_KEY"] = "your_kaggle_api_key"  # Replace with your Kaggle API key

from kaggle.api.kaggle_api_extended import KaggleApi

# Enable CuDNN benchmarking for performance
cudnn.benchmark = True
plt.ion()  # Enable interactive mode for matplotlib

# Download Hymenoptera dataset from Kaggle if not already present
data_dir = './hymenoptera'
if not os.path.exists(os.path.join(data_dir, 'train')):
    print("Downloading Hymenoptera dataset from Kaggle...")
    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files('thedatasith/hymenoptera', path=data_dir, unzip=True)
    print("Dataset downloaded and extracted.")
else:
    print("Hymenoptera dataset already exists.")

# Data transformations for training and validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
}

# Load datasets and create dataloaders
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Helper to show images
def imshow(inp, title=None):
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)

# Show a batch of training data
inputs, classes = next(iter(dataloaders['train']))
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])

# Training function
def train_model(model, criterion, optimizer, scheduler, num_epochs=5):
    since = time.time()
    with TemporaryDirectory() as tempdir:
        best_model_path = os.path.join(tempdir, 'best_model.pt')
        torch.save(model.state_dict(), best_model_path)
        best_acc = 0.0

        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)
            for phase in ['train', 'val']:
                model.train() if phase == 'train' else model.eval()
                running_loss = 0.0
                running_corrects = 0

                for inputs, labels in dataloaders[phase]:
                    inputs, labels = inputs.to(device), labels.to(device)
                    optimizer.zero_grad()

                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)

                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)

                if phase == 'train':
                    scheduler.step()

                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_path)
            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:.4f}')
        model.load_state_dict(torch.load(best_model_path))
    return model

# Visualization of model predictions
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for inputs, labels in dataloaders['val']:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'Predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

# Predict a single image
def visualize_model_predictions(model, img_path):
    model.eval()
    img = Image.open(img_path).convert('RGB')  # Ensure 3 channels to match the ImageNet normalization
    img = data_transforms['val'](img)
    img = img.unsqueeze(0).to(device)

    with torch.no_grad():
        outputs = model(img)
        _, preds = torch.max(outputs, 1)
        plt.figure()
        plt.title(f'Predicted: {class_names[preds[0]]}')
        imshow(img.cpu().data[0])

# Load pretrained ResNet18 and modify for binary classification
model_conv = models.resnet18(weights='IMAGENET1K_V1')
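# Freeze all pretrained weights so that only the new final layer is trained
for param in model_conv.parameters():
    param.requires_grad = False
# Replace the 1000-class ImageNet head with a 2-class layer
# (parameters of a newly created layer have requires_grad=True by default)
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)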
model_conv = model_conv.to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
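# Decay the learning rate by a factor of 0.1 every 7 epochs
# (with num_epochs=5 below, no decay actually occurs during training)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)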

# Train the model
model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=5)

# Visualize model performance
visualize_model(model_conv)

# Predict on a specific image
visualize_model_predictions(model_conv, os.path.join(data_dir, 'val', 'bees', 'bees2.jpg'))

plt.ioff()
plt.show()
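
The step-decay schedule used in the script can be written in closed form, which makes it easy to see what learning rate each epoch gets. The following is a small sketch in plain Python rather than a call into PyTorch; `step_lr` is a hypothetical helper, and the values 0.001, 0.1, and 7 are the ones passed to the optimizer and `StepLR` above.

```python
def step_lr(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Values taken from the script: lr=0.001, gamma=0.1, step_size=7, 5 epochs.
schedule = [step_lr(0.001, 0.1, 7, epoch) for epoch in range(5)]
print(schedule)

# A hypothetical longer run would see the first drop at epoch 7.
print(step_lr(0.001, 0.1, 7, 7))
```

Since the run lasts only 5 epochs, the rate stays at 0.001 throughout; the scheduler only matters if `num_epochs` is raised past `step_size`.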

In this post, we’ve demonstrated transfer learning with PyTorch using a pre-trained ResNet18 model. By freezing the pre-trained layers and training only a new final layer, we adapted the model to the Hymenoptera dataset with minimal training time. This approach carries over to other datasets and tasks: try experimenting with different models or datasets, or unfreeze more layers to fine-tune the network further.
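
One detail of `train_model` is worth spelling out: each batch's mean loss is multiplied by the batch size before summing, so a smaller final batch does not distort the epoch average. Here is a small sketch of that arithmetic in plain Python. The batch losses and sizes are made-up illustrative numbers, not results from this model.

```python
# Hypothetical (mean_loss, batch_size) pairs; the last batch is smaller.
batches = [(0.50, 4), (0.30, 4), (0.90, 2)]

total_samples = sum(size for _, size in batches)

# Weighted the way train_model does it: loss.item() * inputs.size(0).
running_loss = sum(loss * size for loss, size in batches)
epoch_loss = running_loss / total_samples

# Naive average of the per-batch means, which over-weights the small batch.
naive_loss = sum(loss for loss, _ in batches) / len(batches)

print(f"weighted: {epoch_loss:.4f}")
print(f"naive:    {naive_loss:.4f}")
```

The two values differ whenever the dataset size is not a multiple of the batch size, which is why the script divides by `dataset_sizes[phase]` rather than by the number of batches.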

Sample Output

[Four output images, each captioned "Transfer Learning"; not reproduced here.]

Building AlexNet from Scratch with PyTorch: A Step-by-Step Guide

In the world of deep learning, AlexNet holds a special place as a groundbreaking convolutional neural network (CNN) that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, it demonstrated the power of deep learning and GPUs for image classification. In this blog post, we’ll implement AlexNet from scratch using PyTorch, explore its architecture, and provide a working code example.

What is AlexNet?

AlexNet is a deep CNN designed to classify images into 1000 categories. It features five convolutional layers, three max-pooling layers, and three fully connected layers, with ReLU activations and dropout for regularization. Its innovative use of large kernels, overlapping pooling, and GPU acceleration made it a milestone in computer vision.

Let’s dive into the implementation!

Prerequisites

Before running the code, ensure you have the following installed:

  • PyTorch: For building and running the model (pip install torch).
  • torchsummary: For visualizing the model summary (pip install torchsummary).
  • A Python environment (e.g., version 3.8+).

The AlexNet Architecture

AlexNet expects an input image of size 227x227 pixels with 3 color channels (RGB). Its structure can be broken into two main parts:

  1. Feature Extraction: A series of convolutional and pooling layers.
  2. Classification: Fully connected layers to produce class predictions.

Here’s the breakdown:

  • Conv1: 96 filters (11x11), stride 4, padding 2 → ReLU → MaxPool (3x3, stride 2).
  • Conv2: 256 filters (5x5), padding 2 → ReLU → MaxPool (3x3, stride 2).
  • Conv3: 384 filters (3x3), padding 1 → ReLU.
  • Conv4: 384 filters (3x3), padding 1 → ReLU.
  • Conv5: 256 filters (3x3), padding 1 → ReLU → MaxPool (3x3, stride 2).
  • FC Layers: 9216 → 4096 → 4096 → 1000, with dropout (p=0.5) and ReLU.

The output of the final convolutional layer is flattened to 256 * 6 * 6 = 9216 features (for a 227x227 input), feeding into the classifier.
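
The 6 x 6 figure can be checked by hand with the usual output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, applied layer by layer. The sketch below does this in plain Python for the layer settings listed above; `out_size` and `alexnet_feature_size` are helper names introduced only for this illustration.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def alexnet_feature_size(size: int) -> int:
    """Trace one spatial dimension through the layers listed above."""
    size = out_size(size, kernel=11, stride=4, padding=2)  # Conv1
    size = out_size(size, kernel=3, stride=2)              # MaxPool
    size = out_size(size, kernel=5, padding=2)             # Conv2
    size = out_size(size, kernel=3, stride=2)              # MaxPool
    for _ in range(3):                                     # Conv3 to Conv5 keep the size
        size = out_size(size, kernel=3, padding=1)
    size = out_size(size, kernel=3, stride=2)              # MaxPool
    return size

side = alexnet_feature_size(227)
print(side, 256 * side * side)  # 6 9216
```

For a 227x227 input the side length goes 227 → 56 → 27 → 27 → 13 → 13 → 6. With padding 2 on the first convolution, a 224x224 input also ends at 6 x 6, so either size works with this classifier.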


import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary  # Assuming this is already installed

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()

        self.features = nn.Sequential(
            # Layer 1: Convolution + ReLU + MaxPool
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Layer 2: Convolution + ReLU + MaxPool
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Layer 3: Convolution + ReLU
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),

            # Layer 4: Convolution + ReLU
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Layer 5: Convolution + ReLU + MaxPool
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),  # Explicitly set dropout probability
            nn.Linear(256 * 6 * 6, 4096),  # 9216 input features
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)  # Explicitly flatten to 9216
        x = self.classifier(x)
        return x

# Instantiate the model
alexnet_model = AlexNet(num_classes=1000)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
alexnet_model = alexnet_model.to(device)

# Print model structure
print(alexnet_model)

# Print summary (assuming input size of 3x227x227)
summary(alexnet_model, (3, 227, 227))
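
As a cross-check on the summary, the number of trainable parameters can be computed by hand from the layer shapes. This is a sketch in plain Python, assuming the model is defined exactly as in the code above; `conv_params` and `linear_params` are hypothetical helpers, not part of PyTorch or torchsummary.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of a convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + out_ch

def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Layer shapes copied from the AlexNet class above (ReLU, pooling, and
# dropout layers have no parameters).
layers = {
    "conv1": conv_params(3, 96, 11),
    "conv2": conv_params(96, 256, 5),
    "conv3": conv_params(256, 384, 3),
    "conv4": conv_params(384, 384, 3),
    "conv5": conv_params(384, 256, 3),
    "fc1": linear_params(256 * 6 * 6, 4096),
    "fc2": linear_params(4096, 4096),
    "fc3": linear_params(4096, 1000),
}

total = sum(layers.values())
for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```

The total comes to 62,378,344 parameters, of which the first fully connected layer alone accounts for about 37.8 million. If the model matches the definition above, the total parameter count reported by `summary` should agree with this figure.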

Output screenshot

[Four output screenshots, each captioned "AlexNet"; not reproduced here.]

Conclusion

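Implementing AlexNet in PyTorch is a great way to understand CNNs and deep learning fundamentals. This code provides a foundation you can extend: try training it on a dataset such as CIFAR-10 or ImageNet, resizing smaller images to 227x227 first (and, for grayscale data such as MNIST, converting them to three channels).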
