
DDQN-paper-into-code


DDQN paper into code


Created
Jan 2026
Last Updated
Jan 2026

Deep Reinforcement Learning with Double Q-learning

Implementation of the research paper "Deep Reinforcement Learning with Double Q-learning" by Hado van Hasselt, Arthur Guez, and David Silver (DeepMind, 2015).

📄 Paper Summary

The Problem: Overestimation Bias in Q-Learning

Traditional Deep Q-Networks (DQN) suffer from overestimation bias because they use the same network to both:

  1. Select the best action (argmax)
  2. Evaluate that action's value

Because the maximum of noisy estimates is biased upward, this leads to systematic overestimation of Q-values, especially in stochastic environments, which can harm learning performance.
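The upward bias can be seen directly with a small simulation (a sketch; the numbers are illustrative, not from the paper): even when all true action values are zero, taking the max over noisy estimates yields a positive expectation, while the double-estimator trick of selecting with one estimate and evaluating with an independent one removes the bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# True Q-values are all zero, so the true max is 0.
# Each estimate adds zero-mean noise.
n_actions, n_trials = 4, 10_000
noisy_q = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# Single estimator (DQN-style): max over noisy values -> biased above 0.
single_estimator_max = noisy_q.max(axis=1).mean()

# Double estimator: one noisy estimate selects the action,
# an independent one evaluates it -> the bias vanishes.
noisy_q2 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
selected = noisy_q.argmax(axis=1)
double_estimator = noisy_q2[np.arange(n_trials), selected].mean()

print(single_estimator_max, double_estimator)  # roughly 1.0 vs roughly 0.0
```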

The Solution: Double Q-Learning

The paper introduces Double DQN (DDQN), which decouples action selection from action evaluation:

  • Online Network: Selects the best action for the next state
  • Target Network: Evaluates the value of that selected action

Key Formula:

Q_target = r + γ * Q_target(s', argmax_a Q_online(s', a))

Instead of DQN's formula:

Q_target = r + γ * max_a Q_target(s', a)
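The two formulas above can be compared on concrete numbers (a sketch with made-up Q-values): DQN lets the target network both pick and score the action, while DDQN picks with the online network and scores with the target network.

```python
import numpy as np

gamma = 0.99
r = 1.0

# Hypothetical Q-values for the next state s' over 4 actions.
q_online = np.array([1.0, 3.0, 2.0, 0.5])  # online network estimates
q_target = np.array([1.2, 2.5, 2.9, 0.4])  # target network estimates

# DQN: target network both selects and evaluates (max of its own values).
dqn_target = r + gamma * q_target.max()        # uses q_target[2] = 2.9

# DDQN: online network selects (argmax), target network evaluates.
a_star = q_online.argmax()                     # action 1
ddqn_target = r + gamma * q_target[a_star]     # uses q_target[1] = 2.5
```

Note the DDQN target is lower here: when the target network's own maximum is an overestimate, decoupling selection from evaluation avoids chasing it.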

Benefits

✅ Reduces overestimation - More accurate Q-value estimates
✅ Better generalization - Improves performance in noisy/stochastic environments
✅ Same computational cost - No additional overhead compared to DQN
✅ State-of-the-art results - Achieves superior performance on Atari 2600 games


šŸ—ļø Implementation Details

This implementation applies DDQN to Atari Breakout using PyTorch and Gymnasium (the maintained successor to OpenAI Gym).

Architecture

Neural Network

Input: 4 stacked grayscale frames (84×84×4)
Conv1: 32 filters, 8×8 kernel, stride 4 → ReLU
Conv2: 64 filters, 4×4 kernel, stride 2 → ReLU
Conv3: 64 filters, 3×3 kernel, stride 1 → ReLU
Flatten
FC1: 512 units → ReLU
FC2: num_actions (4 for Breakout)
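The architecture above translates almost line-for-line into a PyTorch module. This is a sketch (the class name and pixel-scaling convention are ours, not necessarily the repo's):

```python
import torch
import torch.nn as nn

class DDQNNet(nn.Module):
    """Conv net from the DQN/DDQN papers: 4 stacked 84x84 frames -> Q-values."""
    def __init__(self, num_actions: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # 9 -> 7
            nn.Flatten(),                                           # 64*7*7 = 3136
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes uint8 pixel input scaled here to [0, 1].
        return self.head(self.features(x / 255.0))

net = DDQNNet(num_actions=4)
q = net(torch.zeros(2, 4, 84, 84))  # batch of 2 dummy observations
```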

Key Components

  1. Preprocessing Pipeline

    • FireResetEnv: Auto-launches ball in Breakout (critical!)
    • AtariPreprocessing: Grayscale conversion + 84×84 resize
    • FrameStackObservation: Stacks 4 consecutive frames for temporal information
  2. Replay Buffer

    • Capacity: 100,000 transitions
    • Uniform random sampling
    • Stores: (state, action, reward, next_state, done)
  3. Training Optimizations

    • Reward Clipping: Clips rewards to [-1, +1] for stability
    • Gradient Clipping: Clips gradients to [-1, +1] to prevent exploding gradients
    • Huber Loss: Smooth L1 loss for robust learning
    • Target Network: Updated every 1,000 steps
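The replay buffer from the component list (item 2) fits in a few lines of standard Python; a minimal sketch with uniform random sampling, assuming the illustrative names below:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s', done) with uniform sampling."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)  # without replacement
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=1000)
for i in range(50):
    buf.push(i, i % 4, 1.0, i + 1, False)
states, actions, rewards, next_states, dones = buf.sample(32)
```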

Hyperparameters

| Parameter | Value | Description |
|---|---|---|
| Total Frames | 5,000,000 | Total training steps |
| Batch Size | 32 | Minibatch size for learning |
| Learning Rate | 0.0001 | Adam optimizer learning rate |
| Gamma (γ) | 0.99 | Discount factor |
| Epsilon Start | 1.0 | Initial exploration rate |
| Epsilon End | 0.01 | Final exploration rate |
| Epsilon Decay | 1,000,000 | Frames to decay epsilon |
| Target Update | 1,000 | Steps between target network updates |
| Replay Start | 10,000 | Frames before learning begins |
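Collecting these values in one immutable config object keeps them consistent across the training loop; a sketch (field names are illustrative, not the repo's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperparams:
    # Values taken from the table above.
    total_frames: int = 5_000_000
    batch_size: int = 32
    learning_rate: float = 1e-4
    gamma: float = 0.99
    epsilon_start: float = 1.0
    epsilon_end: float = 0.01
    epsilon_decay_frames: int = 1_000_000
    target_update_every: int = 1_000
    replay_start: int = 10_000

hp = Hyperparams()
```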

🚀 Usage

Prerequisites

pip install -r requirements.txt

Requirements:

  • Python 3.8+
  • PyTorch
  • Gymnasium
  • ALE (Arcade Learning Environment)
  • OpenCV
  • NumPy
  • Matplotlib
  • imageio

Training

python main.py

Training Progress:

  • Models saved every 100 episodes → models/ddqn_breakout_{episode}.pth
  • Training graphs saved every 100 episodes → graphs/training_step_{episode}.png
  • Gameplay recordings (GIFs) saved every 50 episodes → recordings/episode_{episode}.gif

Estimated Training Time:

  • CPU: ~24-48 hours for 5M frames
  • GPU (CUDA): ~4-8 hours

📊 Results

The implementation generates:

  1. Training Graphs (graphs/)

    • Episode rewards over time
    • 100-episode moving average
    • Tracks learning progress
  2. Gameplay Recordings (recordings/)

    • High-quality GIFs (upscaled 3x)
    • Shows agent's gameplay every 50 episodes
    • Near-greedy policy (ε=0.01) for best performance
  3. Model Checkpoints (models/)

    • Saved every 100 episodes
    • Can resume training or evaluate later

Directory Structure

DDQN-paper-into-code/
├── main.py              # Main training script
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── graphs/              # Training progress plots
├── recordings/          # Gameplay GIFs
└── models/              # Saved model checkpoints

🧠 Key Implementation Highlights

Double Q-Learning Core (lines 169-172)

with torch.no_grad():
    # Online network selects the best action
    next_actions = self.online_net(next_states).argmax(dim=1, keepdim=True)
    # Target network evaluates that action
    next_q_values = self.target_net(next_states).gather(1, next_actions)
    target_q = rewards + (self.gamma * next_q_values * (~dones))

This is the heart of DDQN - decoupling action selection from evaluation.

Exploration Strategy

Uses ε-greedy with linear decay:

  • Start: 100% random actions (ε=1.0)
  • Decay over 1M frames
  • End: 1% random actions (ε=0.01)
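The linear schedule above reduces to one interpolation function; a sketch (the function name is ours):

```python
def epsilon_at(frame: int,
               eps_start: float = 1.0,
               eps_end: float = 0.01,
               decay_frames: int = 1_000_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over decay_frames,
    then hold at eps_end."""
    fraction = min(frame / decay_frames, 1.0)
    return eps_start + fraction * (eps_end - eps_start)

print(epsilon_at(0), epsilon_at(500_000), epsilon_at(2_000_000))
```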

📚 References

Original Paper:

  • van Hasselt, H., Guez, A., & Silver, D. (2016). Deep Reinforcement Learning with Double Q-learning. AAAI 2016. arXiv:1509.06461

Related Papers:

  • Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518, 529–533.

🎮 Environment

Game: Atari Breakout (NoFrameskip-v4)

Objective: Use a paddle to bounce a ball and break bricks

Action Space: 4 discrete actions

  • 0: NOOP
  • 1: FIRE (launch ball)
  • 2: RIGHT
  • 3: LEFT

State Space: 4 stacked 84×84 grayscale frames


šŸ¤ Acknowledgments

This implementation is based on the seminal work by DeepMind researchers and follows best practices from:

  • Original DDQN paper
  • OpenAI Baselines
  • PyTorch DQN tutorial
  • Atari preprocessing techniques from DQN literature

šŸ“ License

This project is for educational purposes: it implements the research paper "Deep Reinforcement Learning with Double Q-learning" for learning and demonstration.


🔗 Author

Created as part of learning Deep Reinforcement Learning and implementing research papers into working code.

GitHub Repository: https://github.com/satyammistari/DDQN-paper-into-code
