How to Use the Python Debugger
Debugging is an essential skill for all programmers. No matter how careful you are, you’ll inevitably encounter bugs in your code. The Python debugger (pdb) provides powerful tools to help you find and fix these issues efficiently.
Introduction to the Python Debugger (pdb)
Python comes with a built-in debugger module called pdb (Python Debugger). This interactive debugger allows you to:
- Pause program execution at specific points (breakpoints)
- Step through your code line by line
- Inspect variable values at any point
- Evaluate expressions in the current context
- Modify variables during runtime
Starting the Debugger
There are several ways to start the Python debugger:
1. Using pdb.set_trace()
The simplest way to use the debugger is to insert pdb.set_trace() at the point where you want to start debugging:
import pdb
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    pdb.set_trace()  # Debugger will start here
    average = total / len(numbers)
    return average

result = calculate_average([1, 2, 3, 4, 5])
print(f"The average is {result}")
When this code runs, the program will pause at the pdb.set_trace() line and drop you into the debugger console.
Note:
In Python 3.7+, you can use the built-in breakpoint() function instead of importing pdb:
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    breakpoint()  # Equivalent to pdb.set_trace()
    average = total / len(numbers)
    return average
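To illustrate how breakpoint() is wired up: it calls sys.breakpointhook(), which by default starts pdb. The sketch below temporarily swaps in a hypothetical recording_hook (a name invented for this example) so the call can be observed without opening an interactive prompt. Setting the environment variable PYTHONBREAKPOINT=0 similarly turns every breakpoint() call into a no-op.

```python
import sys

hook_calls = []

def recording_hook(*args, **kwargs):
    """Hypothetical stand-in for the debugger: only record that it was called."""
    hook_calls.append("breakpoint reached")

def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    breakpoint()  # Dispatches to sys.breakpointhook()
    return total / len(numbers)

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    result = calculate_average([1, 2, 3, 4, 5])
finally:
    sys.breakpointhook = original_hook  # Restore the normal debugger behaviour

print(result, hook_calls)
```

Because the hook is restored in the finally block, later breakpoint() calls in the same process behave normally again.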
2. Running a Script with pdb
You can run an entire script under the debugger from the command line:
python -m pdb my_script.py
This starts the debugger at the beginning of the script. The program will pause before the first line executes.
3. Post-Mortem Debugging
You can also start the debugger after an exception has occurred:
import pdb

try:
    # Some code that might raise an exception
    result = 10 / 0
except Exception:
    pdb.post_mortem()  # Start the debugger in the frame where the exception was raised
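Post-mortem debugging works because the exception carries a traceback that links to every frame on the way to the failure. As a rough, non-interactive sketch of what the debugger lets you inspect, the helper below (innermost_frame_locals is a name made up for this example) walks to the last frame of the traceback and returns its local variables.

```python
def divide(a, b):
    return a / b

def innermost_frame_locals(exc):
    """Return (function name, locals) for the frame where exc was raised."""
    tb = exc.__traceback__
    while tb.tb_next is not None:  # Follow the chain to the deepest frame
        tb = tb.tb_next
    return tb.tb_frame.f_code.co_name, dict(tb.tb_frame.f_locals)

try:
    divide(10, 0)
except ZeroDivisionError as exc:
    func_name, frame_locals = innermost_frame_locals(exc)
    # pdb.post_mortem(exc.__traceback__) would open a prompt on this same frame.

print(func_name, frame_locals)
```

In an interactive session you would normally call pdb.post_mortem() instead and use p, up, and down to explore the same information.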
Basic Debugging Commands
Once the debugger is running, you can use various commands to control execution and examine the program state:
Command | Description |
---|---|
h or help | Show list of available commands |
q or quit | Exit the debugger |
c or continue | Continue execution until next breakpoint |
n or next | Execute the current line and move to the next line (doesn’t enter functions) |
s or step | Step into a function call |
r or return | Continue execution until the current function returns |
l or list | List source code around the current line |
p expression | Print the value of an expression |
pp expression | Pretty-print the value of an expression |
w or where | Show the current call stack |
b or break | Set a breakpoint; with no argument, list all breakpoints |
cl or clear | Clear breakpoints |
u or up | Move up one level in the call stack |
d or down | Move down one level in the call stack |
Here’s a sample debugging session:
> my_script.py(7)calculate_average()
-> average = total / len(numbers)
(Pdb) p total
15
(Pdb) p len(numbers)
5
(Pdb) p numbers
[1, 2, 3, 4, 5]
(Pdb) n
> my_script.py(8)calculate_average()
-> return average
(Pdb) p average
3.0
(Pdb) c
The average is 3.0
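A session like the one above can also be scripted, which is handy for experimenting with commands. The sketch below assumes you are happy to feed commands from a string: it passes in-memory streams to pdb.Pdb and runs the function with runcall(), so the debugger stops on the function's first line and reads p, n, and c as if they had been typed at the prompt.

```python
import io
import pdb

def calculate_average(numbers):
    total = sum(numbers)
    average = total / len(numbers)
    return average

# Commands the debugger will read, as if typed at the (Pdb) prompt.
commands = io.StringIO("p numbers\nn\np total\nc\n")
transcript = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=transcript, readrc=False, nosigint=True)
result = debugger.runcall(calculate_average, [1, 2, 3, 4, 5])

print(transcript.getvalue())  # Everything the debugger wrote, prompts included
print("result:", result)
```

The exact location lines in the transcript depend on where the code is saved, so treat the captured text as illustrative rather than fixed.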
Practical Debugging Techniques
Setting Conditional Breakpoints
You can set breakpoints that only trigger when certain conditions are met:
import pdb
def process_items(items):
    results = []
    for i, item in enumerate(items):
        processed = item * 2
        if i == 3:
            pdb.set_trace()  # Only break at the 4th item (index 3)
        results.append(processed)
    return results

process_items([5, 10, 15, 20, 25, 30])
Alternatively, using the debugger console:
(Pdb) b my_script.py:5, i == 3
Breakpoint 1 at my_script.py:5
(Pdb) c # Continue until the condition is met
Examining Variables and Call Stack
When your program behaves unexpectedly, it’s often because variables don’t contain what you think they do. The debugger lets you examine them:
(Pdb) p locals() # Print all local variables
{'items': [5, 10, 15, 20, 25, 30], 'results': [10, 20, 30], 'i': 3, 'item': 20, 'processed': 40}
(Pdb) pp locals() # Pretty-print for better formatting
{'i': 3,
'item': 20,
'items': [5, 10, 15, 20, 25, 30],
'processed': 40,
'results': [10, 20, 30]}
To see how you got to the current point, examine the call stack:
(Pdb) w
my_script.py(11)<module>()
-> process_items([5, 10, 15, 20, 25, 30])
> my_script.py(7)process_items()
-> pdb.set_trace()
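The same call-stack information is available programmatically through the standard traceback module, which can be useful in logs when no debugger is attached. A minimal sketch, with function names (current_call_chain, main) invented for illustration:

```python
import traceback

def current_call_chain():
    """Return the names of the functions on the call stack, outermost first."""
    # Drop the last entry, which is current_call_chain itself.
    return [frame.name for frame in traceback.extract_stack()[:-1]]

def process_items(items):
    return current_call_chain()

def main():
    return process_items([5, 10, 15])

chain = main()
print(" -> ".join(chain))
```

The last two entries correspond to what the w command would show at the bottom of the stack: main calling process_items.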
Modifying Variables During Debugging
One powerful feature of the debugger is the ability to modify variables on the fly:
(Pdb) p item
20
(Pdb) item = 100
(Pdb) p processed
40
(Pdb) processed = 200
(Pdb) c # Continue execution with modified values
This allows you to test fixes without restarting the program.
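One caveat when assigning variables at the prompt: a name that matches a pdb command (such as n, c, or s) can be read as that command rather than as Python code. Prefixing the statement with ! forces pdb to execute it as Python. The sketch below scripts this with in-memory streams; the scale function is a hypothetical example.

```python
import io
import pdb

def scale(n):
    return n * 2

# Without the "!", pdb may treat "n = 10" as its `next` command.
commands = io.StringIO("!n = 10\nc\n")
debugger = pdb.Pdb(stdin=commands, stdout=io.StringIO(), readrc=False, nosigint=True)

# The debugger stops before `return n * 2` runs, so the new value of n is used.
result = debugger.runcall(scale, 3)
print(result)
```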
Using Python Debugger in Different Environments
Debugging in Interactive Mode (REPL)
You can use the debugger in the Python interactive shell:
>>> import pdb
>>> def buggy_function():
...     x = 1
...     y = 0
...     pdb.set_trace()
...     return x / y
...
>>> buggy_function()
> <stdin>(5)buggy_function()
-> return x / y
(Pdb)
Debugging in Jupyter Notebooks
For Jupyter notebooks, you can use the %debug magic command after an exception occurs:
# Run this cell first
def divide(a, b):
    return a / b
# Then run this cell
result = divide(10, 0)  # This will raise a ZeroDivisionError
# Now run this cell to debug
%debug
Or use %pdb on to automatically start the debugger on any exception:
%pdb on
result = divide(10, 0)  # Will automatically drop into debugger
IDE Debugging Support
Most Python IDEs offer integrated debugging with graphical interfaces, including:
- PyCharm: Offers visual debugging with breakpoints, variable inspection, and a graphical call stack
- Visual Studio Code: Provides debugging through the Python extension
- Spyder: Includes debugging features similar to MATLAB’s debugger
Advanced Debugging Techniques
Using pdb.run()
You can execute a string of Python code under the debugger:
import pdb
pdb.run('calculate_average([1, 2, 3, 4, 0])')
Debugging Multi-Threaded Programs
Standard pdb can be challenging to use with multi-threaded programs: a breakpoint pauses only the thread that hits it, while the other threads keep running. Consider specialized tools:
# Example using threading-specific debugging
import threading
import pdb
# For thread debugging, you might need to use thread-specific tools
# or synchronization mechanisms
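One standard-library option (Python 3.8+) is threading.excepthook, which is called for uncaught exceptions in threads. The sketch below is one possible approach, not the only one: it records the failing thread's name and traceback so they can be examined after the thread has finished. The names remember_thread_exception and worker are invented for this example.

```python
import threading

captured = []

def remember_thread_exception(args):
    """Store details of an uncaught exception raised inside a thread."""
    captured.append((args.thread.name, args.exc_type, args.exc_traceback))

def worker():
    return 1 / 0  # Deliberate failure inside the thread

previous_hook = threading.excepthook
threading.excepthook = remember_thread_exception
try:
    thread = threading.Thread(target=worker, name="worker-1")
    thread.start()
    thread.join()
finally:
    threading.excepthook = previous_hook  # Restore the default hook

thread_name, exc_type, exc_traceback = captured[0]
print(thread_name, exc_type.__name__)
# pdb.post_mortem(exc_traceback) would now open the debugger in worker()'s frame.
```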
Remote Debugging
For debugging applications running on a remote server or in a different process:
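# On the server side (remote_pdb is a third-party package: pip install remote-pdb)
from remote_pdb import RemotePdb
# Binding to 0.0.0.0 exposes the debugger on every network interface;
# only do this on a trusted network.
RemotePdb('0.0.0.0', 4444).set_trace()
# Connect from your local machine using telnet
# $ telnet server_ip 4444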
Automated Debugging Tools
Besides manual debugging, Python offers automated tools to help identify issues:
Using the logging Module
Logging is a non-intrusive way to track program execution:
import logging

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

def calculate_average(numbers):
    logging.debug(f"Starting with numbers: {numbers}")
    if not numbers:
        logging.warning("Empty list provided!")
        return 0
    total = sum(numbers)
    logging.debug(f"Sum calculated: {total}")
    average = total / len(numbers)
    logging.debug(f"Average calculated: {average}")
    return average

# Test the function
result = calculate_average([1, 2, 3, 4, 5])
print(f"The average is {result}")
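Logging is especially helpful for recording exceptions together with their tracebacks. The sketch below uses logger.exception(), which logs at ERROR level and appends the current traceback. The logger name average_demo and the safe_average function are assumptions made for this example, and the output goes to an in-memory stream so it can be inspected afterwards.

```python
import io
import logging

log_output = io.StringIO()
logger = logging.getLogger("average_demo")  # Hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(log_output))
logger.propagate = False  # Keep the demo output out of the root logger

def safe_average(numbers):
    try:
        return sum(numbers) / len(numbers)
    except ZeroDivisionError:
        # logger.exception logs the message plus the active traceback.
        logger.exception("Could not average %r", numbers)
        return None

result = safe_average([])
print(log_output.getvalue())
```

In a real program you would usually attach a file or console handler instead of a StringIO buffer.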
Code Profiling
When debugging performance issues, profiling tools can help:
import cProfile

def slow_function():
    result = 0
    for i in range(1000000):
        result += i
    return result

# Profile the function
cProfile.run('slow_function()')
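For larger programs the raw cProfile output can be long. A common pattern, sketched here under the assumption that you only care about the most expensive calls, is to collect the profile with cProfile.Profile and then sort and trim it with the standard pstats module.

```python
import cProfile
import io
import pstats

def slow_function():
    result = 0
    for i in range(100_000):
        result += i
    return result

profiler = cProfile.Profile()
profiler.enable()
total = slow_function()
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)  # Limit the report to 5 entries
print(report.getvalue())
```

Sorting by cumulative time tends to surface the high-level functions where time is spent; sorting by "tottime" instead highlights the individual functions doing the work.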
Memory Profiling
For memory-related issues, you can use memory profilers:
# First install: pip install memory_profiler
from memory_profiler import profile

@profile
def memory_heavy_function():
    big_list = [i for i in range(10000000)]
    return len(big_list)

memory_heavy_function()
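If installing a third-party package is not an option, the standard library's tracemalloc module gives a coarser view of memory use. Note that this is a different tool from memory_profiler, not a drop-in replacement; the sketch below only reports the peak memory traced while the function ran.

```python
import tracemalloc

def memory_heavy_function():
    big_list = [i for i in range(1_000_000)]
    return len(big_list)

tracemalloc.start()
length = memory_heavy_function()
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak traced memory: {peak_bytes / 1_000_000:.1f} MB")
```

The peak is much higher than the current figure because big_list is released as soon as the function returns.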
Debugging Workflow: A Practical Example
Let’s walk through debugging a function that’s supposed to find the median of a list:
def find_median(numbers):
    # Sort the list
    sorted_numbers = numbers.sort()
    # Find the middle position
    n = len(sorted_numbers)
    middle = n // 2
    # Return the median
    if n % 2 == 0:
        # Even number of elements
        return (sorted_numbers[middle - 1] + sorted_numbers[middle]) / 2
    else:
        # Odd number of elements
        return sorted_numbers[middle]

# Test the function
test_list = [5, 2, 9, 1, 7]
median = find_median(test_list)
print(f"The median is: {median}")
Running this raises TypeError: object of type 'NoneType' has no len(). Let’s debug it:
import pdb
def find_median(numbers):
    # Add a breakpoint
    pdb.set_trace()
    # Sort the list
    sorted_numbers = numbers.sort()

    # Find the middle position
    n = len(sorted_numbers)
    middle = n // 2
    # Return the median
    if n % 2 == 0:
        # Even number of elements
        return (sorted_numbers[middle - 1] + sorted_numbers[middle]) / 2
    else:
        # Odd number of elements
        return sorted_numbers[middle]

# Test the function
test_list = [5, 2, 9, 1, 7]
median = find_median(test_list)
print(f"The median is: {median}")
Debugging session:
> script.py(6)find_median()
-> sorted_numbers = numbers.sort()
(Pdb) p numbers
[5, 2, 9, 1, 7]
(Pdb) n
> script.py(9)find_median()
-> n = len(sorted_numbers)
(Pdb) p sorted_numbers
None
We’ve found the issue! list.sort() sorts the list in place and returns None, so sorted_numbers is None. Let’s patch it in the debugger:
(Pdb) numbers.sort()
(Pdb) p numbers
[1, 2, 5, 7, 9]
(Pdb) sorted_numbers = numbers
(Pdb) c
That patch only affects the current run; the source file still contains the bug. Let’s restart with the fixed code:
def find_median(numbers):
    # An empty list has no median
    if not numbers:
        raise ValueError("find_median() requires at least one number")
    # Make a copy to avoid modifying the original
    numbers_copy = numbers.copy()
    # Sort the list
    numbers_copy.sort()  # sorts in-place
    # Find the middle position
    n = len(numbers_copy)
    middle = n // 2
    # Return the median
    if n % 2 == 0:
        # Even number of elements
        return (numbers_copy[middle - 1] + numbers_copy[middle]) / 2
    else:
        # Odd number of elements
        return numbers_copy[middle]

# Test with odd number of elements
test_list_odd = [5, 2, 9, 1, 7]
median_odd = find_median(test_list_odd)
print(f"Median of {test_list_odd}: {median_odd}")  # Should be 5

# Test with even number of elements
test_list_even = [5, 2, 9, 1, 7, 6]
median_even = find_median(test_list_even)
print(f"Median of {test_list_even}: {median_even}")  # Should be 5.5
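An alternative fix for the same bug is the built-in sorted(), which returns a new list instead of sorting in place. The variant below is a sketch that assumes a non-empty input list and cross-checks its results against statistics.median from the standard library.

```python
import statistics

def find_median(numbers):
    ordered = sorted(numbers)  # Returns a new list; the input is left untouched
    n = len(ordered)
    middle = n // 2
    if n % 2 == 0:
        return (ordered[middle - 1] + ordered[middle]) / 2
    return ordered[middle]

for sample in ([5, 2, 9, 1, 7], [5, 2, 9, 1, 7, 6]):
    # The hand-written version should agree with the standard library.
    assert find_median(sample) == statistics.median(sample)
    print(sample, "->", find_median(sample))
```

Comparing against a trusted reference implementation like this is a quick way to confirm a fix before writing fuller tests.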
Common Debugging Patterns and Challenges
Pattern: “It works on my machine”
This often indicates environmental differences:
import sys
import platform
import os

def debug_environment():
    """Print information about the execution environment."""
    print(f"Python version: {sys.version}")
    print(f"Platform: {platform.platform()}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Environment variables: {dict(os.environ)}")

# Call at the beginning of scripts that might be environment-sensitive
debug_environment()
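Differences in installed package versions are another frequent cause of this pattern. As a possible extension, the sketch below uses importlib.metadata (Python 3.8+) to list installed distributions; the function names installed_packages and environment_report are invented for this example.

```python
import importlib.metadata
import sys

def installed_packages():
    """Return 'name==version' strings for every installed distribution."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in importlib.metadata.distributions()
    )

def environment_report():
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "packages": installed_packages(),
    }

report = environment_report()
print(report["python"], len(report["packages"]), "packages installed")
```

Saving this report on both machines and comparing the two makes version mismatches easy to spot.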
Pattern: “It sometimes fails”
Intermittent failures often indicate race conditions, reliance on random values, or dependence on external state:
import random

def occasionally_fails():
    """Function that randomly fails."""
    if random.random() < 0.3:  # 30% chance of failure
        raise ValueError("Random failure!")
    return "Success"

# Debug by forcing reproducibility
random.seed(42)  # Set a fixed seed for reproducible randomness
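To show why a fixed seed helps, the sketch below runs the same trials twice with a private random.Random generator and confirms that the sequence of successes and failures is identical. The run_trials helper and the rng parameter are assumptions added for this example.

```python
import random

def occasionally_fails(rng):
    """Fail roughly 30% of the time, using the supplied generator."""
    if rng.random() < 0.3:
        raise ValueError("Random failure!")
    return "Success"

def run_trials(seed, trials=20):
    rng = random.Random(seed)  # Private generator: global random state is untouched
    outcomes = []
    for _ in range(trials):
        try:
            outcomes.append(occasionally_fails(rng))
        except ValueError:
            outcomes.append("Failure")
    return outcomes

first_run = run_trials(seed=42)
second_run = run_trials(seed=42)
print(first_run == second_run)
```

Once a failing seed is known, it can be replayed under the debugger as often as needed.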
Challenge: Debugging Long-Running Processes
For processes that take hours to reach the bug:
import json

def long_running_process(iterations):
    state = {"count": 0, "checkpoints": []}
    try:
        for i in range(iterations):
            # Save checkpoint every 1000 iterations
            if i % 1000 == 0:
                state["count"] = i
                state["checkpoints"].append(f"Checkpoint at {i}")
                # Save checkpoint to disk
                with open("checkpoint.json", "w") as f:
                    json.dump(state, f)
            # Actual processing here (process_item stands in for your own work)
            process_item(i)
    except Exception as e:
        print(f"Error at iteration {i}: {e}")
        # Analyze the last saved checkpoint
        with open("checkpoint.json", "r") as f:
            last_state = json.load(f)
        print(f"Last successful state: {last_state}")
        raise
Debugging Best Practices
Isolate the Problem: Try to narrow down where the issue is occurring.
# Divide and conquer approach
def complex_function():
    part1 = step1()
    print("Step 1 completed successfully")
    part2 = step2()
    print("Step 2 completed successfully")
    return combine(part1, part2)
Use Assertions: Add assertions to verify your assumptions.
def calculate_discount(price, discount_percentage):
    assert 0 <= discount_percentage <= 100, f"Invalid discount: {discount_percentage}"
    discount = price * (discount_percentage / 100)
    final_price = price - discount
    assert final_price <= price, f"Discounted price {final_price} higher than original {price}"
    return final_price
Add Strategic Print Statements: When a debugger isn’t practical.
def process_data(data):
    print(f"Starting with data: {data[:5]}... (length: {len(data)})")
    processed = []
    for i, item in enumerate(data):
        if i % 1000 == 0:
            print(f"Processing item {i}/{len(data)}")
        processed.append(transform(item))
    print(f"Finished processing. Result length: {len(processed)}")
    return processed
Keep a Debugging Log: Document what you’ve tried and what you’ve learned.
Use Version Control: Make small, testable changes and commit them.
Write Tests: Automated tests can prevent bugs and help debug them.
import unittest

class TestMedianFunction(unittest.TestCase):
    def test_odd_length_list(self):
        result = find_median([5, 2, 9, 1, 7])
        self.assertEqual(result, 5)

    def test_even_length_list(self):
        result = find_median([5, 2, 9, 1, 7, 6])
        self.assertEqual(result, 5.5)

    def test_empty_list(self):
        with self.assertRaises(ValueError):
            find_median([])

if __name__ == "__main__":
    unittest.main()
Exercises
Exercise 1: The following function is supposed to count the frequency of each character in a string, but it has a bug. Use the debugger to find and fix the issue:
def count_characters(text):
    frequencies = {}
    for char in text:
        if char in frequencies:
            frequencies[char] += 1
        else:
            frequencies[char] = 1
    return frequencies

# Test the function
result = count_characters("hello world")
print(result)  # Expected: {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}
Exercise 2: The following recursive function is supposed to calculate the sum of digits in a number, but it has a bug that makes it return a wrong, non-integer result. Use the debugger to find and fix the issue:
def sum_of_digits(n):
    if n < 10:
        return n
    else:
        last_digit = n % 10
        remaining_digits = n / 10  # Bug is here
        return last_digit + sum_of_digits(remaining_digits)

# Test the function
print(sum_of_digits(123))  # Expected: 6 (1+2+3)
Exercise 3: Create a function that finds the two numbers in a list that add up to a target value. The function should use the debugger to step through the process and validate its logic:
def find_two_sum(numbers, target):
    # Add debugging to this function
    import pdb; pdb.set_trace()
    # Your solution here
    for i in range(len(numbers)):
        for j in range(i+1, len(numbers)):
            if numbers[i] + numbers[j] == target:
                return (numbers[i], numbers[j])
    return None

# Test the function
print(find_two_sum([2, 7, 11, 15], 9))  # Expected: (2, 7)
Hint for Exercise 1: The function logic seems correct. Use the debugger to step through the execution with a test case and examine the frequencies dictionary at each step.
In the next section, we’ll explore how to work with files in Python, including reading from and writing to different file formats.