Python Concurrency: Threads vs. Processes
Speed up your programs with parallel execution!
1. Threads vs. Processes
Feature | Threads | Processes |
---|---|---|
Memory | Share the same memory space | Separate memory space |
Overhead | Lightweight (faster to create) | Heavyweight (slower to create) |
GIL Impact | Bound by Global Interpreter Lock (GIL) | Bypass GIL (true parallelism) |
Use Case | I/O-bound tasks (e.g., web requests, files) | CPU-bound tasks (e.g., math, data crunching) |
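The Memory row is the difference that most often surprises people, so here is a minimal sketch of it. The names shared_items, append_item, run_in_thread, and run_in_process are hypothetical helpers made up for this illustration.

```python
import multiprocessing
import threading

shared_items = []  # module-level list, visible to every thread in this process


def append_item(value):
    """Append to the module-level list of whichever process runs this."""
    shared_items.append(value)


def run_in_thread(value):
    """Run append_item in a thread, then return what the main thread sees."""
    t = threading.Thread(target=append_item, args=(value,))
    t.start()
    t.join()
    return list(shared_items)


def run_in_process(value):
    """Run append_item in a child process, then return what the parent sees."""
    p = multiprocessing.Process(target=append_item, args=(value,))
    p.start()
    p.join()
    return list(shared_items)


if __name__ == "__main__":
    # The thread's append is visible to the main thread.
    print(run_in_thread("from thread"))
    # The child process appended to its own copy, so the parent's list is unchanged.
    print(run_in_process("from process"))
```

Both print calls show only the "from thread" entry: the child process changed its own copy of the list, not the parent's.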
2. Multithreading (threading Module)
Best for I/O-bound tasks where waiting is involved (e.g., APIs, file ops).
Example: Download Simulator
import threading
import time

def download_file(url):
    print(f"Downloading {url}...")
    time.sleep(2)  # Simulate I/O wait
    print(f"Finished {url}")

urls = ["https://example.com/file1", "https://example.com/file2"]

# Create threads
threads = []
for url in urls:
    thread = threading.Thread(target=download_file, args=(url,))
    thread.start()
    threads.append(thread)

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All downloads complete! 🚀")
Output:
Downloading https://example.com/file1...
Downloading https://example.com/file2...
[After 2 seconds]
Finished https://example.com/file1
Finished https://example.com/file2
All downloads complete! 🚀
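The manual start/join loop can also be written with concurrent.futures.ThreadPoolExecutor from the standard library, which manages the threads and collects return values. This is a sketch, not a replacement for the example above: the delay parameter and the download_all helper are assumptions added for illustration, and the "download" is still simulated with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_file(url, delay=0.1):
    """Simulated download: wait, then return a status string."""
    time.sleep(delay)  # Stand-in for a real network request
    return f"Finished {url}"


def download_all(urls, max_workers=4):
    """Run download_file for every URL on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs
        return list(pool.map(download_file, urls))


if __name__ == "__main__":
    urls = ["https://example.com/file1", "https://example.com/file2"]
    for line in download_all(urls):
        print(line)
```

Leaving the with block waits for all submitted work to finish, so no explicit join() calls are needed.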
Key Notes:
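- Threads run concurrently, but in standard CPython only one thread executes Python bytecode at a time because of the GIL. The GIL is released while a thread waits on blocking I/O (or time.sleep), which is why the two downloads above overlap.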
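- Use a Lock to prevent race conditions when several threads modify the same data:
lock = threading.Lock()
shared_total = 0
with lock:
    shared_total += 1  # Only one thread at a time can run this block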
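To see a lock protecting real shared state, here is a small sketch in which several threads increment one counter. The names counter, counter_lock, add_many, and run are hypothetical and exist only for this illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, taking the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            # Read-modify-write happens as one protected step
            counter += 1


def run(num_threads=4, times=50_000):
    """Reset the counter, run the worker threads, and return the final total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # 4 threads x 50,000 increments = 200000
```

Without the lock, counter += 1 is a separate read, add, and write, so two threads can overwrite each other's update and the total may come up short.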
3. Multiprocessing (multiprocessing Module)
Best for CPU-bound tasks that need true parallelism.
Example: Number Cruncher
import multiprocessing
import time

def calculate_square(numbers):
    for n in numbers:
        time.sleep(0.2)  # Stand-in for real CPU work (sleeping itself uses no CPU)
        print(f"Square: {n*n}")

if __name__ == "__main__":  # Required where child processes re-import this file (Windows, macOS)
    numbers = [1, 2, 3, 4]

    # Create processes
    processes = []
    for i in range(2):
        # Split work between processes: [1, 2] and [3, 4]
        p = multiprocessing.Process(target=calculate_square, args=(numbers[i*2 : (i+1)*2],))
        p.start()
        processes.append(p)

    # Wait for all processes
    for p in processes:
        p.join()

    print("All calculations done! 🔢")
Output (the two processes run at the same time, so the order of the Square lines can vary between runs):
Square: 1
Square: 9
Square: 4
Square: 16
All calculations done! 🔢
Key Notes:
- Processes do not share memory by default (use Queue or Pipe for communication).
- Avoid starting far more processes than you have CPU cores: each one costs startup time and memory (overhead vs. benefit).
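Because each process has its own memory, results have to be sent back to the parent explicitly. The sketch below does that with multiprocessing.Queue. The helpers square_worker and collect_squares are hypothetical names for this example.

```python
import multiprocessing


def square_worker(numbers, results):
    """Put an (n, n squared) pair on the results queue for each number."""
    for n in numbers:
        results.put((n, n * n))


def collect_squares(chunks):
    """Start one process per chunk and gather every result in the parent."""
    results = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=square_worker, args=(chunk, results))
        for chunk in chunks
    ]
    for p in processes:
        p.start()

    # Read the results before join(): a child that still has queued data
    # waiting to be flushed may not exit until that data is consumed.
    expected = sum(len(chunk) for chunk in chunks)
    squares = dict(results.get() for _ in range(expected))

    for p in processes:
        p.join()
    return squares


if __name__ == "__main__":
    squares = collect_squares([[1, 2], [3, 4]])
    print(sorted(squares.items()))
```

The results arrive in whatever order the workers finish, which is why the sketch stores them in a dict keyed by the input number instead of relying on arrival order.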
4. Real-World Use Cases
Threads:
- Web scraping (multiple URLs at once).
- GUI apps (keep UI responsive during long tasks).
- Handling multiple client connections (e.g., servers).
Processes:
- Data processing (e.g., Pandas operations).
- Image/video rendering.
- Machine learning model training.
5. Common Mistakes
- Threads for CPU-bound tasks: Won’t speed up due to GIL.
- Race conditions: Shared data accessed by multiple threads → Use locks.
- Too many processes: High memory/CPU overhead.
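One simple way to avoid the "too many processes" mistake is to cap the worker count at the number of CPU cores. This is a rough sketch under that assumption. pick_worker_count and cube are hypothetical helpers, and the right cap for a real workload may differ.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def pick_worker_count(num_tasks):
    """Use at most one worker per core, and never more workers than tasks."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(cores, num_tasks))


def cube(n):
    """Small stand-in for a CPU-bound function."""
    return n ** 3


if __name__ == "__main__":
    tasks = list(range(8))
    workers = pick_worker_count(len(tasks))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        print(f"{workers} workers:", list(pool.map(cube, tasks)))
```

A pool reuses its worker processes across tasks, so the startup cost is paid once per worker, not once per task.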
6. Best Practices
- I/O-bound? → Use threads.
- CPU-bound? → Use processes.
- Shared data in threads? → Use threading.Lock.
- Inter-process communication? → Use multiprocessing.Queue.
Performance Comparison
Task Type | Threads | Processes |
---|---|---|
I/O-bound | ✅ Faster | ❌ Slower than threads (startup and memory overhead) |
CPU-bound | ❌ No speedup (GIL) | ✅ Faster |
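The CPU-bound row can be checked with a small benchmark. The sketch below runs the same pure-Python calculation on a thread pool and on a process pool. sum_of_squares and timed are hypothetical helpers, and the job sizes are arbitrary. Actual timings depend on your machine and Python build, so no numbers are claimed here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    """Pure-Python loop that keeps one CPU core busy."""
    return sum(i * i for i in range(n))


def timed(executor_cls, jobs, workers=4):
    """Run sum_of_squares over jobs with the given executor and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(sum_of_squares, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [1_000_000] * 4
    _, thread_time = timed(ThreadPoolExecutor, jobs)
    _, process_time = timed(ProcessPoolExecutor, jobs)
    print(f"Threads:   {thread_time:.2f}s")
    print(f"Processes: {process_time:.2f}s")
```

On a multi-core machine with a standard CPython build, the process pool would be expected to finish sooner, because its workers are not contending for one GIL.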
Fun Activity: Build a Speed Test
Compare thread vs. process performance:
import time
import threading
import multiprocessing

def task():
    time.sleep(1)  # Simulate an I/O-style wait (sleeping uses no CPU)

if __name__ == "__main__":  # Needed so child processes can import this file safely
    # Threads
    start = time.perf_counter()
    threads = [threading.Thread(target=task) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"Threads: {time.perf_counter() - start:.2f}s")

    # Processes
    start = time.perf_counter()
    processes = [multiprocessing.Process(target=task) for _ in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Processes: {time.perf_counter() - start:.2f}s")
Key Takeaways
- ✅ Threads: Share memory, good for I/O tasks, limited by GIL.
- ✅ Processes: Isolated memory, true parallelism, ideal for CPU work.
- ✅ Choose wisely: Match the tool (threads/processes) to the task type.
What’s Next?
Learn asyncio for modern asynchronous I/O operations!
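As a small preview, here is a sketch of the download simulator from section 2 written with asyncio. It runs the simulated waits concurrently on a single thread. The delay parameter and the download_all helper are assumptions for illustration, and asyncio.sleep stands in for real network I/O.

```python
import asyncio


async def download_file(url, delay=0.1):
    """Simulated download: wait without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"Finished {url}"


async def download_all(urls):
    """Run all downloads concurrently and return results in input order."""
    return await asyncio.gather(*(download_file(url) for url in urls))


if __name__ == "__main__":
    urls = ["https://example.com/file1", "https://example.com/file2"]
    for line in asyncio.run(download_all(urls)):
        print(line)
```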