Chapter 10: Advanced Topics — Threading and Multiprocessing
⚙️ Threading and Multiprocessing: True Power of Parallelism in Python
Python allows you to run multiple tasks seemingly at once through **concurrency** (switching between tasks) and **parallelism** (executing tasks truly simultaneously).
Two major modules enable this: threading and multiprocessing.
🧩 1. Concurrency vs Parallelism
| Concept | Description | Example Use Case |
|---|---|---|
| Concurrency | Doing many tasks at once (by switching rapidly) | Downloading multiple web pages |
| Parallelism | Doing many tasks simultaneously using multiple CPU cores | Image processing, mathematical simulations |
🧠 2. The Global Interpreter Lock (GIL)
Python’s Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time — even on multi-core processors.
This means:
- Threads are best for I/O-bound tasks (waiting for input/output).
- Multiprocessing is best for CPU-bound tasks (heavy computation).
💡 GIL ≠ “no parallelism.” You can still achieve parallelism using processes or native code extensions (NumPy, TensorFlow, etc.).
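A quick way to see the GIL's effect is to time a CPU-bound function run sequentially versus in two threads. On CPython the threaded version typically shows little or no speedup, because only one thread executes bytecode at a time. This is a rough sketch (the `count_down` function and the iteration count are our illustration, not from any library), and exact timings will vary by machine:

```python
import threading
import time

def count_down(n):
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: two runs back to back
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Threaded: two threads, same total work
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s, Threaded: {threaded:.2f}s")
```

If `count_down` spent its time waiting on I/O instead of computing, the threaded version would win easily — which is exactly the I/O-bound vs CPU-bound distinction above.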
🧵 3. Threading — Lightweight Concurrency
Threads share the same memory space, making them ideal for tasks like file I/O, HTTP requests, and reading from multiple sensors.
Example — Using threading.Thread
```python
import threading
import time

def print_numbers():
    for i in range(1, 6):
        time.sleep(0.5)
        print(f"Number: {i}")

def print_letters():
    for letter in "abcde":
        time.sleep(0.5)
        print(f"Letter: {letter}")

# Create threads
t1 = threading.Thread(target=print_numbers)
t2 = threading.Thread(target=print_letters)

# Start threads
t1.start()
t2.start()

# Wait for threads to finish
t1.join()
t2.join()

print("All threads completed!")
```
Output (interleaved order):
```
Number: 1
Letter: a
Number: 2
Letter: b
...
```
🔐 4. Synchronizing Threads
Threads share data, which can lead to race conditions when updates overlap.
Use a Lock to ensure that only one thread enters a critical section at a time.
```python
import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    for _ in range(100000):
        with lock:  # automatically acquire/release the lock
            counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print("Final counter:", counter)
```
Without the lock, you’d likely get a corrupted result due to overlapping updates.
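An alternative to manual locking is to funnel shared updates through `queue.Queue`, which is thread-safe and does its own locking internally. This is our sketch, not part of the chapter's example; the producer function and counts are illustrative:

```python
import queue
import threading

q = queue.Queue()

def producer():
    # Each put() is internally synchronized; no explicit Lock needed
    for _ in range(100000):
        q.put(1)

threads = [threading.Thread(target=producer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the queue and total the increments
total = 0
while not q.empty():
    total += q.get()
print("Final counter:", total)
```

Queues shine when threads hand work to each other; a plain Lock remains simpler for a single shared counter.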
🚀 5. Modern Threading — ThreadPoolExecutor
A high-level interface for managing threads.
```python
from concurrent.futures import ThreadPoolExecutor
import requests

urls = [
    "https://www.example.com",
    "https://www.python.org",
    "https://www.openai.com",
]

def fetch(url):
    response = requests.get(url)
    return (url, len(response.text))

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch, urls)

for url, size in results:
    print(f"{url}: {size} bytes")
```
💡 Best for I/O-heavy tasks like web scraping, file reading, or socket communication.
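Besides `map`, the executor offers `submit` and `as_completed`, which let you handle results as soon as each task finishes rather than in submission order. A self-contained sketch with simulated I/O (the `slow_task` function and delays are our stand-ins, so it runs without network access):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def slow_task(name, delay):
    time.sleep(delay)  # stand-in for network or disk I/O
    return f"{name} done"

tasks = [("a", 0.3), ("b", 0.1), ("c", 0.2)]
results = []

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_task, name, d) for name, d in tasks]
    # as_completed yields futures in completion order, not submission order
    for future in as_completed(futures):
        results.append(future.result())

print(results)
```

Here the shortest task tends to finish first, so the output order usually differs from the submission order — useful when you want to start processing responses immediately.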
🧮 6. Multiprocessing — True Parallelism
While threads share memory, processes each have their own — allowing true parallel execution on multiple cores.
Example — Using multiprocessing.Pool
```python
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with multiprocessing.Pool(processes=3) as pool:
        result = pool.map(square, numbers)
    print(result)
```
Output:
```
[1, 4, 9, 16, 25]
```
🔄 7. Sharing Data Between Processes
Because each process has its own memory, you need Queues or Managers for communication.
Example — Using Queue
```python
from multiprocessing import Process, Queue

def square_worker(numbers, queue):
    for n in numbers:
        queue.put(n * n)

if __name__ == "__main__":
    q = Queue()
    nums = [1, 2, 3, 4]
    p = Process(target=square_worker, args=(nums, q))
    p.start()
    p.join()

    results = []
    while not q.empty():
        results.append(q.get())
    print("Results:", results)
```
⚡ 8. Comparing Threading and Multiprocessing
| Feature | Threading | Multiprocessing |
|---|---|---|
| Memory Space | Shared | Separate |
| Overhead | Low | High |
| Best For | I/O-bound tasks | CPU-bound tasks |
| Parallel Execution | No (limited by GIL) | Yes (true parallelism) |
| Safety | Requires Locks | Independent memory |
| Example Use | Web scraping | Image processing |
🔁 9. Combining Threading and Multiprocessing
You can mix both — e.g., use multiprocessing for heavy computation and threading for concurrent I/O.
```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch(url):
    return requests.get(url).status_code

def fetch_batch(urls):
    # Each worker process runs its own thread pool for concurrent I/O
    with ThreadPoolExecutor(max_workers=3) as executor:
        return list(executor.map(fetch, urls))

if __name__ == "__main__":
    urls = ["https://www.python.org"] * 6
    # Split the work across processes; each batch is fetched with threads
    batches = [urls[:3], urls[3:]]
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(fetch_batch, batches))
```
⚠️ Be careful: combining both adds complexity — always measure performance.
🧭 10. Best Practices
✅ Use threading for I/O-bound tasks (networking, file I/O).
✅ Use multiprocessing for CPU-bound tasks (math, rendering).
✅ Always protect shared resources with Lock.
✅ Wrap multiprocessing code under if __name__ == "__main__": (required on Windows/macOS).
✅ Use concurrent.futures API for simplicity.
✅ For asynchronous I/O, consider asyncio instead of threads.
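The last point can be sketched with asyncio, which serves many I/O waits on a single thread through an event loop — no threads or locks involved. The coroutine names and the simulated delay below are our own illustration:

```python
import asyncio

async def fetch_data(source):
    await asyncio.sleep(0.1)  # non-blocking stand-in for network I/O
    return f"Fetched {source}"

async def main():
    # gather runs the coroutines concurrently on one thread
    return await asyncio.gather(*(fetch_data(s) for s in ["API", "DB", "File"]))

results = asyncio.run(main())
print(results)
```

All three "fetches" overlap, so the total wait is about 0.1 s rather than 0.3 s — the same concurrency win as threads, without shared-state hazards.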
🧠 11. Real-World Example — Hybrid Data Processor
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def fetch_data(source):
    time.sleep(1)    # simulate I/O latency
    return f"Fetched data from {source}"

def process_data(data):
    time.sleep(0.5)  # simulate CPU work
    return data.upper()

if __name__ == "__main__":
    sources = ["API", "Database", "File"]

    # Fetch concurrently (I/O-bound)
    with ThreadPoolExecutor() as tpool:
        raw_data = list(tpool.map(fetch_data, sources))

    # Process in parallel (CPU-bound)
    with ProcessPoolExecutor() as ppool:
        processed = list(ppool.map(process_data, raw_data))

    print(processed)
```
Output:
```
['FETCHED DATA FROM API', 'FETCHED DATA FROM DATABASE', 'FETCHED DATA FROM FILE']
```
🧾 12. Summary
| Concept | Description | Example |
|---|---|---|
| Threading | Concurrent execution within one process | threading.Thread, ThreadPoolExecutor |
| Multiprocessing | True parallel execution using multiple processes | multiprocessing.Pool, ProcessPoolExecutor |
| Best For | I/O-bound tasks | CPU-bound tasks |
| Key Limitation | GIL (one thread runs at a time) | Higher memory use |
| High-level API | concurrent.futures | concurrent.futures |
Mastering concurrency gives you control over performance — letting you balance speed, simplicity, and scalability depending on your workload.