Chapter 10: Advanced Topics — Threading and Multiprocessing

⚙️ Threading and Multiprocessing: True Power of Parallelism in Python

Python allows you to run multiple tasks seemingly at once through concurrency (switching between tasks) and parallelism (executing tasks truly simultaneously).
Two major modules enable this: threading and multiprocessing.


🧩 1. Concurrency vs Parallelism

| Concept | Description | Example Use Case |
|---|---|---|
| Concurrency | Doing many tasks at once (by switching rapidly) | Downloading multiple web pages |
| Parallelism | Doing many tasks simultaneously using multiple CPU cores | Image processing, mathematical simulations |

🧠 2. The Global Interpreter Lock (GIL)

Python’s Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time — even on multi-core processors.
This means multi-threaded Python code cannot speed up CPU-bound work: the threads simply take turns holding the GIL. Threads can still overlap while one of them waits on I/O, which is why threading remains useful for I/O-bound tasks.

💡 GIL ≠ “no parallelism.” You can still achieve parallelism using processes or native code extensions (NumPy, TensorFlow, etc.), which release the GIL during heavy computation.
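
You can observe the GIL’s effect directly. A minimal sketch (the loop size is arbitrary and exact timings vary by machine) that runs the same CPU-bound function sequentially and then in two threads:

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound work: the GIL serializes this across threads
    while n > 0:
        n -= 1

N = 5_000_000

# Sequential: two calls back to back
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Threaded: two threads, but only one executes bytecode at a time
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s  Threaded: {threaded:.2f}s")
```

On a typical CPython build the threaded run takes about as long as (or longer than) the sequential one, because the two threads cannot count in parallel.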


🧵 3. Threading — Lightweight Concurrency

Threads share the same memory space, making them ideal for tasks like file I/O, HTTP requests, and reading from multiple sensors.

Example — Using threading.Thread

import threading
import time

def print_numbers():
    for i in range(1, 6):
        time.sleep(0.5)
        print(f"Number: {i}")

def print_letters():
    for letter in "abcde":
        time.sleep(0.5)
        print(f"Letter: {letter}")

# Create threads
t1 = threading.Thread(target=print_numbers)
t2 = threading.Thread(target=print_letters)

# Start threads
t1.start()
t2.start()

# Wait for threads to finish
t1.join()
t2.join()

print("All threads completed!")

Output (interleaved order):

Number: 1
Letter: a
Number: 2
Letter: b
...

🔐 4. Synchronizing Threads

Threads share data, which can lead to race conditions if updates are not synchronized.
Use a Lock to ensure that only one thread at a time executes a critical section.

import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    for _ in range(100000):
        with lock:  # automatically acquire/release lock
            counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()

print("Final counter:", counter)

Without the lock, you’d likely get a corrupted result due to overlapping updates.


🚀 5. Modern Threading — ThreadPoolExecutor

A high-level interface for managing threads.

from concurrent.futures import ThreadPoolExecutor
import requests  # third-party: pip install requests

urls = [
    "https://www.example.com",
    "https://www.python.org",
    "https://www.openai.com"
]

def fetch(url):
    response = requests.get(url)
    return (url, len(response.text))

with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch, urls)

for url, size in results:
    print(f"{url}: {size} bytes")

💡 Best for I/O-heavy tasks like web scraping, file reading, or socket communication.
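
Since the example above needs network access, here is a self-contained sketch of the same API using submit() and as_completed() with simulated I/O (the task names and delays are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def simulated_io(task, delay):
    time.sleep(delay)  # stand-in for a network or disk wait
    return f"{task} done"

tasks = [("slow", 0.3), ("medium", 0.2), ("fast", 0.1)]
results = []

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately; as_completed() yields
    # futures in finish order rather than submission order
    futures = [executor.submit(simulated_io, name, delay) for name, delay in tasks]
    for future in as_completed(futures):
        results.append(future.result())

print(results)  # ['fast done', 'medium done', 'slow done']
```

Unlike map(), which returns results in submission order, as_completed() lets you handle each result as soon as its task finishes.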


🧮 6. Multiprocessing — True Parallelism

While threads share memory, processes each have their own — allowing true parallel execution on multiple cores.

Example — Using multiprocessing.Pool

import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with multiprocessing.Pool(processes=3) as pool:
        result = pool.map(square, numbers)

    print(result)

Output:

[1, 4, 9, 16, 25]

🔄 7. Sharing Data Between Processes

Because each process has its own memory, you need Queues or Managers for communication.

Example — Using Queue

from multiprocessing import Process, Queue

def square_worker(numbers, queue):
    for n in numbers:
        queue.put(n * n)

if __name__ == "__main__":
    q = Queue()
    nums = [1, 2, 3, 4]

    p = Process(target=square_worker, args=(nums, q))
    p.start()
    p.join()

    results = []
    while not q.empty():
        results.append(q.get())

    print("Results:", results)
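
Besides Queue, a Manager can share higher-level objects such as dicts and lists between processes. A minimal sketch, assuming each worker writes its result into a shared dict (the helper names here are illustrative):

```python
from multiprocessing import Process, Manager

def square_into(shared, n):
    # Writes go through a proxy object; a manager server process
    # owns the real dict and applies updates from all workers
    shared[n] = n * n

def collect_squares(nums):
    with Manager() as manager:
        squares = manager.dict()
        workers = [Process(target=square_into, args=(squares, n)) for n in nums]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return dict(squares)  # copy out before the manager shuts down

if __name__ == "__main__":
    print(collect_squares([1, 2, 3, 4]))
```

Manager proxies are slower than a Queue (every access is a round-trip to the manager process), but they are convenient when workers need shared mutable state.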

⚡ 8. Comparing Threading and Multiprocessing

| Feature | Threading | Multiprocessing |
|---|---|---|
| Memory Space | Shared | Separate |
| Overhead | Low | High |
| Best For | I/O-bound tasks | CPU-bound tasks |
| Parallel Execution | No (limited by GIL) | Yes (true parallelism) |
| Safety | Requires Locks | Independent memory |
| Example Use | Web scraping | Image processing |
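
The CPU-bound rows of the table can be checked empirically. A sketch (workload size and worker counts are illustrative) that runs the same CPU-bound function through both pool types and compares the wall-clock times:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def cpu_task(n):
    # Pure-Python loop: CPU-bound, so threads are serialized by the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, workloads):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        results = list(ex.map(cpu_task, workloads))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    workloads = [2_000_000] * 4
    thread_results, thread_time = timed(ThreadPoolExecutor, workloads)
    process_results, process_time = timed(ProcessPoolExecutor, workloads)
    assert thread_results == process_results  # same answers either way
    print(f"Threads: {thread_time:.2f}s  Processes: {process_time:.2f}s")
```

On a multi-core machine the process pool usually finishes noticeably faster here; for I/O-bound work the relationship flips, because process startup and pickling overhead outweigh any GIL savings.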

🔁 9. Combining Threading and Multiprocessing

You can mix both — e.g., use multiprocessing for heavy computation and threading for concurrent I/O.

from concurrent.futures import ThreadPoolExecutor
import multiprocessing
import requests  # third-party: pip install requests

def fetch(url):
    return requests.get(url).text

def word_count(text):
    return len(text.split())

if __name__ == "__main__":
    urls = ["https://www.python.org"] * 6

    # Threads handle the concurrent I/O (downloading)
    with ThreadPoolExecutor(max_workers=5) as executor:
        pages = list(executor.map(fetch, urls))

    # Processes handle the CPU-bound work (parsing and counting)
    with multiprocessing.Pool(processes=3) as pool:
        print(pool.map(word_count, pages))

⚠️ Be careful: combining both adds complexity — always measure performance.


🧭 10. Best Practices

✅ Use threading for I/O-bound tasks (networking, file I/O).
✅ Use multiprocessing for CPU-bound tasks (math, rendering).
✅ Always protect shared resources with Lock.
✅ Wrap multiprocessing code under if __name__ == "__main__": (required on Windows/macOS).
✅ Use concurrent.futures API for simplicity.
✅ For asynchronous I/O, consider asyncio instead of threads.
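
The last point can be illustrated with a short sketch: asyncio achieves I/O concurrency in a single thread by switching between coroutines at each await (asyncio.sleep stands in for real network waits here):

```python
import asyncio

async def fetch_data(source, delay):
    # await yields control to the event loop while "waiting" on I/O
    await asyncio.sleep(delay)
    return f"Fetched {source}"

async def main():
    # gather() runs all coroutines concurrently and returns
    # their results in argument order
    return await asyncio.gather(
        fetch_data("API", 0.3),
        fetch_data("Database", 0.2),
        fetch_data("File", 0.1),
    )

results = asyncio.run(main())
print(results)  # ['Fetched API', 'Fetched Database', 'Fetched File']
```

All three "fetches" overlap, so the total runtime is roughly the longest delay rather than the sum — with no threads and no locks involved.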


🧠 11. Real-World Example — Hybrid Data Processor

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def fetch_data(source):
    time.sleep(1)
    return f"Fetched data from {source}"

def process_data(data):
    time.sleep(0.5)
    return data.upper()

sources = ["API", "Database", "File"]

if __name__ == "__main__":  # required for ProcessPoolExecutor on Windows/macOS
    # Fetch concurrently (I/O)
    with ThreadPoolExecutor() as tpool:
        raw_data = list(tpool.map(fetch_data, sources))

    # Process in parallel (CPU)
    with ProcessPoolExecutor() as ppool:
        processed = list(ppool.map(process_data, raw_data))

    print(processed)

Output:

['FETCHED DATA FROM API', 'FETCHED DATA FROM DATABASE', 'FETCHED DATA FROM FILE']

🧾 12. Summary

| | Threading | Multiprocessing |
|---|---|---|
| Description | Concurrent execution within one process | True parallel execution using multiple processes |
| Key Tools | threading.Thread, ThreadPoolExecutor | multiprocessing.Pool, ProcessPoolExecutor |
| Best For | I/O-bound tasks | CPU-bound tasks |
| Key Limitation | GIL (one thread runs at a time) | Higher memory use |
| High-level API | concurrent.futures | concurrent.futures |

Mastering concurrency gives you control over performance — letting you balance speed, simplicity, and scalability depending on your workload.