Chapter 17: Best Practices and Tips — Performance Optimization
⚡ Introduction: The Art of Performance Optimization
Performance optimization is about writing efficient, scalable, and responsive code.
But remember: optimization without measurement is guessing. Always identify real bottlenecks before changing anything.
“Premature optimization is the root of all evil.” — Donald Knuth
Optimized code improves:
- Execution speed
- Memory usage
- Scalability under load
- User experience and responsiveness
🧭 1. The Performance Optimization Mindset
- Measure before you optimize — profile to find the real bottlenecks.
- Fix the biggest problem first — don’t waste time on micro-optimizations.
- Prefer clarity over cleverness — optimization should not reduce readability.
- Benchmark after every change — ensure your “fix” actually helps.
- Don’t optimize everything — focus on hot paths and critical code.
🔍 2. Profiling and Measurement Tools
Before you optimize, you must know where time and memory are being spent.
🕒 timeit: Quick Micro-Benchmarks
```python
import timeit

# Time 10,000 runs of summing the first 1,000 integers
print(timeit.timeit("sum(range(1000))", number=10_000))
```
🧩 cProfile: Full Code Profiling
```python
import cProfile

def slow_function():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

cProfile.run('slow_function()')
```
📈 line_profiler and memory_profiler
```bash
pip install line_profiler memory_profiler
```
```python
@profile  # injected by kernprof / memory_profiler when the script is run with the tools below
def process_data():
    data = [x**2 for x in range(10_000)]
    return sum(data)
```
```bash
kernprof -l -v myscript.py              # line-by-line timing
python -m memory_profiler myscript.py   # line-by-line memory usage
```
Use profiling to locate hotspots — not assumptions.
🧮 3. Algorithmic Optimization
The biggest performance wins come from better algorithms and data structures.
| Problem | Naive | Optimized |
|---|---|---|
| Search | Linear search O(n) | Binary search O(log n) |
| Membership test | List | Set / Dict (O(1) average) |
| Sorting | Manual loops | Built-in sorted() (Timsort) |
| Counting | Loops | collections.Counter() |
Example: Linear vs Binary Search
```python
# ❌ Inefficient O(n)
def linear_search(array, target):
    for i, value in enumerate(array):
        if value == target:
            return i
    return -1

# ✅ Efficient O(log n) — requires a sorted array
def binary_search(sorted_array, target):
    left, right = 0, len(sorted_array) - 1
    while left <= right:
        mid = (left + right) // 2
        if sorted_array[mid] == target:
            return mid
        elif sorted_array[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
```
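In practice you rarely need to hand-roll binary search — the standard library's `bisect` module implements it. A minimal sketch of a membership check built on `bisect_left`:

```python
import bisect

def bisect_search(sorted_array, target):
    """Binary search using the standard library's bisect module."""
    i = bisect.bisect_left(sorted_array, target)
    if i < len(sorted_array) and sorted_array[i] == target:
        return i
    return -1

print(bisect_search([1, 3, 5, 7, 9], 7))   # → 3
print(bisect_search([1, 3, 5, 7, 9], 4))   # → -1
```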
🧠 4. Data Structure Optimization
Choosing the right data structure can yield huge improvements.
| Task | Recommended Structure | Reason |
|---|---|---|
| Frequent lookups | set or dict | Constant-time access |
| Ordered data | list or deque | Fast iteration |
| Counting items | collections.Counter | Built-in tallying |
| Fixed-size queue | collections.deque(maxlen=N) | Efficient rotation |
| Large numeric data | numpy.array | Vectorized speed |
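A quick sketch of the first rows of this table — membership tests, counting, and a fixed-size queue, each with the recommended structure:

```python
from collections import Counter, deque

# Frequent lookups: a set gives O(1) average membership tests
allowed = {"alice", "bob", "carol"}
print("bob" in allowed)          # → True

# Counting items: Counter tallies everything in one pass
words = ["a", "b", "a", "c", "a"]
counts = Counter(words)
print(counts["a"])               # → 3

# Fixed-size queue: deque(maxlen=N) silently drops the oldest items
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(list(recent))              # → [2, 3, 4]
```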
🧩 5. Caching and Memoization
Avoid recomputation for repeated inputs.
```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```
Pro Tip: Use lru_cache when you want a bounded cache; functools.cache (Python 3.9+) provides a simpler, unbounded alternative.
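A quick way to confirm the cache is actually working — `cache_info()` reports hits and misses:

```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(30)
print(fibonacci.cache_info())  # misses == 31: one per distinct n in 0..30
```

Without memoization, `fibonacci(30)` would make over a million recursive calls; with the cache, each distinct argument is computed exactly once.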
🧮 6. Vectorization with NumPy
Vectorized operations are much faster than loops in Python because they run in optimized C code.
```python
import numpy as np

# ❌ Slow: pure-Python, element-by-element work
data = [i * 2 for i in range(10_000_000)]

# ✅ Fast: one vectorized operation executed in optimized C code
arr = np.arange(10_000_000)
result = arr * 2
```
Avoid Python loops for numerical tasks — use vectorization whenever possible.
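A rough way to see the difference on your own machine (exact numbers vary by hardware, but the vectorized version should win comfortably):

```python
import timeit

import numpy as np

n = 1_000_000
loop_time = timeit.timeit(lambda: [i * 2 for i in range(n)], number=10)
vec_time = timeit.timeit(lambda: np.arange(n) * 2, number=10)
print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.3f}s")
```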
🧵 7. Concurrency and Parallelism
For I/O-bound tasks (like API calls or file reads), use threads.
For CPU-bound tasks (like computation), use multiprocessing.
Threads (I/O-bound)
```python
import threading

def download_file(url):
    print(f"Downloading {url}")

urls = ["a", "b", "c"]
threads = [threading.Thread(target=download_file, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
Processes (CPU-bound)
```python
from multiprocessing import Pool

def compute_square(n):
    return n * n

if __name__ == "__main__":  # required on platforms that spawn new processes
    with Pool(4) as pool:
        results = pool.map(compute_square, range(10))
    print(results)
```
Choose threads for waiting, processes for working.
🧠 8. Memory Optimization
Memory usage often limits scalability more than CPU.
Use Generators Instead of Lists
```python
# ❌ Loads everything into memory at once
squares = [x**2 for x in range(10_000_000)]

# ✅ Lazy evaluation — items are produced one at a time
squares = (x**2 for x in range(10_000_000))
```
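You can see the footprint difference directly with `sys.getsizeof` (exact sizes vary by platform; note it measures the container object itself, not the integers it references):

```python
import sys

list_squares = [x**2 for x in range(100_000)]
gen_squares = (x**2 for x in range(100_000))

print(sys.getsizeof(list_squares))  # hundreds of kilobytes
print(sys.getsizeof(gen_squares))   # a few hundred bytes — just the generator object
```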
Delete Unused Objects
Use del obj and garbage collection when large objects are no longer needed.
```python
import gc

large_dataset = [0] * 10_000_000  # placeholder for a real large object

# ... work with large_dataset ...

del large_dataset   # drop the reference
gc.collect()        # reclaim memory held in reference cycles
```
Measure Memory Usage
```python
import tracemalloc

tracemalloc.start()
# ... run memory-heavy code here ...
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1e6:.2f} MB; Peak: {peak / 1e6:.2f} MB")
tracemalloc.stop()
```
🧪 9. Example: Real Optimization Workflow
Let’s optimize a real-world snippet step-by-step:
```python
# ❌ Original version: O(n) loop
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

# ✅ Optimized version: O(1) closed form for 0 + 1 + ... + (n - 1)
def fast_sum(n):
    return n * (n - 1) // 2
```
Performance Comparison
```python
import timeit

print("Slow:", timeit.timeit("slow_sum(10_000_000)", globals=globals(), number=1))
print("Fast:", timeit.timeit("fast_sum(10_000_000)", globals=globals(), number=1))
```
Optimization is about finding smarter ways, not just “faster computers.”
⚠️ 10. Avoid Over‑Optimization
Optimization is powerful but dangerous when done prematurely.
Here’s how to stay safe:
| Don’t… | Instead… |
|---|---|
| Rewrite everything in C | Profile first — only optimize hot paths |
| Obsess over microseconds | Focus on algorithmic efficiency |
| Sacrifice readability for speed | Use clear, maintainable solutions |
| Guess performance bottlenecks | Measure with profilers |
| Forget to test correctness | Always verify outputs after changes |
✅ 11. Performance Optimization Checklist
- Profile your code before optimizing (`cProfile`, `timeit`)
- Focus on algorithmic complexity first
- Use efficient data structures (`set`, `dict`, `numpy`)
- Cache repeated computations (`lru_cache`, memoization)
- Vectorize numeric workloads (NumPy, Pandas)
- Parallelize CPU-bound tasks (multiprocessing)
- Stream large data with generators
- Free unused memory and monitor usage
- Test and benchmark after every change
- Document your optimizations and trade-offs
🚀 Conclusion
Performance optimization is a balance between speed, clarity, and maintainability.
The best developers optimize intelligently — guided by measurement, not intuition.
“The fastest code is the code you don’t run.”
By understanding algorithms, data structures, caching, and profiling, you can craft Python applications that run efficiently — not only fast, but elegantly fast.