
Chapter 17: Best Practices and Tips — Performance Optimization

⚡ Introduction: The Art of Performance Optimization

Performance optimization is about writing efficient, scalable, and responsive code.
But remember: optimization without measurement is guessing. Always identify real bottlenecks before changing anything.

“Premature optimization is the root of all evil.” — Donald Knuth

Optimized code improves:

- Efficiency: less CPU time and memory spent per task
- Scalability: the same code handles growing workloads
- Responsiveness: faster feedback for users

🧭 1. The Performance Optimization Mindset

  1. Measure before you optimize — profile to find the real bottlenecks.
  2. Fix the biggest problem first — don’t waste time on micro-optimizations.
  3. Prefer clarity over cleverness — optimization should not reduce readability.
  4. Benchmark after every change — ensure your “fix” actually helps.
  5. Don’t optimize everything — focus on hot paths and critical code.

🔍 2. Profiling and Measurement Tools

Before you optimize, you must know where time and memory are being spent.

🕒 timeit: Quick Micro-Benchmarks

import timeit

print(timeit.timeit("sum(range(1000))", number=10000))
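A single `timeit` run can be skewed by background noise. `timeit.repeat` runs the benchmark several times so you can keep the best (least-interfered) figure; this is a minimal stdlib-only sketch.

```python
import timeit

# Repeat the whole benchmark 5 times; the minimum is the most reliable number.
times = timeit.repeat("sum(range(1000))", repeat=5, number=10_000)
print(f"best of 5: {min(times):.4f}s")
```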

🧩 cProfile: Full Code Profiling

import cProfile

def slow_function():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

cProfile.run('slow_function()')
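For more control than `cProfile.run`, you can profile programmatically and sort the report with `pstats`. A sketch (the smaller loop size here is just to keep the demo quick):

```python
import cProfile
import io
import pstats

def slow_function():
    total = 0
    for i in range(1_000_000):
        total += i
    return total

# Collect a profile around just the code we care about.
profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Print the top 5 entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```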

📈 line_profiler and memory_profiler

pip install line_profiler memory_profiler

@profile
def process_data():
    data = [x**2 for x in range(10_000)]
    return sum(data)

Run them from the command line:

kernprof -l -v myscript.py              # line-by-line timing (line_profiler)
python -m memory_profiler myscript.py   # line-by-line memory usage

Use profiling to locate hotspots — not assumptions.


🧮 3. Algorithmic Optimization

The biggest performance wins come from better algorithms and data structures.

| Problem | Naive | Optimized |
|---|---|---|
| Search | Linear search O(n) | Binary search O(log n) |
| Membership test | List | Set / dict (O(1) average) |
| Sorting | Manual loops | Built-in sorted() (Timsort) |
| Counting | Loops | collections.Counter() |
# ❌ Inefficient O(n)
def linear_search(array, target):
    for i, value in enumerate(array):
        if value == target:
            return i
    return -1

# ✅ Efficient O(log n)
def binary_search(sorted_array, target):
    left, right = 0, len(sorted_array) - 1
    while left <= right:
        mid = (left + right) // 2
        if sorted_array[mid] == target:
            return mid
        elif sorted_array[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
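You rarely need to hand-roll binary search: the stdlib `bisect` module implements the same O(log n) lookup. A sketch of an equivalent function:

```python
import bisect

def binary_search_bisect(sorted_array, target):
    """Same O(log n) search, delegated to the stdlib bisect module."""
    i = bisect.bisect_left(sorted_array, target)
    if i < len(sorted_array) and sorted_array[i] == target:
        return i
    return -1

print(binary_search_bisect([1, 3, 5, 7, 9], 7))  # → 3
print(binary_search_bisect([1, 3, 5, 7, 9], 4))  # → -1
```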

🧠 4. Data Structure Optimization

Choosing the right data structure can yield huge improvements.

| Task | Recommended Structure | Reason |
|---|---|---|
| Frequent lookups | set or dict | Constant-time access |
| Ordered data | list or deque | Fast iteration |
| Counting items | collections.Counter | Built-in tallying |
| Fixed-size queue | collections.deque(maxlen=N) | Efficient rotation |
| Large numeric data | numpy.array | Vectorized speed |
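Two of the structures from the table in action: `Counter` tallies items in one pass, and `deque(maxlen=N)` discards old items automatically.

```python
from collections import Counter, deque

# Counter: no manual dict bookkeeping needed.
counts = Counter("mississippi")
print(counts.most_common(2))  # → [('i', 4), ('s', 4)]

# deque(maxlen=N): keeps only the N most recent items.
window = deque(maxlen=3)
for x in range(5):
    window.append(x)
print(window)  # → deque([2, 3, 4], maxlen=3)
```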

🧩 5. Caching and Memoization

Avoid recomputation for repeated inputs.

import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

Pro Tip: Use lru_cache for recursive functions and functools.cache (Python 3.9+) for simple caching.
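With the cache in place, even fibonacci(50) returns instantly, and `cache_info()` reports how much recomputation was avoided:

```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))           # → 12586269025
print(fibonacci.cache_info())  # hit/miss counters for the cache
```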


🧮 6. Vectorization with NumPy

Vectorized operations are much faster than loops in Python because they run in optimized C code.

import numpy as np

# ❌ Slow: element-by-element work in pure Python
data = [i * 2 for i in range(10_000_000)]

# ✅ Fast vectorized version
arr = np.arange(10_000_000)
result = arr * 2

Avoid Python loops for numerical tasks — use vectorization whenever possible.


🧵 7. Concurrency and Parallelism

For I/O-bound tasks (like API calls or file reads), use threads.
For CPU-bound tasks (like computation), use multiprocessing.

Threads (I/O-bound)

import threading

def download_file(url):
    print(f"Downloading {url}")

urls = ["a", "b", "c"]
threads = [threading.Thread(target=download_file, args=(u,)) for u in urls]

for t in threads:
    t.start()
for t in threads:
    t.join()

Processes (CPU-bound)

from multiprocessing import Pool

def compute_square(n):
    return n * n

if __name__ == "__main__":  # guard required on Windows/macOS (spawn start method)
    with Pool(4) as pool:
        results = pool.map(compute_square, range(10))
    print(results)

Choose threads for waiting, processes for working.
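Both patterns are often easier to manage through `concurrent.futures`, which puts threads and processes behind one interface. A sketch with a thread pool and the toy download_file from above:

```python
from concurrent.futures import ThreadPoolExecutor

def download_file(url):
    # Stand-in for real I/O work (network request, file read, ...)
    return f"done:{url}"

urls = ["a", "b", "c"]
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() preserves input order in its results.
    results = list(executor.map(download_file, urls))
print(results)  # → ['done:a', 'done:b', 'done:c']
```

Swapping in ProcessPoolExecutor gives the CPU-bound variant with the same code shape.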


🧠 8. Memory Optimization

Memory usage often limits scalability more than CPU.

Use Generators Instead of Lists

# ❌ Loads everything in memory
squares = [x**2 for x in range(10_000_000)]

# ✅ Lazy evaluation (no large memory footprint)
squares = (x**2 for x in range(10_000_000))
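The difference is easy to see with `sys.getsizeof`: the list stores every element, while the generator is a small fixed-size object regardless of the range (a smaller range is used here to keep the demo quick).

```python
import sys

eager = [x**2 for x in range(100_000)]
lazy = (x**2 for x in range(100_000))

print(sys.getsizeof(eager))  # large: grows with the number of elements
print(sys.getsizeof(lazy))   # tiny: independent of the range size

# Both produce the same values when consumed.
assert sum(lazy) == sum(eager)
```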

Delete Unused Objects

Use del obj and garbage collection when large objects are no longer needed.

import gc
del large_dataset
gc.collect()

Measure Memory Usage

import tracemalloc

tracemalloc.start()
# run heavy code
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1e6:.2f} MB; Peak: {peak / 1e6:.2f} MB")
tracemalloc.stop()

🧪 9. Example: Real Optimization Workflow

Let’s optimize a real-world snippet step-by-step:

# ❌ Original version (slow)
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

# ✅ Optimized version (fast)
def fast_sum(n):
    return n * (n - 1) // 2

Performance Comparison

import timeit

print("Slow:", timeit.timeit("slow_sum(10_000_000)", globals=globals(), number=1))
print("Fast:", timeit.timeit("fast_sum(10_000_000)", globals=globals(), number=1))

Optimization is about finding smarter ways, not just “faster computers.”


⚠️ 10. Avoid Over‑Optimization

Optimization is powerful but dangerous when done prematurely.
Here’s how to stay safe:

| Don’t… | Instead… |
|---|---|
| Rewrite everything in C | Profile first — only optimize hot paths |
| Obsess over microseconds | Focus on algorithmic efficiency |
| Sacrifice readability for speed | Use clear, maintainable solutions |
| Guess performance bottlenecks | Measure with profilers |
| Forget to test correctness | Always verify outputs after changes |

✅ 11. Performance Optimization Checklist

- Profile before optimizing (timeit, cProfile, tracemalloc)
- Fix algorithmic complexity before micro-tuning
- Choose the right data structure (set/dict lookups, deque, Counter)
- Cache repeated computations (functools.lru_cache)
- Vectorize numeric work with NumPy
- Use threads for I/O-bound work, processes for CPU-bound work
- Prefer generators for large data streams
- Re-benchmark and re-test correctness after every change

🚀 Conclusion

Performance optimization is a balance between speed, clarity, and maintainability.
The best developers optimize intelligently — guided by measurement, not intuition.

“The fastest code is the code you don’t run.”

By understanding algorithms, data structures, caching, and profiling, you can craft Python applications that run efficiently — not only fast, but elegantly fast.