Chapter 17: Best Practices and Tips — Performance Optimization
⚡ Introduction: The Art of Performance Optimization
Performance optimization is about writing efficient, scalable, and responsive code.
But remember: optimization without measurement is guessing. Always identify real bottlenecks before changing anything.
“Premature optimization is the root of all evil.” — Donald Knuth
Optimized code improves:
- Execution speed
- Memory usage
- Scalability under load
- User experience and responsiveness
🧭 1. The Performance Optimization Mindset
- Measure before you optimize — profile to find the real bottlenecks.
- Fix the biggest problem first — don’t waste time on micro-optimizations.
- Prefer clarity over cleverness — optimization should not reduce readability.
- Benchmark after every change — ensure your “fix” actually helps.
- Don’t optimize everything — focus on hot paths and critical code.
🔍 2. Profiling and Measurement Tools
Before you optimize, you must know where time and memory are being spent.
🕒 timeit: Quick Micro-Benchmarks
```python
import timeit

# Time 10,000 runs of summing the first 1,000 integers
print(timeit.timeit("sum(range(1000))", number=10_000))
```
🧩 cProfile: Full Code Profiling
```python
import cProfile

def slow_function():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

cProfile.run('slow_function()')
```
📈 line_profiler and memory_profiler
```bash
pip install line_profiler memory_profiler
```
```python
@profile  # injected by kernprof / memory_profiler when the script is run with the tools below
def process_data():
    data = [x**2 for x in range(10_000)]
    return sum(data)
```
```bash
kernprof -l -v myscript.py              # line-by-line timing
python -m memory_profiler myscript.py   # line-by-line memory usage
```
Use profiling to locate hotspots — not assumptions.
🧮 3. Algorithmic Optimization
The biggest performance wins come from better algorithms and data structures.
| Problem | Naive | Optimized |
|---|---|---|
| Search | Linear search O(n) | Binary search O(log n) |
| Membership test | List | Set / Dict (O(1) average) |
| Sorting | Manual loops | Built-in sorted() (Timsort) |
| Counting | Loops | collections.Counter() |
Example: Linear vs Binary Search
```python
# ❌ Inefficient O(n)
def linear_search(array, target):
    for i, value in enumerate(array):
        if value == target:
            return i
    return -1

# ✅ Efficient O(log n) — requires a sorted array
def binary_search(sorted_array, target):
    left, right = 0, len(sorted_array) - 1
    while left <= right:
        mid = (left + right) // 2
        if sorted_array[mid] == target:
            return mid
        elif sorted_array[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
```
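In practice you rarely need to hand-roll binary search — the standard library's `bisect` module implements it. A minimal sketch of a membership check built on `bisect_left`:

```python
import bisect

def bisect_search(sorted_array, target):
    """Binary search using the standard library's bisect module."""
    i = bisect.bisect_left(sorted_array, target)
    if i < len(sorted_array) and sorted_array[i] == target:
        return i
    return -1

print(bisect_search([1, 3, 5, 7, 9], 7))   # → 3
print(bisect_search([1, 3, 5, 7, 9], 4))   # → -1
```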
🧠 4. Data Structure Optimization
Choosing the right data structure can yield huge improvements.
| Task | Recommended Structure | Reason |
|---|---|---|
| Frequent lookups | set or dict | Constant-time access |
| Ordered data | list or deque | Fast iteration |
| Counting items | collections.Counter | Built-in tallying |
| Fixed-size queue | collections.deque(maxlen=N) | Efficient rotation |
| Large numeric data | numpy.array | Vectorized speed |
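A quick sketch of the first rows of this table — membership tests, counting, and a fixed-size queue, each with the recommended structure:

```python
from collections import Counter, deque

# Frequent lookups: a set gives O(1) average membership tests
allowed = {"alice", "bob", "carol"}
print("bob" in allowed)          # → True

# Counting items: Counter tallies everything in one pass
words = ["a", "b", "a", "c", "a"]
counts = Counter(words)
print(counts["a"])               # → 3

# Fixed-size queue: deque(maxlen=N) silently drops the oldest items
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(list(recent))              # → [2, 3, 4]
```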
🧩 5. Caching and Memoization
Avoid recomputation for repeated inputs.
```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
```
Pro Tip: Use lru_cache when you want a bounded cache; functools.cache (Python 3.9+) provides a simpler, unbounded alternative.
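A quick way to confirm the cache is actually working — `cache_info()` reports hits and misses:

```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(30)
print(fibonacci.cache_info())  # misses == 31: one per distinct n in 0..30
```

Without memoization, `fibonacci(30)` would make over a million recursive calls; with the cache, each distinct argument is computed exactly once.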
🧮 6. Vectorization with NumPy
Vectorized operations are much faster than loops in Python because they run in optimized C code.
```python
import numpy as np

# ❌ Slow: pure-Python, element-by-element work
data = [i * 2 for i in range(10_000_000)]

# ✅ Fast: one vectorized operation executed in optimized C code
arr = np.arange(10_000_000)
result = arr * 2
```
Avoid Python loops for numerical tasks — use vectorization whenever possible.
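A rough way to see the difference on your own machine (exact numbers vary by hardware, but the vectorized version should win comfortably):

```python
import timeit

import numpy as np

n = 1_000_000
loop_time = timeit.timeit(lambda: [i * 2 for i in range(n)], number=10)
vec_time = timeit.timeit(lambda: np.arange(n) * 2, number=10)
print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.3f}s")
```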
🧵 7. Concurrency and Parallelism
For I/O-bound tasks (like API calls or file reads), use threads.
For CPU-bound tasks (like computation), use multiprocessing.
Threads (I/O-bound)
```python
import threading

def download_file(url):
    print(f"Downloading {url}")

urls = ["a", "b", "c"]
threads = [threading.Thread(target=download_file, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
Processes (CPU-bound)
```python
from multiprocessing import Pool

def compute_square(n):
    return n * n

if __name__ == "__main__":  # required on platforms that spawn new processes
    with Pool(4) as pool:
        results = pool.map(compute_square, range(10))
    print(results)
```
Choose threads for waiting, processes for working.
🧠 8. Memory Optimization
Memory usage often limits scalability more than CPU.
Use Generators Instead of Lists
```python
# ❌ Loads everything into memory at once
squares = [x**2 for x in range(10_000_000)]

# ✅ Lazy evaluation — items are produced one at a time
squares = (x**2 for x in range(10_000_000))
```
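You can see the footprint difference directly with `sys.getsizeof` (exact sizes vary by platform; note it measures the container object itself, not the integers it references):

```python
import sys

list_squares = [x**2 for x in range(100_000)]
gen_squares = (x**2 for x in range(100_000))

print(sys.getsizeof(list_squares))  # hundreds of kilobytes
print(sys.getsizeof(gen_squares))   # a few hundred bytes — just the generator object
```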
Delete Unused Objects
Use del obj and garbage collection when large objects are no longer needed.
```python
import gc

large_dataset = [0] * 10_000_000  # placeholder for a real large object

# ... work with large_dataset ...

del large_dataset   # drop the reference
gc.collect()        # reclaim memory held in reference cycles
```
Measure Memory Usage
```python
import tracemalloc

tracemalloc.start()
# ... run memory-heavy code here ...
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1e6:.2f} MB; Peak: {peak / 1e6:.2f} MB")
tracemalloc.stop()
```
🧪 9. Example: Real Optimization Workflow
Let’s optimize a real-world snippet step-by-step:
```python
# ❌ Original version: O(n) loop
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

# ✅ Optimized version: O(1) closed form for 0 + 1 + ... + (n - 1)
def fast_sum(n):
    return n * (n - 1) // 2
```
Performance Comparison
```python
import timeit

print("Slow:", timeit.timeit("slow_sum(10_000_000)", globals=globals(), number=1))
print("Fast:", timeit.timeit("fast_sum(10_000_000)", globals=globals(), number=1))
```
Optimization is about finding smarter ways, not just “faster computers.”
⚠️ 10. Avoid Over‑Optimization
Optimization is powerful but dangerous when done prematurely.
Here’s how to stay safe:
| Don’t… | Instead… |
|---|---|
| Rewrite everything in C | Profile first — only optimize hot paths |
| Obsess over microseconds | Focus on algorithmic efficiency |
| Sacrifice readability for speed | Use clear, maintainable solutions |
| Guess performance bottlenecks | Measure with profilers |
| Forget to test correctness | Always verify outputs after changes |
✅ 11. Performance Optimization Checklist
- Profile your code before optimizing (`cProfile`, `timeit`)
- Focus on algorithmic complexity first
- Use efficient data structures (`set`, `dict`, `numpy`)
- Cache repeated computations (`lru_cache`, memoization)
- Vectorize numeric workloads (NumPy, Pandas)
- Parallelize CPU-bound tasks (multiprocessing)
- Stream large data with generators
- Free unused memory and monitor usage
- Test and benchmark after every change
- Document your optimizations and trade-offs
🚀 Conclusion
Performance optimization is a balance between speed, clarity, and maintainability.
The best developers optimize intelligently — guided by measurement, not intuition.
“The fastest code is the code you don’t run.”
By understanding algorithms, data structures, caching, and profiling, you can craft Python applications that run efficiently — not only fast, but elegantly fast.