File Handling

Published: November 13, 2025 • Language: python • Chapter: 4 • Sub: 3 • Level: beginner

Chapter 4: File Handling

Sub-chapter: Working with Binary Files — Handling Non-Text Data

Binary files store data in raw byte form — unlike text files, which use human-readable characters.
They are essential for working with images, audio, videos, executables, and serialized objects.
Python reads and writes binary data through the file modes 'rb', 'wb', and 'ab', which return and accept bytes objects instead of strings.


⚙️ Text Files vs Binary Files

Type   | Mode             | Data Type        | Example Use
Text   | 'r', 'w', 'a'    | str (characters) | Logs, configs, JSON, CSV
Binary | 'rb', 'wb', 'ab' | bytes (raw data) | Images, executables, media

Binary data is not encoded as human-readable text. Opening it in text mode either raises a UnicodeDecodeError or produces unreadable symbols, depending on the bytes and the encoding used.
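To make the difference concrete, here is a minimal sketch that writes a few non-text bytes to a temporary file and then opens it both ways. The file name and the sample bytes are made up for illustration; the bytes are the start of the PNG signature, chosen only because 0x89 is not valid UTF-8.

```python
import os
import tempfile

# Hypothetical sample: the first four bytes of a PNG signature.
raw = b"\x89PNG"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode returns the bytes unchanged.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode tries to decode the bytes as UTF-8 and fails on 0x89.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

print(as_bytes)       # b'\x89PNG'
print(decode_failed)  # True
```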


📥 Reading Binary Files

# Read binary data from a file
with open("binary_data.dat", "rb") as file:
    binary_data = file.read()

print(type(binary_data))  # <class 'bytes'>
print(binary_data[:20])   # Preview first 20 bytes

Reading Large Files in Chunks

When working with large files, it’s efficient to read data in chunks:

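def process_chunk(chunk):  # placeholder for real per-chunk work
    print(f"Read {len(chunk)} bytes")

with open("video.mp4", "rb") as file:
    while chunk := file.read(1024 * 1024):  # 1 MB at a time; := needs Python 3.8+
        process_chunk(chunk)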

⚡ Chunked reading helps avoid memory overload for large media or binary data.
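One way to reuse the chunked pattern is to wrap it in a generator. The sketch below assumes a helper named iter_chunks (a hypothetical name, not part of the standard library) and uses a tiny chunk size on a temporary file so the behavior is easy to see.

```python
import os
import tempfile

def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunks of at most chunk_size bytes from a file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # b"" signals end of file
                break
            yield chunk

# Demo with a small temporary file and a 4-byte chunk size.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(10)))  # 10 bytes: 0x00 .. 0x09

    chunk_sizes = [len(chunk) for chunk in iter_chunks(path, chunk_size=4)]

print(chunk_sizes)  # [4, 4, 2]
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than available RAM.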


💾 Writing Binary Files

# Binary data representing 'Hello'
binary_data = b"\x48\x65\x6C\x6C\x6F"  # "Hello" in ASCII

with open("output.dat", "wb") as file:
    file.write(binary_data)
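The 'ab' mode from the table above appends bytes instead of overwriting. A short sketch, using a temporary file with a made-up name, shows how 'wb' and 'ab' differ:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.bin")

    with open(path, "wb") as f:  # creates the file, or truncates an existing one
        f.write(b"\x01\x02")

    with open(path, "ab") as f:  # keeps existing bytes and writes at the end
        f.write(b"\x03")

    with open(path, "rb") as f:
        contents = f.read()

print(contents)  # b'\x01\x02\x03'
```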

Copying Binary Files (e.g., Images)

# Copy an image by reading it fully into memory, then writing it out (fine for small files)
with open("source.png", "rb") as src, open("copy.png", "wb") as dst:
    dst.write(src.read())
print("✅ Image copied successfully!")
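For files too large to hold in memory, the standard library's shutil.copyfileobj copies between two open file objects in fixed-size chunks. The sketch below is one possible approach; the helper name copy_in_chunks, the file names, and the random payload standing in for image data are all illustrative.

```python
import os
import shutil
import tempfile

def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy a file without loading it fully into memory."""
    with open(source, "rb") as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    destination = os.path.join(tmp, "copy.bin")
    payload = os.urandom(200_000)  # stand-in for image data
    with open(source, "wb") as f:
        f.write(payload)

    copy_in_chunks(source, destination)

    with open(destination, "rb") as f:
        copied = f.read()

print(copied == payload)  # True
```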

🧱 Working with Structured Binary Data (struct Module)

The struct module packs numbers and byte strings into bytes and unpacks them again according to a format string. This is useful when interacting with binary protocols or hardware.

import struct

# Pack integers into bytes
data = struct.pack("iif", 42, 10, 3.14)
print(data)

# Unpack bytes back into numbers
numbers = struct.unpack("iif", data)
print(numbers)

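The format string "iif" means two integers (typically 4 bytes each) followed by one 4-byte float. With no prefix character, struct uses the platform's native byte order and alignment. Because "f" is a 32-bit float, 3.14 unpacks as approximately 3.140000104904175.
This is useful for working with C-style binary formats and network protocols.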
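When bytes must be exchanged between machines, an explicit byte-order prefix makes the layout predictable. The sketch below uses "<" (little-endian, standard sizes, no padding) with the same three values; the constant name RECORD_FORMAT is just an illustrative choice.

```python
import struct

# "<" = little-endian, standard sizes, no padding; i = 4-byte int, f = 4-byte float
RECORD_FORMAT = "<iif"

record_size = struct.calcsize(RECORD_FORMAT)
packed = struct.pack(RECORD_FORMAT, 42, 10, 3.14)
count, code, value = struct.unpack(RECORD_FORMAT, packed)

print(record_size)      # 12
print(packed[:4])       # b'*\x00\x00\x00' (42 as a little-endian 4-byte int)
print(count, code)      # 42 10
print(round(value, 2))  # 3.14 (the raw value carries 32-bit float rounding)
```

struct.calcsize is handy for knowing how many bytes to read from a file before calling struct.unpack.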


🧠 Serializing Python Objects with pickle

The pickle module converts Python objects into a binary byte stream so they can be saved and reloaded later.

Saving (Pickling)

import pickle

data = {"name": "Alice", "age": 30, "skills": ["Python", "AI"]}

with open("data.pkl", "wb") as file:
    pickle.dump(data, file)

print("✅ Data serialized successfully!")

Loading (Unpickling)

import pickle

with open("data.pkl", "rb") as file:
    loaded_data = pickle.load(file)

print(loaded_data)

⚠️ Security Warning: Only unpickle files from trusted sources.
Pickle can execute arbitrary code during loading.
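For plain data such as dictionaries, lists, strings, and numbers, json is a safer option because loading it never executes code. A minimal sketch with the same dictionary, written to a temporary file with a made-up name:

```python
import json
import os
import tempfile

data = {"name": "Alice", "age": 30, "skills": ["Python", "AI"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")

    # JSON is text, so the file is opened in text mode with an explicit encoding.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == data)  # True
```

The trade-off is that json only handles basic types, while pickle can store most Python objects.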


🧩 Real-World Example — Image Copy and Verification Utility

Let’s create a mini-project that copies an image file in binary mode and verifies the copy by comparing the SHA-256 checksums of the source and destination.

import hashlib

def copy_and_verify(source, destination):
    # Copy file in chunks
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while chunk := src.read(8192):
            dst.write(chunk)

    # Compute file checksums
    def file_hash(path):
        hash_func = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_func.update(chunk)
        return hash_func.hexdigest()

    src_hash = file_hash(source)
    dst_hash = file_hash(destination)

    if src_hash == dst_hash:
        print("✅ File copied successfully and verified!")
    else:
        print("❌ File verification failed!")

# Example usage
copy_and_verify("photo.jpg", "photo_backup.jpg")

🧩 Comparing checksums detects a corrupted or incomplete copy. The same technique is common in file transfer systems, backups, and checksum verification tools.
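A file-size comparison can serve as a cheap first check before hashing, since files of different sizes cannot be identical. The sketch below is one possible variant that returns a boolean instead of printing; the helper names sha256_of and files_match and the sample files are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=8192):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def files_match(path_a, path_b):
    """Return True if both files have the same size and SHA-256 digest."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes can never be identical
    return sha256_of(path_a) == sha256_of(path_b)

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "a.bin")
    same = os.path.join(tmp, "b.bin")
    truncated = os.path.join(tmp, "c.bin")
    for path, payload in [(original, b"abcdef"), (same, b"abcdef"), (truncated, b"abc")]:
        with open(path, "wb") as f:
            f.write(payload)

    match_same = files_match(original, same)
    match_truncated = files_match(original, truncated)

print(match_same, match_truncated)  # True False
```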


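🧠 Handling Exceptions with Binary Files

Opening or reading a binary file can fail: the file may be missing, unreadable, or cut short. Catch the specific exceptions you expect instead of letting them pass silently.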

try:
    with open("file.bin", "rb") as file:
        data = file.read()
except FileNotFoundError:
    print("Error: File not found!")
except OSError as e:  # IOError is an alias of OSError in Python 3
    print("I/O error:", e)
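Incomplete data usually shows up later, when the bytes are parsed rather than when the file is opened. As a sketch, assuming a hypothetical 12-byte record layout, struct.unpack raises struct.error if the buffer has the wrong length, and that can be caught explicitly:

```python
import struct

RECORD_FORMAT = "<iif"  # hypothetical 12-byte record layout

def parse_record(raw):
    """Return the unpacked record, or None if the bytes are incomplete."""
    try:
        return struct.unpack(RECORD_FORMAT, raw)
    except struct.error:
        return None

complete = struct.pack(RECORD_FORMAT, 1, 2, 0.5)
truncated = complete[:-3]  # simulate a file that was cut short

print(parse_record(complete))   # (1, 2, 0.5)
print(parse_record(truncated))  # None
```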

🧾 Best Practices

✅ Always open binary files in a binary mode: 'rb', 'wb', or 'ab'.
✅ Use chunked reads for large files.
✅ Validate integrity with checksums or file size.
✅ Never unpickle untrusted files — use json for safe data exchange.
✅ Use struct for low-level byte packing and unpacking.
✅ Let the with open(...) context manager close files automatically.


📚 Summary

  • Binary files store raw bytes, not text.
  • Use 'rb', 'wb', and 'ab' to read, write, and append bytes.
  • The struct and pickle modules let you handle structured and object-based binary data.
  • Chunked I/O keeps memory use bounded for large files, and checksums detect corrupted or incomplete copies.
  • Binary file handling is critical for real-world applications like image processing, network communication, machine learning model storage, and backup systems.

By mastering binary file operations, you gain control over data at its lowest level — enabling you to work directly with the byte representation of any digital asset.