Chapter 4: File Handling

Sub-chapter: Working with Binary Files — Handling Non-Text Data

Binary files store data in raw byte form — unlike text files, which use human-readable characters.
They are essential for working with images, audio, videos, executables, and serialized objects.
Python handles binary data through the file modes 'rb' (read), 'wb' (write), and 'ab' (append), which return and accept bytes objects instead of str.


⚙️ Text Files vs Binary Files

| Type | Mode | Data Type | Example Use |
| --- | --- | --- | --- |
| Text | 'r', 'w', 'a' | str (characters) | Logs, configs, JSON, CSV |
| Binary | 'rb', 'wb', 'ab' | bytes (raw data) | Images, executables, media |

Binary data is not encoded in human-readable form. Opening it in text mode either raises a UnicodeDecodeError or produces unreadable symbols, because Python tries to decode the bytes as characters.
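As a small illustration of the difference, the sketch below writes a few non-text bytes to a temporary file and reads them back in both modes. The file name `sample.bin` and the byte values are made up for the demonstration, and the explicit `encoding="utf-8"` is an assumption so the result does not depend on the platform default.

```python
import os
import tempfile

# Hypothetical sample: four bytes that are not valid UTF-8 text
payload = b"\xff\xfe\x00\x01"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")

    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode returns the bytes exactly as stored
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode tries to decode the bytes and fails on invalid sequences
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

print(raw == payload)      # True
print(text_mode_failed)    # True
```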


📥 Reading Binary Files

# Read binary data from a file
with open("binary_data.dat", "rb") as file:
    binary_data = file.read()

print(type(binary_data))  # <class 'bytes'>
print(binary_data[:20])   # Preview first 20 bytes

Reading Large Files in Chunks

When working with large files, it’s efficient to read data in chunks:

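def process_chunk(chunk):  # placeholder: replace with your own processing
    print(f"Read {len(chunk)} bytes")

with open("video.mp4", "rb") as file:
    while chunk := file.read(1024 * 1024):  # 1 MB at a time (the := operator needs Python 3.8+)
        process_chunk(chunk)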

⚡ Chunked reading helps avoid memory overload for large media or binary data.
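One way to reuse the chunked pattern is to wrap it in a generator. The sketch below is only an illustration: `read_in_chunks` is a hypothetical helper name, and a small temporary file stands in for a real video so the example runs anywhere.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunks of at most chunk_size bytes from a file."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

# Demonstration on a small temporary file instead of a real video
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 2500)

    # Only one chunk is held in memory at a time
    sizes = [len(chunk) for chunk in read_in_chunks(path, chunk_size=1000)]

print(sizes)  # [1000, 1000, 500]
```

The last chunk is shorter than the others because `read()` returns whatever remains, and the loop stops when it returns an empty bytes object.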


💾 Writing Binary Files

# Binary data representing 'Hello'
binary_data = b"\x48\x65\x6C\x6C\x6F"  # "Hello" in ASCII

with open("output.dat", "wb") as file:
    file.write(binary_data)
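The hex escapes above are one of several ways to build a bytes object. The following sketch shows three equivalent spellings and checks how many bytes end up on disk; it writes to a temporary directory (an assumption made here so the example leaves no files behind).

```python
import os
import tempfile

# Three equivalent ways to build the same five bytes
from_escapes = b"\x48\x65\x6C\x6C\x6F"
from_ints = bytes([72, 101, 108, 108, 111])
from_text = "Hello".encode("ascii")

print(from_escapes == from_ints == from_text)  # True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.dat")
    with open(path, "wb") as f:
        written = f.write(from_text)  # write() returns the number of bytes written
    size_on_disk = os.path.getsize(path)

print(written, size_on_disk)  # 5 5
```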

Copying Binary Files (e.g., Images)

# Copy an image by reading the whole file into memory (fine for small files)
with open("source.png", "rb") as src, open("copy.png", "wb") as dst:
    dst.write(src.read())
print("✅ Image copied successfully!")
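For larger files, reading everything at once can use a lot of memory. A possible alternative, sketched below, streams the data with the standard library's `shutil.copyfileobj`. The helper name `copy_binary`, the 64 KB buffer size, and the temporary test file are illustrative assumptions.

```python
import os
import shutil
import tempfile

def copy_binary(source, destination, chunk_size=64 * 1024):
    """Copy a file in fixed-size chunks without loading it fully into memory."""
    with open(source, "rb") as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)

# Demonstration with a temporary file standing in for an image
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    destination = os.path.join(tmp, "copy.bin")

    with open(source, "wb") as f:
        f.write(bytes(range(256)) * 1000)  # 256,000 bytes of sample data

    copy_binary(source, destination)

    with open(source, "rb") as a, open(destination, "rb") as b:
        identical = a.read() == b.read()

print(identical)  # True
```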

🧱 Working with Structured Binary Data (struct Module)

The struct module converts numbers and strings to and from packed bytes, which is useful when interacting with binary protocols or hardware.

import struct

# Pack two integers and a float into bytes
data = struct.pack("iif", 42, 10, 3.14)
print(data)

# Unpack bytes back into numbers
numbers = struct.unpack("iif", data)
print(numbers)

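Format string "iif" = 2 integers + 1 single-precision float, packed with the platform's native byte order and alignment by default.
Because the float is stored in 32 bits, 3.14 comes back as roughly 3.140000104904175 rather than exactly 3.14.
Useful for working with C-style binary formats and network protocols.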
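When the bytes have to be read on another machine, it usually helps to state the byte order explicitly. The sketch below uses the `<` (little-endian) and `>` (big-endian) prefixes, which also select standard sizes with no padding. The choice of little-endian here is an assumption for the example, not a requirement of any particular format.

```python
import struct

fmt = "<iif"  # little-endian: two 4-byte ints and one 4-byte float, no padding

packed = struct.pack(fmt, 42, 10, 3.14)
print(struct.calcsize(fmt))  # 12
print(packed.hex())

a, b, c = struct.unpack(fmt, packed)
print(a, b)         # 42 10
print(round(c, 2))  # 3.14 (the stored value is only approximately 3.14)

# The same integer packed with each byte order
little = struct.pack("<i", 42)
big = struct.pack(">i", 42)
print(little)  # b'*\x00\x00\x00'
print(big)     # b'\x00\x00\x00*'
```

The `*` in the output is simply how Python displays the byte 0x2A (decimal 42) when it corresponds to a printable ASCII character.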


🧠 Serializing Python Objects with pickle

The pickle module converts Python objects into a binary byte stream so they can be saved and reloaded later.

Saving (Pickling)

import pickle

data = {"name": "Alice", "age": 30, "skills": ["Python", "AI"]}

with open("data.pkl", "wb") as file:
    pickle.dump(data, file)

print("✅ Data serialized successfully!")

Loading (Unpickling)

import pickle

with open("data.pkl", "rb") as file:
    loaded_data = pickle.load(file)

print(loaded_data)

⚠️ Security Warning: Only unpickle files from trusted sources.
Pickle can execute arbitrary code during loading.
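When the data consists only of basic types (dicts, lists, strings, numbers, booleans, None), a safer option for untrusted input is the json module, which parses data without executing code. The sketch below round-trips the same dictionary through UTF-8 encoded JSON bytes; it is an illustration of the idea, not a drop-in replacement for pickle, since JSON cannot represent arbitrary Python objects.

```python
import json

data = {"name": "Alice", "age": 30, "skills": ["Python", "AI"]}

# Serialize to bytes, as you would before writing with mode "wb"
encoded = json.dumps(data).encode("utf-8")
print(type(encoded))  # <class 'bytes'>

# Parse the bytes back into Python objects
restored = json.loads(encoded.decode("utf-8"))
print(restored == data)  # True
```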


🧩 Real-World Example — Image Copy and Verification Utility

Let’s create a mini-project that copies an image file in binary mode and verifies the copy by comparing SHA-256 checksums of the source and destination.

import hashlib

def copy_and_verify(source, destination):
    # Copy file in chunks
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while chunk := src.read(8192):
            dst.write(chunk)

    # Compute file checksums
    def file_hash(path):
        hash_func = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_func.update(chunk)
        return hash_func.hexdigest()

    src_hash = file_hash(source)
    dst_hash = file_hash(destination)

    if src_hash == dst_hash:
        print("✅ File copied successfully and verified!")
    else:
        print("❌ File verification failed!")

# Example usage
copy_and_verify("photo.jpg", "photo_backup.jpg")

🧩 This approach ensures data integrity — commonly used in file transfer systems, backups, and checksum verification tools.
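A possible variation returns a boolean instead of printing, and compares file sizes first as a cheap check before hashing. The helper names `sha256_of` and `files_match`, and the temporary sample files, are hypothetical and used only for this sketch.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=8192):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def files_match(path_a, path_b):
    """Return True if both files have the same size and SHA-256 digest."""
    # Cheap check first: files of different sizes can never be identical
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return sha256_of(path_a) == sha256_of(path_b)

# Demonstration with small temporary files
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    good_copy = os.path.join(tmp, "good_copy.bin")
    bad_copy = os.path.join(tmp, "bad_copy.bin")

    for path, content in [
        (original, b"\x01\x02\x03\x04"),
        (good_copy, b"\x01\x02\x03\x04"),
        (bad_copy, b"\x01\x02\x03\xff"),  # same size, one byte differs
    ]:
        with open(path, "wb") as f:
            f.write(content)

    good_result = files_match(original, good_copy)
    bad_result = files_match(original, bad_copy)

print(good_result)  # True
print(bad_result)   # False
```

Returning a value rather than printing makes the check easier to reuse in scripts and tests.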


🧠 Handling Binary Exceptions

Binary files may be missing, unreadable, corrupted, or incomplete. Catch the specific exceptions you expect instead of letting the program crash.

try:
    with open("file.bin", "rb") as file:
        data = file.read()
except FileNotFoundError:
    print("Error: File not found!")
except OSError as e:  # IOError is an alias of OSError in Python 3
    print("I/O error:", e)
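File-system errors are only part of the picture: the file may open fine but contain truncated data. As a minimal sketch, the example below catches `struct.error`, which `struct.unpack` raises when the buffer does not match the expected size. The record layout `"<iif"` and the helper name `parse_record` are assumptions made for illustration.

```python
import struct

RECORD_FORMAT = "<iif"  # hypothetical record layout: two ints and a float
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def parse_record(raw):
    """Return the unpacked tuple, or None if the data has the wrong length."""
    try:
        return struct.unpack(RECORD_FORMAT, raw)
    except struct.error:
        return None

good = struct.pack(RECORD_FORMAT, 1, 2, 0.5)
truncated = good[:-3]  # simulate an incomplete record

print(RECORD_SIZE)              # 12
print(parse_record(good))       # (1, 2, 0.5)
print(parse_record(truncated))  # None
```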

🧾 Best Practices

✅ Always open binary files with a binary mode: 'rb', 'wb', or 'ab'.
✅ Use chunked reads for large files.
✅ Validate integrity with checksums or file size.
✅ Never unpickle untrusted files — use json for safe data exchange.
✅ Use struct for low-level byte packing and unpacking.
✅ Close files automatically by using the with open(...) context manager.


📚 Summary


By mastering binary file operations, you gain control over data at its lowest level — enabling you to work directly with the byte representation of any digital asset.