Chapter 4: File Handling
Sub-chapter: Working with Binary Files — Handling Non-Text Data
Binary files store data in raw byte form — unlike text files, which use human-readable characters.
They are essential for working with images, audio, videos, executables, and serialized objects.
Python handles binary data through the file modes 'rb' (read) and 'wb' (write), which return and accept bytes objects without any decoding or newline translation.
⚙️ Text Files vs Binary Files
| Type | Mode | Data Type | Example Use |
|---|---|---|---|
| Text | 'r', 'w', 'a' | str (characters) | Logs, configs, JSON, CSV |
| Binary | 'rb', 'wb', 'ab' | bytes (raw data) | Images, executables, media |
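Binary data is not encoded in human-readable form. Opening it in text mode makes Python try to decode the bytes, which either raises UnicodeDecodeError or produces unreadable symbols, depending on the encoding.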
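To make the difference concrete, here is a minimal sketch that writes a few bytes to a scratch file and reads them back in both modes. The file name sample.bin and the byte values are arbitrary choices for illustration, and the text-mode read assumes UTF-8 is requested explicitly.

```python
import os
import tempfile

# Bytes that are not valid UTF-8 (0xFF can never appear in UTF-8 text).
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(payload)

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode tries to decode the bytes and fails on invalid sequences.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(raw == payload)   # the binary read round-trips exactly
print(decode_failed)    # the text-mode read could not decode the data
```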
📥 Reading Binary Files

    # Read binary data from a file
    with open("binary_data.dat", "rb") as file:
        binary_data = file.read()

    print(type(binary_data))  # <class 'bytes'>
    print(binary_data[:20])   # Preview first 20 bytes

Reading Large Files in Chunks
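When working with large files, reading in fixed-size chunks keeps memory use bounded instead of loading the whole file at once:

    # process_chunk is a placeholder for your own handler; the := operator needs Python 3.8+
    with open("video.mp4", "rb") as file:
        while chunk := file.read(1024 * 1024):  # 1 MiB at a time
            process_chunk(chunk)
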
⚡ Chunked reading helps avoid memory overload for large media or binary data.
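As a runnable illustration of the chunked loop, the sketch below swaps the placeholder handler for a simple byte counter. The function name count_bytes, the 64 KiB chunk size, and the scratch file are assumptions made for this example only.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KiB; an arbitrary size chosen for this sketch


def count_bytes(path, chunk_size=CHUNK_SIZE):
    """Return (total_bytes, number_of_chunks) by streaming the file."""
    total = 0
    chunks = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # read() returns b"" at end of file
            total += len(chunk)
            chunks += 1
    return total, chunks


# Build a 150,000-byte scratch file so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"\x00" * 150_000)

total, chunks = count_bytes(demo_path)
print(total, chunks)  # 150000 bytes, read as two full chunks plus a partial one
```

Only one chunk is held in memory at a time, so the same loop works unchanged for files far larger than available RAM.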
💾 Writing Binary Files

    # Binary data representing 'Hello'
    binary_data = b"\x48\x65\x6C\x6C\x6F"  # "Hello" in ASCII
    with open("output.dat", "wb") as file:
        file.write(binary_data)

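The 'ab' mode listed in the table earlier appends bytes instead of overwriting. The following sketch, which uses a hypothetical scratch file named greeting.dat, writes five bytes, appends one more, and reads the result back.

```python
import os
import tempfile

# Hypothetical scratch file for this demonstration.
out_path = os.path.join(tempfile.mkdtemp(), "greeting.dat")

# 'wb' creates (or truncates) the file; write() returns the byte count.
with open(out_path, "wb") as f:
    written = f.write(b"Hello")

# 'ab' keeps the existing content and adds to the end.
with open(out_path, "ab") as f:
    f.write(bytes([0x21]))  # 0x21 is "!" in ASCII

with open(out_path, "rb") as f:
    content = f.read()

print(written)   # 5
print(content)   # b'Hello!'
```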
Copying Binary Files (e.g., Images)

    # Copy an image file by reading it fully into memory (fine for small files)
    with open("source.png", "rb") as src, open("copy.png", "wb") as dst:
        dst.write(src.read())

    print("✅ Image copied successfully!")

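For larger files, reading the whole source with read() can use a lot of memory. One possible alternative, sketched here, streams the data through shutil.copyfileobj from the standard library. The helper name copy_streaming, the 1 MiB buffer, and the scratch files are assumptions for illustration.

```python
import os
import shutil
import tempfile


def copy_streaming(source, destination, buffer_size=1024 * 1024):
    """Copy a file in fixed-size pieces instead of loading it all at once."""
    with open(source, "rb") as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst, buffer_size)


# Create a scratch source file so the example runs anywhere.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.bin")
dst_path = os.path.join(workdir, "copy.bin")
with open(src_path, "wb") as f:
    f.write(os.urandom(300_000))

copy_streaming(src_path, dst_path)

# Compare the two files directly (acceptable here because they are small).
with open(src_path, "rb") as a, open(dst_path, "rb") as b:
    identical = a.read() == b.read()

print(identical)
```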
🧱 Working with Structured Binary Data (struct Module)
The struct module packs numbers and byte strings into bytes and unpacks them again, which is useful when interacting with binary protocols or hardware.

    import struct

    # Pack two integers and a float into bytes
    data = struct.pack("iif", 42, 10, 3.14)
    print(data)

    # Unpack bytes back into numbers
    numbers = struct.unpack("iif", data)
    print(numbers)  # the float comes back as roughly 3.14 (single precision)

Format string: "iif" = 2 integers + 1 float, using the platform's native byte order, size, and alignment by default.
Useful for working with C-style binary formats and network protocols.
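When bytes must be exchanged between machines, it usually helps to state the byte order explicitly with a prefix character. The sketch below compares the little-endian ("<") and big-endian (">") prefixes for the same three values; the variable names are illustrative only.

```python
import struct

LITTLE = "<iif"  # little-endian, standard sizes, no padding
BIG = ">iif"     # big-endian (network order), standard sizes, no padding

packed_le = struct.pack(LITTLE, 42, 10, 3.14)
packed_be = struct.pack(BIG, 42, 10, 3.14)

# With an explicit prefix the size is fixed: 4 + 4 + 4 bytes.
print(struct.calcsize(LITTLE))  # 12

# The integer 42 (0x2a) is laid out differently in each byte order.
print(packed_le[:4].hex())  # 2a000000
print(packed_be[:4].hex())  # 0000002a

# Unpacking with the matching format recovers the values;
# the float is single precision, so round it for display.
a, b, c = struct.unpack(LITTLE, packed_le)
print(a, b, round(c, 2))  # 42 10 3.14
```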
🧠 Serializing Python Objects with pickle
The pickle module converts Python objects into a binary byte stream so they can be saved and reloaded later.
Saving (Pickling)

    import pickle

    data = {"name": "Alice", "age": 30, "skills": ["Python", "AI"]}
    with open("data.pkl", "wb") as file:
        pickle.dump(data, file)

    print("✅ Data serialized successfully!")

Loading (Unpickling)

    import pickle

    with open("data.pkl", "rb") as file:
        loaded_data = pickle.load(file)

    print(loaded_data)

⚠️ Security Warning: Only unpickle files from trusted sources.
Pickle can execute arbitrary code during loading.
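When the data consists only of basic types (dicts, lists, strings, numbers), JSON is a safer option for exchange because loading it does not execute code. A minimal sketch of the same round trip follows, using a hypothetical scratch file named data.json.

```python
import json
import os
import tempfile

data = {"name": "Alice", "age": 30, "skills": ["Python", "AI"]}

# JSON is a text format, so the file is opened in text mode with an explicit encoding.
json_path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(data, f)

with open(json_path, "r", encoding="utf-8") as f:
    restored = json.load(f)

print(restored == data)  # True for this dictionary of basic types
```

The trade-off is that JSON cannot represent arbitrary Python objects, so custom classes need an explicit conversion step.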
🧩 Real-World Example — Image Copy and Verification Utility
Let’s create a mini-project that copies an image file in binary mode and verifies the copy by comparing SHA-256 checksums of the source and destination.

    import hashlib

    def copy_and_verify(source, destination):
        # Copy file in chunks
        with open(source, "rb") as src, open(destination, "wb") as dst:
            while chunk := src.read(8192):
                dst.write(chunk)

        # Compute file checksums
        def file_hash(path):
            hash_func = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(4096), b""):
                    hash_func.update(chunk)
            return hash_func.hexdigest()

        src_hash = file_hash(source)
        dst_hash = file_hash(destination)
        if src_hash == dst_hash:
            print("✅ File copied successfully and verified!")
        else:
            print("❌ File verification failed!")

    # Example usage
    copy_and_verify("photo.jpg", "photo_backup.jpg")

🧩 This approach detects corruption introduced during the copy. The same pattern is common in file transfer systems, backups, and checksum verification tools.
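To see a checksum mismatch in practice, the sketch below copies a scratch file, confirms that the hashes match, then overwrites a single byte in the copy and compares again. The helper name sha256_of, the file names, and the corrupted offset are assumptions chosen for this demonstration.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=4096):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.bin")
backup = os.path.join(workdir, "backup.bin")

# 10,240 bytes of predictable content.
with open(original, "wb") as f:
    f.write(bytes(range(256)) * 40)

shutil.copyfile(original, backup)
matches_before = sha256_of(original) == sha256_of(backup)

# Simulate corruption: overwrite one byte in the middle of the copy.
with open(backup, "r+b") as f:
    f.seek(5000)
    f.write(b"\xFF")

matches_after = sha256_of(original) == sha256_of(backup)

print(matches_before)  # True: the fresh copy is identical
print(matches_after)   # False: one changed byte alters the digest
```

Note that the file size is unchanged after the corruption, which is why a size comparison alone would not catch this case.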
🧠 Handling Binary Exceptions
Binary files may be missing, unreadable, or truncated, so wrap file I/O in exception handling and catch the most specific error first.

    try:
        with open("file.bin", "rb") as file:
            data = file.read()
    except FileNotFoundError:
        print("Error: File not found!")
    except OSError as e:  # IOError is an alias of OSError in Python 3
        print("I/O error:", e)

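File-level errors are only part of the picture: a file can open without problems and still be too short. The sketch below shows one way to detect a truncated fixed-size record before unpacking it. The record layout "<iif", the helper name read_record, and the in-memory streams are hypothetical and stand in for a real file format.

```python
import io
import struct

RECORD_FORMAT = "<iif"  # hypothetical record layout for this sketch
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def read_record(stream):
    """Read one fixed-size record, raising ValueError if the data is cut short."""
    raw = stream.read(RECORD_SIZE)
    if len(raw) < RECORD_SIZE:
        raise ValueError(
            f"truncated record: expected {RECORD_SIZE} bytes, got {len(raw)}"
        )
    return struct.unpack(RECORD_FORMAT, raw)


# io.BytesIO stands in for files opened with 'rb'.
good = io.BytesIO(struct.pack(RECORD_FORMAT, 1, 2, 0.5))
bad = io.BytesIO(b"\x01\x02\x03")

record = read_record(good)

try:
    read_record(bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(record)         # (1, 2, 0.5)
print(error_message)  # truncated record: expected 12 bytes, got 3
```

Checking the length first gives a clearer message than letting struct.unpack fail on its own.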
🧾 Best Practices
✅ Always open binary files with 'rb' or 'wb'.
✅ Use chunked reads for large files.
✅ Validate integrity with checksums or file size.
✅ Never unpickle untrusted files — use json for safe data exchange.
✅ Use struct for low-level byte packing and unpacking.
✅ Close files automatically by using the with open() context manager.
📚 Summary
- Binary files store raw bytes, not text.
- Use 'rb' and 'wb' for reading/writing binary safely.
- The struct and pickle modules let you handle structured and object-based binary data.
- Chunked I/O and checksums ensure reliability with large or sensitive files.
- Binary file handling is critical for real-world applications like image processing, network communication, machine learning model storage, and backup systems.
By mastering binary file operations, you gain control over data at its lowest level — enabling you to work directly with the byte representation of any digital asset.