Chapter 4: File Handling
Sub-chapter: CSV and JSON Handling — Managing Structured Data
Working with structured data is one of the most common tasks in programming.
Two of the most widely used file formats for storing and exchanging structured data are CSV (Comma-Separated Values) and JSON (JavaScript Object Notation).
Python provides built-in modules — csv and json — that make it simple to read, write, and convert data between these formats.
📊 CSV — Tabular Data Format
A CSV file is a plain text file that stores data in a table-like structure, where each row represents a record and columns are separated by commas.
Example data.csv
name,age,city
Alice,30,New York
Bob,25,London
Charlie,35,Paris
Reading CSV Files
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file)
for row in reader:
print(row)
Using DictReader for Column Names
csv.DictReader automatically maps each row to a dictionary using column headers as keys.
with open("data.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
print(row["name"], row["city"])
Writing CSV Files
import csv
data = [
    ["name", "age", "city"],
    ["Alice", 30, "New York"],
    ["Bob", 25, "London"]
]

# newline="" prevents blank lines between rows on Windows
with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)
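If you need to add rows to an existing file without rewriting it, open it in append mode instead of write mode. A minimal sketch, assuming output.csv already exists with the header above:

import csv

# Append one more record to the existing output.csv
with open("output.csv", "a", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Charlie", 35, "Paris"])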
Using DictWriter for Better Control
import csv
people = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "London"}
]

with open("people.csv", "w", newline="") as file:
    fields = ["name", "age", "city"]
    writer = csv.DictWriter(file, fieldnames=fields)
    writer.writeheader()
    writer.writerows(people)
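DictWriter also helps when the input dictionaries carry extra keys that should not end up in the file. By default it raises a ValueError for unknown keys; passing extrasaction="ignore" drops them instead. A small sketch (the notes key is made up for illustration):

import csv

people = [
    {"name": "Alice", "age": 30, "city": "New York", "notes": "VIP"},  # extra key
    {"name": "Bob", "age": 25, "city": "London"}
]

with open("people.csv", "w", newline="") as file:
    fields = ["name", "age", "city"]
    # extrasaction="ignore" drops keys not listed in fieldnames instead of raising ValueError
    writer = csv.DictWriter(file, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(people)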
🧱 JSON — Hierarchical Data Format
JSON represents data as nested key-value pairs (like Python dictionaries).
It is commonly used in web APIs and configuration files.
Example data.json
{
    "name": "Alice",
    "age": 30,
    "city": "New York",
    "skills": ["Python", "AI"]
}
Reading JSON
import json
with open("data.json", "r") as file:
data = json.load(file)
print(data["name"], data["skills"])
Writing JSON
import json
data = {
    "name": "Bob",
    "age": 25,
    "city": "London"
}

with open("output.json", "w") as file:
    json.dump(data, file, indent=4)
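By default, json.dump escapes non-ASCII characters as \uXXXX sequences. If the file should stay human-readable, pass ensure_ascii=False along with an explicit encoding. A small sketch with made-up data:

import json

data = {"name": "José", "city": "São Paulo"}

# ensure_ascii=False keeps accented characters readable instead of \u escapes
with open("output_utf8.json", "w", encoding="utf-8") as file:
    json.dump(data, file, indent=4, ensure_ascii=False)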
⚙️ Converting Between CSV and JSON
CSV ➜ JSON
import csv, json
students = []

with open("students.csv", "r") as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        students.append(row)

with open("students.json", "w") as json_file:
    json.dump(students, json_file, indent=4)
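One caveat: csv.DictReader reads every field as a string, so numeric columns end up quoted in the resulting JSON. If the numbers matter downstream, convert them explicitly. A sketch, assuming students.csv has an age column:

import csv, json

students = []
with open("students.csv", "r") as csv_file:
    for row in csv.DictReader(csv_file):
        row["age"] = int(row["age"])   # hypothetical numeric column; adjust to your data
        students.append(row)

with open("students.json", "w") as json_file:
    json.dump(students, json_file, indent=4)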
JSON ➜ CSV
import csv, json
with open("students.json", "r") as json_file:
data = json.load(json_file)
with open("students_converted.csv", "w", newline="") as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
💾 Handling Encoding and Errors
When working with multilingual or large datasets, handle encoding explicitly.
with open("data.csv", "r", encoding="utf-8") as file:
reader = csv.reader(file)
for row in reader:
print(row)
Handle malformed data safely:
import json

try:
    with open("corrupted.json", "r") as file:
        data = json.load(file)
except json.JSONDecodeError as e:
    print("Error reading JSON:", e)
🧩 Real-World Example — Customer Sales Data Integration
Imagine you have a sales.csv file and a customers.json file.
You want to merge both to generate a report.json that combines customer details with their total purchase amount.
sales.csv
customer_id,amount
1,250
2,300
1,150
3,400
customers.json
[
    {"id": 1, "name": "Alice", "city": "New York"},
    {"id": 2, "name": "Bob", "city": "London"},
    {"id": 3, "name": "Charlie", "city": "Paris"}
]
Python Integration Script
import csv, json
from collections import defaultdict
# Step 1: Load sales data
sales_totals = defaultdict(int)
with open("sales.csv", "r") as sales_file:
    reader = csv.DictReader(sales_file)
    for row in reader:
        sales_totals[int(row["customer_id"])] += int(row["amount"])

# Step 2: Load customer data
with open("customers.json", "r") as customers_file:
    customers = json.load(customers_file)

# Step 3: Merge and enrich data
report = []
for customer in customers:
    total = sales_totals.get(customer["id"], 0)
    report.append({
        "id": customer["id"],
        "name": customer["name"],
        "city": customer["city"],
        "total_sales": total
    })

# Step 4: Save the merged report
with open("report.json", "w") as output_file:
    json.dump(report, output_file, indent=4)

print("✅ Report generated successfully!")
Result (report.json)
[
    {"id": 1, "name": "Alice", "city": "New York", "total_sales": 400},
    {"id": 2, "name": "Bob", "city": "London", "total_sales": 300},
    {"id": 3, "name": "Charlie", "city": "Paris", "total_sales": 400}
]
📈 Using pandas for Advanced CSV/JSON Operations
For larger datasets or data analysis, pandas provides powerful read/write capabilities.
import pandas as pd
# Read CSV and JSON
csv_data = pd.read_csv("data.csv")
json_data = pd.read_json("data.json")
# Convert and save
csv_data.to_json("converted.json", orient="records", indent=4)
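The reverse direction works the same way: to_csv writes a DataFrame back out as CSV. A brief sketch, assuming students.json holds a flat list of records like the one created earlier:

import pandas as pd

# Read a JSON array of flat records and write it back out as CSV
records = pd.read_json("students.json")
records.to_csv("students_from_pandas.csv", index=False)  # index=False drops the row-index column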
🧾 Best Practices
✅ Always use with open() for safe file operations.
✅ Use DictReader / DictWriter for structured CSVs.
✅ Use indent=4 for human-readable JSON.
✅ Handle encodings (utf-8) explicitly.
✅ Validate JSON with try/except JSONDecodeError.
✅ For large-scale datasets — use pandas for performance.
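The short sketch below pulls several of these practices together: explicit encoding, DictReader for structured access, and error handling around the whole conversion. File names are placeholders.

import csv, json

def csv_to_json(csv_path, json_path):
    """Convert a CSV file to a JSON file while applying the practices above."""
    try:
        # Explicit encoding + DictReader for structured rows
        with open(csv_path, "r", encoding="utf-8", newline="") as csv_file:
            rows = list(csv.DictReader(csv_file))

        # Human-readable JSON output
        with open(json_path, "w", encoding="utf-8") as json_file:
            json.dump(rows, json_file, indent=4)
    except FileNotFoundError:
        print(f"File not found: {csv_path}")
    except csv.Error as e:
        print("Error reading CSV:", e)

# Placeholder file names
csv_to_json("data.csv", "data_converted.json")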
🧠 Summary
- CSV — Ideal for tabular, spreadsheet-style data.
- JSON — Ideal for hierarchical, nested data.
- Python’s csv and json modules make it easy to interchange data.
- Combine both formats to build powerful data pipelines.
- Real-world uses: analytics, APIs, logs, configurations, and ETL processes.
By mastering CSV and JSON handling, you can integrate Python seamlessly with databases, web APIs, and modern data systems — bridging the gap between raw files and structured information.