Chapter 4: File Handling
Sub-chapter: CSV and JSON Handling — Managing Structured Data
Working with structured data is one of the most common tasks in programming.
Two of the most widely used file formats for storing and exchanging structured data are CSV (Comma-Separated Values) and JSON (JavaScript Object Notation).
Python provides built-in modules — csv and json — that make it simple to read, write, and convert data between these formats.
📊 CSV — Tabular Data Format
A CSV file is a plain text file that stores data in a table-like structure, where each row represents a record and columns are separated by commas.
Example data.csv
name,age,city
Alice,30,New York
Bob,25,London
Charlie,35,Paris
Reading CSV Files
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file)
for row in reader:
print(row)
Using DictReader for Column Names
csv.DictReader automatically maps each row to a dictionary using column headers as keys.
with open("data.csv", "r") as file:
reader = csv.DictReader(file)
for row in reader:
print(row["name"], row["city"])
Writing CSV Files
import csv
data = [
    ["name", "age", "city"],
    ["Alice", 30, "New York"],
    ["Bob", 25, "London"]
]

# newline="" prevents blank lines between rows on Windows
with open("output.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)
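If you need to add rows to an existing file without rewriting it, open it in append mode instead of write mode. A minimal sketch, assuming output.csv already exists with the header above:

import csv

# Append one more record to the existing output.csv
with open("output.csv", "a", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Charlie", 35, "Paris"])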
Using DictWriter for Better Control
import csv
people = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "London"}
]

with open("people.csv", "w", newline="") as file:
    fields = ["name", "age", "city"]
    writer = csv.DictWriter(file, fieldnames=fields)
    writer.writeheader()
    writer.writerows(people)
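DictWriter also helps when the input dictionaries carry extra keys that should not end up in the file. By default it raises a ValueError for unknown keys; passing extrasaction="ignore" drops them instead. A small sketch (the notes key is made up for illustration):

import csv

people = [
    {"name": "Alice", "age": 30, "city": "New York", "notes": "VIP"},  # extra key
    {"name": "Bob", "age": 25, "city": "London"}
]

with open("people.csv", "w", newline="") as file:
    fields = ["name", "age", "city"]
    # extrasaction="ignore" drops keys not listed in fieldnames instead of raising ValueError
    writer = csv.DictWriter(file, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(people)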
🧱 JSON — Hierarchical Data Format
JSON represents data as nested key-value pairs (like Python dictionaries).
It is commonly used in web APIs and configuration files.
Example data.json
{
    "name": "Alice",
    "age": 30,
    "city": "New York",
    "skills": ["Python", "AI"]
}
Reading JSON
import json
with open("data.json", "r") as file:
data = json.load(file)
print(data["name"], data["skills"])
Writing JSON
import json
data = {
    "name": "Bob",
    "age": 25,
    "city": "London"
}

with open("output.json", "w") as file:
    json.dump(data, file, indent=4)
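By default, json.dump escapes non-ASCII characters as \uXXXX sequences. If the file should stay human-readable, pass ensure_ascii=False along with an explicit encoding. A small sketch with made-up data:

import json

data = {"name": "José", "city": "São Paulo"}

# ensure_ascii=False keeps accented characters readable instead of \u escapes
with open("output_utf8.json", "w", encoding="utf-8") as file:
    json.dump(data, file, indent=4, ensure_ascii=False)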
⚙️ Converting Between CSV and JSON
CSV ➜ JSON
import csv, json
students = []

with open("students.csv", "r") as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        students.append(row)

with open("students.json", "w") as json_file:
    json.dump(students, json_file, indent=4)
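One caveat: csv.DictReader reads every field as a string, so numeric columns end up quoted in the resulting JSON. If the numbers matter downstream, convert them explicitly. A sketch, assuming students.csv has an age column:

import csv, json

students = []
with open("students.csv", "r") as csv_file:
    for row in csv.DictReader(csv_file):
        row["age"] = int(row["age"])   # hypothetical numeric column; adjust to your data
        students.append(row)

with open("students.json", "w") as json_file:
    json.dump(students, json_file, indent=4)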
JSON ➜ CSV
import csv, json
with open("students.json", "r") as json_file:
data = json.load(json_file)
with open("students_converted.csv", "w", newline="") as csv_file:
writer = csv.DictWriter(csv_file, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
💾 Handling Encoding and Errors
When working with multilingual or large datasets, handle encoding explicitly.
with open("data.csv", "r", encoding="utf-8") as file:
reader = csv.reader(file)
for row in reader:
print(row)
Handle malformed data safely:
import json

try:
    with open("corrupted.json", "r") as file:
        data = json.load(file)
except json.JSONDecodeError as e:
    print("Error reading JSON:", e)
🧩 Real-World Example — Customer Sales Data Integration
Imagine you have a sales.csv file and a customers.json file.
You want to merge both to generate a report.json that combines customer details with their total purchase amount.
sales.csv
customer_id,amount
1,250
2,300
1,150
3,400
customers.json
[
    {"id": 1, "name": "Alice", "city": "New York"},
    {"id": 2, "name": "Bob", "city": "London"},
    {"id": 3, "name": "Charlie", "city": "Paris"}
]
Python Integration Script
import csv, json
from collections import defaultdict
# Step 1: Load sales data
sales_totals = defaultdict(int)
with open("sales.csv", "r") as sales_file:
    reader = csv.DictReader(sales_file)
    for row in reader:
        sales_totals[int(row["customer_id"])] += int(row["amount"])

# Step 2: Load customer data
with open("customers.json", "r") as customers_file:
    customers = json.load(customers_file)

# Step 3: Merge and enrich data
report = []
for customer in customers:
    total = sales_totals.get(customer["id"], 0)
    report.append({
        "id": customer["id"],
        "name": customer["name"],
        "city": customer["city"],
        "total_sales": total
    })

# Step 4: Save the merged report
with open("report.json", "w") as output_file:
    json.dump(report, output_file, indent=4)

print("✅ Report generated successfully!")
Result (report.json)
[
    {"id": 1, "name": "Alice", "city": "New York", "total_sales": 400},
    {"id": 2, "name": "Bob", "city": "London", "total_sales": 300},
    {"id": 3, "name": "Charlie", "city": "Paris", "total_sales": 400}
]
📈 Using pandas for Advanced CSV/JSON Operations
For larger datasets or data analysis, pandas provides powerful read/write capabilities.
import pandas as pd
# Read CSV and JSON
csv_data = pd.read_csv("data.csv")
json_data = pd.read_json("data.json")
# Convert and save
csv_data.to_json("converted.json", orient="records", indent=4)
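The reverse direction works the same way: to_csv writes a DataFrame back out as CSV. A brief sketch, assuming students.json holds a flat list of records like the one created earlier:

import pandas as pd

# Read a JSON array of flat records and write it back out as CSV
records = pd.read_json("students.json")
records.to_csv("students_from_pandas.csv", index=False)  # index=False drops the row-index column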
🧾 Best Practices
✅ Always use with open() for safe file operations.
✅ Use DictReader / DictWriter for structured CSVs.
✅ Use indent=4 for human-readable JSON.
✅ Handle encodings (utf-8) explicitly.
✅ Validate JSON with try/except JSONDecodeError.
✅ For large-scale datasets — use pandas for performance.
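The short sketch below pulls several of these practices together: explicit encoding, DictReader for structured access, and error handling around the whole conversion. File names are placeholders.

import csv, json

def csv_to_json(csv_path, json_path):
    """Convert a CSV file to a JSON file while applying the practices above."""
    try:
        # Explicit encoding + DictReader for structured rows
        with open(csv_path, "r", encoding="utf-8", newline="") as csv_file:
            rows = list(csv.DictReader(csv_file))

        # Human-readable JSON output
        with open(json_path, "w", encoding="utf-8") as json_file:
            json.dump(rows, json_file, indent=4)
    except FileNotFoundError:
        print(f"File not found: {csv_path}")
    except csv.Error as e:
        print("Error reading CSV:", e)

# Placeholder file names
csv_to_json("data.csv", "data_converted.json")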
🧠 Summary
- CSV — Ideal for tabular, spreadsheet-style data.
- JSON — Ideal for hierarchical, nested data.
- Python’s csv and json modules make it easy to interchange data.
- Combine both formats to build powerful data pipelines.
- Real-world uses: analytics, APIs, logs, configurations, and ETL processes.
By mastering CSV and JSON handling, you can integrate Python seamlessly with databases, web APIs, and modern data systems — bridging the gap between raw files and structured information.