Chapter 12: Advanced NumPy and Pandas — Time Series Analysis with Pandas

🕒 Time Series Analysis with Pandas

Pandas provides powerful features for working with time series data — datasets indexed by time.
From financial analytics to IoT monitoring and forecasting, mastering Pandas’ time series tools is essential for advanced data analysis.

📅 1. Working with Date and Time Data

Time series work starts with parsing dates correctly and using them as the index.
You can convert strings to datetime objects using pd.to_datetime().

import pandas as pd

dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
values = [10, 15, 20]

df = pd.DataFrame({"Date": dates, "Value": values})
df["Date"] = pd.to_datetime(df["Date"])
df.set_index("Date", inplace=True)

print(df)

🧠 Tip: Use a DatetimeIndex for time-ordered data. It enables date-based selection with .loc, resampling, and time-based rolling windows.
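Real-world date strings are often not ISO-formatted. A minimal sketch, assuming day-first strings with one malformed entry (the sample values are made up): an explicit `format` avoids guessing the layout, and `errors="coerce"` turns unparseable values into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw input: day/month/year strings plus one bad value
raw = pd.Series(["01/02/2023", "15/02/2023", "not a date"])

# The explicit format removes day-first vs month-first ambiguity;
# errors="coerce" converts anything that does not match into NaT
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
```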

🔍 2. Indexing and Slicing by Date

Once your DataFrame has a datetime index, you can select data by time periods.

# Select specific date
print(df.loc["2023-01-02"])

# Select date range
print(df.loc["2023-01-01":"2023-01-03"])

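# Select by month or year (partial-string indexing; use .loc,
# since df["2023-01"] looks up a column in pandas 2.x)
print(df.loc["2023-01"])
print(df.loc["2023"])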
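A partial date string selects every row within that period, and label slices on a DatetimeIndex include both endpoints. The sketch below uses a made-up 60-day series to show both behaviours.

```python
import pandas as pd

# 60 consecutive days starting 2023-01-01, values 0..59 (illustrative data)
s = pd.Series(range(60), index=pd.date_range("2023-01-01", periods=60, freq="D"))

jan = s.loc["2023-01"]                     # every row in January 2023
window = s.loc["2023-01-30":"2023-02-02"]  # both endpoints are included

print(len(jan))          # 31
print(window.tolist())   # [29, 30, 31, 32]
```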

📊 3. Creating a Time Series

You can easily create synthetic time series data using pd.date_range().

date_rng = pd.date_range(start="2023-01-01", end="2023-01-10", freq="D")
data = pd.Series(range(len(date_rng)), index=date_rng)
print(data.head())

Common Frequencies

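Alias	Meaning	Example
`D`	Daily	2023-01-01, 2023-01-02
`h` (`H` before pandas 2.2)	Hourly	2023-01-01 00:00, 2023-01-01 01:00
`ME` (`M` before pandas 2.2)	Month-end	2023-01-31
`MS`	Month-start	2023-01-01
`W` (same as `W-SUN`)	Weekly, ending on Sunday	2023-01-01, 2023-01-08
`QE` (`Q` before pandas 2.2)	Quarter-end	2023-03-31
`YE` (`A` or `Y` before pandas 2.2)	Year-end	2023-12-31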
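To see how aliases anchor the generated dates, here is a quick check using two aliases from the table that are spelled the same in every recent pandas version (`MS` and `W`):

```python
import pandas as pd

month_starts = pd.date_range("2023-01-01", periods=3, freq="MS")
weekly = pd.date_range("2023-01-01", periods=3, freq="W")   # W is an alias for W-SUN

print(month_starts.strftime("%Y-%m-%d").tolist())  # ['2023-01-01', '2023-02-01', '2023-03-01']
print(weekly.day_name().tolist())                   # ['Sunday', 'Sunday', 'Sunday']
```

Because 2023-01-01 is itself a Sunday, it is the first weekly timestamp.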

🔁 4. Resampling and Frequency Conversion

Resampling allows you to change the frequency of your time series — for example, daily → monthly.

import numpy as np

date_rng = pd.date_range(start="2023-01-01", periods=100, freq="D")
df = pd.DataFrame({"Value": np.random.randint(0, 100, size=100)}, index=date_rng)

# Convert daily data to monthly means
monthly = df.resample("ME").mean()   # month-end bins; use "M" on pandas < 2.2

# Convert to weekly sum
weekly = df.resample("W").sum()

print("Monthly Average:\n", monthly.head())
print("\nWeekly Sum:\n", weekly.head())
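Random data hides two details: `.agg()` can compute several statistics per bin at once, and weekly (`W`) bins end on Sunday and are closed on the right by default. A sketch with deterministic, made-up values (0 to 9 over ten days starting Sunday 2023-01-01) makes the bin edges visible:

```python
import pandas as pd

# Ten days of illustrative data: 2023-01-01 (a Sunday) through 2023-01-10
s = pd.Series(range(10), index=pd.date_range("2023-01-01", periods=10, freq="D"))

# Several aggregations per weekly bin; each bin is labelled by its closing Sunday
weekly = s.resample("W").agg(["sum", "mean", "count"])
print(weekly)
# Bins: Jan 1 alone (sum 0), Jan 2-8 (sum 28), Jan 9-10 (sum 17)
```

Since the first day falls exactly on a closing Sunday, it forms a one-day bin of its own.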

Downsampling vs Upsampling

Type	Description	Example
Downsampling	Reduce frequency (daily → monthly)	`resample('ME').mean()`
Upsampling	Increase frequency (monthly → daily)	`resample('D').ffill()`
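When upsampling, the new timestamps start out empty, and how to fill them is a choice. A sketch comparing three options on hypothetical readings taken every three days:

```python
import pandas as pd

# Illustrative readings on Jan 1, Jan 4 and Jan 7
sparse = pd.Series([10.0, 40.0, 70.0],
                   index=pd.date_range("2023-01-01", periods=3, freq="3D"))

daily_gaps = sparse.resample("D").asfreq()          # new days stay NaN
daily_ffill = sparse.resample("D").ffill()          # repeat the last known value
daily_interp = sparse.resample("D").interpolate()   # straight line between readings

print(pd.DataFrame({"asfreq": daily_gaps, "ffill": daily_ffill, "interpolate": daily_interp}))
```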

⏩ 5. Shifting, Lagging, and Differencing

Use shifting to compare data points across time (lags, leads).

df["Lag_1"] = df["Value"].shift(1)     # Previous day
df["Diff"] = df["Value"].diff()        # Difference with previous day
print(df.head())
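Leads work the same way as lags: a negative shift pulls in the next observation. The sketch below (made-up values) also confirms that `diff()` is simply the value minus its one-step lag.

```python
import pandas as pd

s = pd.Series([100, 110, 99, 120],
              index=pd.date_range("2023-01-01", periods=4, freq="D"))

frame = pd.DataFrame({
    "value": s,
    "lag_1": s.shift(1),    # previous day's value (NaN on the first row)
    "lead_1": s.shift(-1),  # next day's value (NaN on the last row)
    "diff": s.diff(),       # value minus previous value
})
print(frame)

# diff() matches value - lag_1 on every row except the first
print((frame["diff"] == frame["value"] - frame["lag_1"]).iloc[1:].all())  # True
```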

💡 Lagged values are common input features for forecasting models, and differencing removes trends, a standard step toward the stationarity that models such as ARIMA assume.
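As a small worked example of why differencing helps: a series with a straight-line trend becomes constant after one difference. The trend below (slope 5 per day) is made up for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=6, freq="D")
trend = pd.Series(3 + 5 * np.arange(6), index=idx)   # 3, 8, 13, 18, 23, 28

step = trend.diff().dropna()   # first row has no predecessor, so drop it
print(step.unique())           # [5.]
```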

📈 6. Rolling and Expanding Windows

Rolling operations compute statistics over moving windows — e.g., moving average.

df["RollingMean_7D"] = df["Value"].rolling(window=7).mean()
df["RollingStd_7D"] = df["Value"].rolling(window=7).std()

print(df.head(10))
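`window=7` counts rows, so it equals seven days only when there is exactly one row per day with no gaps. On a DatetimeIndex, a time-based window such as `"3D"` looks back over a span of time instead. A sketch on hypothetical irregular data:

```python
import pandas as pd

# Irregular timestamps with missing days between readings
idx = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05", "2023-01-09"])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

by_count = s.rolling(window=2).sum()   # last 2 rows, however far apart in time
by_time = s.rolling("3D").sum()        # rows within the trailing 3 days, (t - 3D, t]

print(pd.DataFrame({"by_count": by_count, "by_time": by_time}))
```

Time-based windows default to `min_periods=1`, so they produce a value from the first row, whereas a count-based window leaves the first `window - 1` rows as NaN.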

Expanding windows compute cumulative stats:

df["CumulativeMean"] = df["Value"].expanding().mean()
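An expanding mean at each timestamp is the running total divided by the number of observations so far. A quick check on illustrative values:

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0],
              index=pd.date_range("2023-01-01", periods=4, freq="D"))

expanding_mean = s.expanding().mean()
manual = s.cumsum() / np.arange(1, len(s) + 1)   # running sum / running count

print(expanding_mean.tolist())               # [2.0, 3.0, 4.0, 5.0]
print(np.allclose(expanding_mean, manual))   # True
```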

🧩 7. Handling Missing or Irregular Time Data

You can fill missing timestamps or interpolate missing values easily.

# Reindex to full date range
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
df = df.reindex(full_range)

# Fill missing values by carrying the last observation forward
# (fillna(method=...) is deprecated in recent pandas)
df["Value"] = df["Value"].ffill()

Alternatively, interpolate linearly between the surrounding known values:

df["Value"] = df["Value"].interpolate()
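Two points are easy to overlook here. Reindexing is what makes missing days visible as NaN, and the default `interpolate()` treats rows as evenly spaced; on an irregular DatetimeIndex, `method="time"` weights by elapsed time instead. A sketch with made-up readings:

```python
import pandas as pd

# Hypothetical irregular readings: Jan 2-3 and Jan 5-9 have no data
s = pd.Series([0.0, 30.0, 90.0],
              index=pd.to_datetime(["2023-01-01", "2023-01-04", "2023-01-10"]))

# 1) Reindex to a daily calendar so missing days appear as NaN, then forward-fill
daily = s.reindex(pd.date_range(s.index.min(), s.index.max(), freq="D"))
filled = daily.ffill()
print(int(daily.isna().sum()))   # 7 missing days
print(filled.tolist())

# 2) Without reindexing: positional vs time-weighted interpolation
gappy = pd.Series([0.0, None, 90.0],
                  index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-10"]))
linear = gappy.interpolate()                 # rows treated as evenly spaced: 45.0
by_time = gappy.interpolate(method="time")   # 1 of 9 days elapsed: about 10.0
print(linear.iloc[1], by_time.iloc[1])
```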

🕵️‍♂️ 8. Time Series Visualization

Visualization is crucial for spotting trends and seasonality.

import matplotlib.pyplot as plt

df["Value"].plot(label="Daily Data", figsize=(10, 4))
df["RollingMean_7D"].plot(label="7-Day Rolling Mean", linewidth=2)
plt.legend()
plt.title("Time Series Analysis Example")
plt.show()

For a cleaner trend line, try resampling to a coarser frequency (for example, weekly means) before applying a rolling average.
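A sketch of the data-preparation side of that idea, on made-up data: daily values are resampled to weekly means, then smoothed with a 2-week rolling mean (the window length is an arbitrary choice for illustration). The result can be plotted with `.plot()` in the same way as the daily series.

```python
import pandas as pd

# 28 days of illustrative data, Monday 2023-01-02 to Sunday 2023-01-29
daily = pd.Series(range(28), index=pd.date_range("2023-01-02", periods=28, freq="D"))

weekly_mean = daily.resample("W").mean()          # four full Monday-Sunday weeks
smoothed = weekly_mean.rolling(window=2).mean()   # each week averaged with the previous one

print(weekly_mean.tolist())  # [3.0, 10.0, 17.0, 24.0]
print(smoothed.tolist())     # [nan, 6.5, 13.5, 20.5]
```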

🧠 9. Advanced DateTime Features

Extract Date Components

df["Year"] = df.index.year
df["Month"] = df.index.month
df["Day"] = df.index.day
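Extracted components are typically used for grouping or as model features. A sketch on illustrative data that adds a weekday column and averages values by month:

```python
import pandas as pd

# 60 illustrative days starting Sunday 2023-01-01, values 0..59
s = pd.Series(range(60), index=pd.date_range("2023-01-01", periods=60, freq="D"))

features = s.to_frame(name="value")
features["month"] = features.index.month
features["dayofweek"] = features.index.dayofweek   # Monday=0 ... Sunday=6

per_month = features.groupby("month")["value"].mean()
print(per_month.tolist())   # [15.0, 44.5, 59.0]
```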

Filtering by Time of Day

# For hourly data
hourly = pd.Series(range(48), index=pd.date_range("2023-01-01", periods=48, freq="h"))  # use "H" on pandas < 2.2
print(hourly.between_time("08:00", "18:00").head())
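`between_time()` also handles windows that cross midnight when the start time is later than the end time. A sketch on the same kind of 48-hour series, using `pd.offsets.Hour()` as the frequency so it does not depend on the `"h"`/`"H"` alias change:

```python
import pandas as pd

hourly = pd.Series(range(48),
                   index=pd.date_range("2023-01-01", periods=48, freq=pd.offsets.Hour()))

night = hourly.between_time("22:00", "02:00")   # 22:00-23:00 and 00:00-02:00, endpoints included
print(sorted(set(night.index.hour.tolist())))   # [0, 1, 2, 22, 23]
print(len(night))                               # 10
```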

Time Zone Handling

# Mark the naive index as UTC, then convert it to Berlin local time
df = df.tz_localize("UTC").tz_convert("Europe/Berlin")
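`tz_localize()` only labels naive timestamps with a zone (it raises a `TypeError` if the index is already tz-aware), while `tz_convert()` keeps the same instant and changes the wall-clock representation. A sketch with two illustrative noon timestamps, one in winter and one in summer, shows that the Berlin offset depends on daylight saving time:

```python
import pandas as pd

ts = pd.Series([1, 2], index=pd.to_datetime(["2023-01-15 12:00", "2023-07-15 12:00"]))

utc = ts.tz_localize("UTC")                 # naive -> UTC, clock times unchanged
berlin = utc.tz_convert("Europe/Berlin")    # same instants, Berlin local time

print(berlin.index.hour.tolist())                     # [13, 14] (UTC+1 in winter, UTC+2 in summer)
print(berlin.index.tz_convert("UTC").hour.tolist())   # [12, 12] (the instants did not move)
```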

📋 10. Common Time Series Methods in Pandas

Method	Purpose
`pd.to_datetime()`	Convert to datetime
`pd.date_range()`	Create date index
`.resample()`	Change frequency
`.shift()`, `.diff()`	Lag and difference
`.rolling()`, `.expanding()`	Moving and cumulative stats
`.asfreq()`	Change frequency without aggregation
`.ffill()` / `.bfill()`	Fill missing data
`.tz_localize()` / `.tz_convert()`	Time zone control
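The difference between `.asfreq()` and `.resample()` in the table is easiest to see side by side. A sketch on four made-up daily values: `asfreq` picks the values that land on the new timestamps, while `resample` aggregates everything inside each bin.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=pd.date_range("2023-01-01", periods=4, freq="D"))

picked = s.asfreq("2D")           # keeps the values on Jan 1 and Jan 3
summed = s.resample("2D").sum()   # adds up each 2-day bin: Jan 1-2 and Jan 3-4

print(picked.tolist())   # [1, 3]
print(summed.tolist())   # [3, 7]
```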

🧭 Summary

Pandas makes time series analysis intuitive and powerful — from date indexing to resampling, rolling windows, and visualization.
With these tools, you can transform raw time-based data into meaningful insights and trends.

Once you master Pandas time series tools, you’re ready for forecasting with statsmodels, Prophet, or scikit-learn.