Chapter 12: Advanced NumPy and Pandas — Time Series Analysis with Pandas
🕒 Time Series Analysis with Pandas
Pandas provides powerful features for working with time series data — datasets indexed by time.
From financial analytics to IoT monitoring and forecasting, mastering Pandas’ time series tools is essential for advanced data analysis.
📅 1. Working with Date and Time Data
Time series begin with proper date parsing and indexing.
You can convert strings to datetime objects using pd.to_datetime().
import pandas as pd
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]
values = [10, 15, 20]
df = pd.DataFrame({"Date": dates, "Value": values})
df["Date"] = pd.to_datetime(df["Date"])
df.set_index("Date", inplace=True)
print(df)
🧠 Tip: Set a datetime index on time-ordered data; it unlocks time-based indexing, resampling, and rolling operations.
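Real-world date columns often contain malformed entries. The following is a minimal sketch, using made-up sample strings, of how the `format` and `errors` parameters of `pd.to_datetime()` can make parsing explicit and tolerant of bad values.

```python
import pandas as pd

# Hypothetical raw input: the last value is not a valid date.
raw = pd.Series(["2023-01-01", "2023-01-02", "not a date"])

# An explicit format avoids ambiguous parsing; errors="coerce" turns
# unparseable entries into NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print("Unparseable rows:", parsed.isna().sum())
```

Counting the `NaT` values right after parsing is a cheap way to notice bad input before it silently propagates into later calculations.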
🔍 2. Indexing and Slicing by Date
Once your DataFrame has a datetime index, you can select data by time periods.
# Select specific date
print(df.loc["2023-01-02"])
# Select date range
print(df.loc["2023-01-01":"2023-01-03"])
# Select by month or year (partial string indexing on rows requires .loc)
print(df.loc["2023-01"])
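To make the slicing behavior concrete, here is a small sketch on a hypothetical daily series that spans two months. It illustrates two points that are easy to miss: a partial string such as `"2023-01"` selects the whole month, and date-string slices include both endpoints.

```python
import pandas as pd

# Hypothetical daily series: 2023-01-25 through 2023-02-07.
idx = pd.date_range("2023-01-25", periods=14, freq="D")
s = pd.Series(range(14), index=idx)

# Partial string: every row that falls in January 2023.
january = s.loc["2023-01"]

# Date-string slice: unlike integer slicing, both endpoints are included.
first_week_feb = s.loc["2023-02-01":"2023-02-07"]

print(len(january), len(first_week_feb))
```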
📊 3. Creating a Time Series
You can easily create synthetic time series data using pd.date_range().
date_rng = pd.date_range(start="2023-01-01", end="2023-01-10", freq="D")
data = pd.Series(range(len(date_rng)), index=date_rng)
print(data.head())
Common Frequencies
| Alias | Meaning | Example |
|---|---|---|
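| D | Daily | 2023-01-01, 2023-01-02 |
| H | Hourly | 2023-01-01 00:00 |
| M | Month-end | 2023-01-31 |
| MS | Month-start | 2023-01-01 |
| W | Weekly (anchored on Sunday by default) | 2023-01-01, 2023-01-08 |
| Q | Quarter-end | 2023-03-31 |
| A | Year-end | 2023-12-31 |

Note: pandas 2.2 deprecated the aliases H, M, Q, and A in favor of h, ME, QE, and YE. On recent versions the old spellings emit a FutureWarning.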
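A quick way to check what an alias produces is to generate a few timestamps with it. The sketch below sticks to `D`, `MS`, and `W`, which are spelled the same way across pandas versions; the variable names are illustrative only.

```python
import pandas as pd

daily = pd.date_range("2023-01-01", periods=3, freq="D")
month_starts = pd.date_range("2023-01-01", periods=3, freq="MS")
weekly = pd.date_range("2023-01-01", periods=3, freq="W")  # anchored on Sundays

print(daily)
print(month_starts)
print(weekly)
```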
🔁 4. Resampling and Frequency Conversion
Resampling allows you to change the frequency of your time series — for example, daily → monthly.
import numpy as np
date_rng = pd.date_range(start="2023-01-01", periods=100, freq="D")
df = pd.DataFrame({"Value": np.random.randint(0, 100, size=100)}, index=date_rng)
# Convert daily data to monthly means ("M" = month-end; spelled "ME" in pandas 2.2+)
monthly = df.resample("M").mean()
# Convert to weekly sum
weekly = df.resample("W").sum()
print("Monthly Average:\n", monthly.head())
print("\nWeekly Sum:\n", weekly.head())
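Resampling is not limited to a single statistic. As a sketch on hypothetical data, `.agg()` can compute several summaries per bin in one pass:

```python
import pandas as pd

# Hypothetical daily series covering two full Monday-to-Sunday weeks.
s = pd.Series(
    range(14),
    index=pd.date_range("2023-01-02", periods=14, freq="D"),  # starts on a Monday
)

# One row per week (weekly bins end on Sunday), one column per statistic.
weekly_stats = s.resample("W").agg(["mean", "min", "max", "count"])
print(weekly_stats)
```

The `count` column is worth keeping around: a bin with fewer observations than expected usually points to gaps in the source data.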
Downsampling vs Upsampling
| Type | Description | Example |
|---|---|---|
| Downsampling | Reduce frequency (daily → monthly) | resample('M').mean() |
| Upsampling | Increase frequency (monthly → daily) | resample('D').ffill() |
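Upsampling creates rows that have no observed value, so a fill strategy has to be chosen. The sketch below, on a hypothetical three-point monthly series, contrasts forward filling with linear interpolation.

```python
import pandas as pd

# Hypothetical month-start series with three observations.
monthly = pd.Series(
    [100.0, 130.0, 160.0],
    index=pd.date_range("2023-01-01", periods=3, freq="MS"),
)

# Upsample to daily: ffill() repeats the last known value until the
# next observation arrives.
daily_ffill = monthly.resample("D").ffill()

# Alternative: leave new rows as NaN, then interpolate between observations.
daily_interp = monthly.resample("D").asfreq().interpolate()

print(daily_ffill.head())
print(daily_interp.head())
```

Forward fill suits values that stay constant until updated (prices, settings); interpolation suits quantities that plausibly change gradually between measurements.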
⏩ 5. Shifting, Lagging, and Differencing
Use shifting to compare data points across time (lags, leads).
df["Lag_1"] = df["Value"].shift(1) # Previous day
df["Diff"] = df["Value"].diff() # Difference with previous day
print(df.head())
💡 Lagging and differencing are common preprocessing steps for forecasting: lagged values serve as model features, and differencing removes trends to help make a series stationary.
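The relationship between these methods is easiest to see side by side. This sketch uses a hypothetical four-day series and also shows `pct_change()`, a closely related method for relative changes.

```python
import pandas as pd

# Hypothetical daily values.
s = pd.Series(
    [10, 12, 15, 11],
    index=pd.date_range("2023-01-01", periods=4, freq="D"),
)

lag = s.shift(1)      # previous day's value (first row is NaN)
lead = s.shift(-1)    # next day's value (last row is NaN)
diff = s.diff()       # equivalent to s - s.shift(1)
pct = s.pct_change()  # relative change versus the previous day

print(pd.DataFrame({"value": s, "lag": lag, "lead": lead, "diff": diff, "pct": pct}))
```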
📈 6. Rolling and Expanding Windows
Rolling operations compute statistics over a moving window, such as a 7-day moving average. With window=7, the first six rows are NaN because the window is not yet full.
df["RollingMean_7D"] = df["Value"].rolling(window=7).mean()
df["RollingStd_7D"] = df["Value"].rolling(window=7).std()
print(df.head(10))
Expanding windows compute cumulative stats:
df["CumulativeMean"] = df["Value"].expanding().mean()
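Two rolling options are worth knowing beyond a fixed integer window: `min_periods`, which controls how many observations are needed before a value is emitted, and offset windows such as `"3D"`, which are defined in elapsed time rather than row count. A minimal sketch on hypothetical data:

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)

# Fixed-size window: NaN until 3 observations are available.
fixed = s.rolling(window=3).mean()

# min_periods=1 emits a value as soon as one observation is in the window.
partial = s.rolling(window=3, min_periods=1).mean()

# Offset window: covers the 3 calendar days ending at each timestamp,
# so it also behaves sensibly when observations are irregularly spaced.
by_time = s.rolling("3D").mean()

print(pd.DataFrame({"fixed": fixed, "partial": partial, "by_time": by_time}))
```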
🧩 7. Handling Missing or Irregular Time Data
You can fill missing timestamps or interpolate missing values easily.
# Reindex to full date range
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
df = df.reindex(full_range)
# Fill missing values (assign the result back; fillna(method=...) is deprecated)
df["Value"] = df["Value"].ffill()  # Forward fill
Or interpolate smoothly:
df["Value"] = df["Value"].interpolate()
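The effect is clearer on data that actually has gaps. The following sketch assumes a hypothetical series with two missing days, uses `asfreq("D")` to expose them as NaN rows, and compares forward fill with time-weighted interpolation.

```python
import pandas as pd

# Hypothetical readings with two missing days (Jan 3 and Jan 4).
s = pd.Series(
    [10.0, 20.0, 50.0],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"]),
)

# asfreq("D") inserts the missing timestamps as NaN rows.
regular = s.asfreq("D")

filled = regular.ffill()                           # repeat the last observation
interpolated = regular.interpolate(method="time")  # linear in elapsed time

print(pd.DataFrame({"raw": regular, "ffill": filled, "interp": interpolated}))
```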
🕵️‍♂️ 8. Time Series Visualization
Visualization is crucial for spotting trends and seasonality.
import matplotlib.pyplot as plt
df["Value"].plot(label="Daily Data", figsize=(10, 4))
df["RollingMean_7D"].plot(label="7-Day Rolling Mean", linewidth=2)
plt.legend()
plt.title("Time Series Analysis Example")
plt.show()
Try combining resampling + rolling averages for clean trend visualization.
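One way to combine the two steps is sketched below on synthetic data (a linear trend plus seeded random noise, both invented for illustration). Plotting is left out so the snippet runs without matplotlib.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy daily data with an upward trend.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=120, freq="D")
daily = pd.Series(np.arange(120) * 0.5 + rng.normal(0, 10, size=120), index=idx)

# Step 1: downsample to weekly means to reduce day-to-day noise.
weekly = daily.resample("W").mean()

# Step 2: smooth further with a 4-week rolling mean.
trend = weekly.rolling(window=4, min_periods=1).mean()

print(trend.tail())
```

Calling `trend.plot()` alongside `daily.plot()` would then show the smoothed line over the raw data.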
🧠 9. Advanced DateTime Features
Extract Date Components
df["Year"] = df.index.year
df["Month"] = df.index.month
df["Day"] = df.index.day
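Extracted components are most useful as grouping keys. As a sketch on a hypothetical two-week frame, the day of the week can be derived from the index and used to compute per-weekday averages:

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=14, freq="D")  # starts on a Sunday
df = pd.DataFrame({"Value": range(14)}, index=idx)

df["DayOfWeek"] = df.index.dayofweek       # Monday=0 ... Sunday=6
df["DayName"] = df.index.day_name()
df["IsWeekend"] = df.index.dayofweek >= 5

# Average value per weekday, using the extracted component as the group key.
by_weekday = df.groupby("DayOfWeek")["Value"].mean()
print(by_weekday)
```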
Filtering by Time of Day
# For hourly data ("h" is the hourly alias in pandas 2.2+; older versions use "H")
hourly = pd.Series(range(48), index=pd.date_range("2023-01-01", periods=48, freq="h"))
print(hourly.between_time("08:00", "18:00").head())
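A short sketch of what `between_time()` returns, together with its companion `at_time()`. The frequency is written as `"60min"` here only because that spelling is accepted by both older and newer pandas versions.

```python
import pandas as pd

# Two days of hypothetical hourly data.
hourly = pd.Series(
    range(48),
    index=pd.date_range("2023-01-01", periods=48, freq="60min"),
)

# Rows between 08:00 and 18:00 on every day; both endpoints are included.
business = hourly.between_time("08:00", "18:00")

# Rows at exactly 12:00 on every day.
at_noon = hourly.at_time("12:00")

print(len(business), at_noon.tolist())
```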
Time Zone Handling
df = df.tz_localize("UTC").tz_convert("Europe/Berlin")
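The two calls do different jobs: `tz_localize()` attaches a zone to naive timestamps without changing the clock values, while `tz_convert()` re-expresses the same instants in another zone. The sketch below assumes hypothetical readings recorded in UTC on the night Central European Summer Time began in 2023.

```python
import pandas as pd

# Hypothetical naive timestamps known to be recorded in UTC.
idx = pd.date_range("2023-03-26 00:00", periods=3, freq="60min")
s = pd.Series([1, 2, 3], index=idx)

utc = s.tz_localize("UTC")                # attach a zone; clock values unchanged
berlin = utc.tz_convert("Europe/Berlin")  # same instants, different wall clock

print(berlin)
```

The Berlin wall-clock hours skip from 01:00 to 03:00 because of the daylight saving transition, even though the underlying instants are still one hour apart. Calling `tz_localize()` on an index that is already time zone aware raises a `TypeError`; use `tz_convert()` in that case.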
📋 10. Common Time Series Methods in Pandas
| Method | Purpose |
|---|---|
| pd.to_datetime() | Convert to datetime |
| pd.date_range() | Create date index |
| .resample() | Change frequency |
| .shift(), .diff() | Lag and difference |
| .rolling(), .expanding() | Moving and cumulative stats |
| .asfreq() | Change frequency without aggregation |
| .ffill() / .bfill() | Fill missing data |
| .tz_localize() / .tz_convert() | Time zone control |
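The difference between `.asfreq()` and `.resample()` is a common source of confusion. In this minimal sketch on hypothetical data, `.asfreq()` selects the values that sit at the new timestamps, while `.resample()` groups rows into bins and aggregates them.

```python
import pandas as pd

s = pd.Series(
    range(1, 11),
    index=pd.date_range("2023-01-01", periods=10, freq="D"),
)

# asfreq: keep the value at every 5th day and discard the rest.
picked = s.asfreq("5D")

# resample: group into 5-day bins, then aggregate each bin.
summed = s.resample("5D").sum()

print(picked)
print(summed)
```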
🧭 Summary
Pandas makes time series analysis intuitive and powerful — from date indexing to resampling, rolling windows, and visualization.
With these tools, you can transform raw time-based data into meaningful insights and trends.
Once you master Pandas time series tools, you’re ready for forecasting with statsmodels, Prophet, or scikit-learn.