
Chapter 9: Data Analysis with Pandas

Sub-Chapter: Series and DataFrames — Core Data Structures of Pandas

Pandas provides two core data structures that make it a powerhouse for data analysis:
Series (1D) and DataFrame (2D).
These are built on top of NumPy arrays, combining fast vectorized operations with flexible labeling and alignment features.
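As a minimal sketch of what "labeling and alignment" means in practice, the example below adds two Series whose labels only partly overlap. The variable names and numbers are made up for illustration.

```python
import pandas as pd

# Two illustrative Series with partly overlapping labels
q1 = pd.Series([100, 200, 300], index=["A", "B", "C"])
q2 = pd.Series([10, 20, 30], index=["B", "C", "D"])

# Addition pairs values by label, not by position.
# Labels present in only one Series produce NaN in the result.
total = q1 + q2
print(total)
```

Labels "B" and "C" exist in both inputs, so their values are summed; "A" and "D" each exist in only one input, so the result holds NaN there.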

🧩 1. Understanding Series and DataFrames

Structure	Dimensionality	Analogy	Example Use
Series	1D	A single column	Student scores, stock prices
DataFrame	2D	A table / spreadsheet	CSV or SQL table

Both structures support labels for indexing rows (and columns for DataFrames), making data operations intuitive and powerful.

📊 2. Series — One-Dimensional Labeled Data

A Series represents a one-dimensional labeled array that can hold any data type — integers, strings, floats, or objects.

Creating a Series

import pandas as pd

# From list
data = [10, 20, 30, 40, 50]
labels = ["A", "B", "C", "D", "E"]
series = pd.Series(data, index=labels)

print(series)

Output:

A    10
B    20
C    30
D    40
E    50
dtype: int64

From Dictionary

data = {"Math": 85, "Science": 90, "English": 78}
marks = pd.Series(data)

From Scalar

constant = pd.Series(5, index=["x", "y", "z"])
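The short sketch below shows what the dictionary and scalar constructors produce, reusing the same values as above: dictionary keys become the index, and a scalar is repeated once for every label.

```python
import pandas as pd

marks = pd.Series({"Math": 85, "Science": 90, "English": 78})
constant = pd.Series(5, index=["x", "y", "z"])

print(marks.index.tolist())   # the dictionary keys, in insertion order
print(constant.tolist())      # the scalar, repeated once per label
```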

🧠 3. Accessing and Modifying Series Data

# Access by label or by position
print(series["B"])       # 20
print(series.iloc[2])    # 30 (use .iloc for positions; series[2] is deprecated on a labeled Series)

# Slicing
print(series["B":"D"])  # Labels inclusive → B, C, D

# Add / Update values
series["F"] = 60

# Apply vectorized operation
doubled = series * 2

# Apply a custom function
squared = series.apply(lambda x: x ** 2)
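The difference between label-based and position-based access is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical Series (not part of the running example) whose labels run in reverse order.

```python
import pandas as pd

# Hypothetical Series whose integer labels are in reverse order
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s.loc[0]      # the value stored under label 0
by_position = s.iloc[0]  # the first value in storage order

print(by_label, by_position)
```

Here `.loc[0]` and `.iloc[0]` return different values, which is why spelling out `.loc` or `.iloc` is usually safer than relying on plain square brackets.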

Series Attributes

Attribute	Description	Example
`.index`	Row labels	`series.index`
`.values`	Data as NumPy array	`series.values`
`.dtype`	Data type	`series.dtype`
`.name`	Optional label	`series.name = "Scores"`
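A quick way to get familiar with these attributes is to print them for a small Series. The sketch below uses a shortened, illustrative version of the earlier data and uses `.to_numpy()`, which returns the same NumPy array as `.values` for a Series like this one.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=["A", "B", "C"])
series.name = "Scores"

print(series.index)       # the row labels
print(series.to_numpy())  # the data as a NumPy array
print(series.dtype)       # the element type
print(series.name)        # the optional label set above
```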

🧱 4. DataFrame — Two-Dimensional Labeled Data

A DataFrame is a table-like structure with rows and columns. Each column is a Series.

data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95]
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   28     79
3    Diana   22     95
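To check the statement that each column is a Series, the sketch below rebuilds the same table and inspects one column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

age = df["Age"]
print(type(age).__name__)          # the class of a single column
print(age.name)                    # the column label becomes the Series name
print(age.index.equals(df.index))  # the column shares the DataFrame's row index
```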

🔍 5. Accessing Data in DataFrames

Access Columns

df["Name"]           # Single label → returns a Series
df[["Name", "Age"]]  # List of labels → returns a DataFrame with those columns

Access Rows

df.loc[1]        # Label-based → the row whose index label is 1 (here the 2nd row)
df.iloc[2]       # Position-based → the 3rd row, whatever its label

Access Specific Value

df.loc[1, "Age"]    # 30
df.iloc[2, 1]       # 28

Slicing

df.loc[1:3, ["Name", "Score"]]   # Row labels 1 through 3 (end label included), specific columns
df.iloc[:, 1:]                   # All rows, all columns except first
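One detail that often surprises newcomers is that `.loc` slices include the end label while `.iloc` slices exclude the end position, like ordinary Python slices. The sketch below rebuilds the same table to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

by_label = df.loc[1:3, ["Name", "Score"]]  # labels 1, 2 and 3
by_position = df.iloc[1:3]                 # positions 1 and 2 only
without_first_col = df.iloc[:, 1:]         # every column after the first

print(len(by_label), len(by_position))
print(without_first_col.columns.tolist())
```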

⚙️ 6. DataFrame Attributes

Attribute	Description	Example
`.shape`	(rows, columns) tuple	`df.shape` → `(4, 3)`
`.columns`	Column labels	`df.columns`
`.index`	Row labels	`df.index`
`.values`	Data as a NumPy array (object dtype when column types are mixed)	`df.values`
`.dtypes`	Data types of columns	`df.dtypes`
`.T`	Transpose of DataFrame	`df.T`
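The attributes in the table can be inspected directly. The sketch below rebuilds the four-row table used above and prints a few of them.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels as a plain list
print(df.dtypes)            # one dtype per column
print(df.T.shape)           # transposing swaps rows and columns
```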

🧮 7. Manipulating DataFrames

Add or Modify Columns

df["Gender"] = ["F", "M", "M", "F"]
df["Passed"] = df["Score"] > 80

Rename Columns

df_renamed = df.rename(columns={"Score": "ExamScore"})  # Returns a new DataFrame; df keeps "Score", which later examples use

Drop Columns or Rows

df.drop("Gender", axis=1, inplace=True)  # Remove column
df.drop(0, axis=0, inplace=True)         # Remove first row

Apply Functions

df["AgeSquared"] = df["Age"].apply(lambda x: x ** 2)

📈 8. Vectorized Operations

Pandas operations are vectorized: a single expression operates on an entire column (or DataFrame) at once, with no explicit Python loop.

df["AgePlus10"] = df["Age"] + 10
df["ScoreNormalized"] = df["Score"] / df["Score"].max()

🧠 Vectorized operations run in optimized compiled code and are usually much faster than Python for loops over a Series or DataFrame, so prefer them wherever possible.
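To make the contrast concrete, the sketch below computes the same normalized scores twice, once with an explicit loop and once with a vectorized expression. It only demonstrates that the results agree; actual speed differences depend on data size and are not measured here.

```python
import pandas as pd

scores = pd.Series([88, 92, 79, 95], name="Score")
top = scores.max()

# Explicit loop: one Python-level division per element
looped = [value / top for value in scores]

# Vectorized: a single expression over the whole Series
vectorized = scores / top

print(looped)
print(vectorized.tolist())
```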

🔎 9. Filtering and Conditional Selection

# Simple filter
adults = df[df["Age"] >= 25]

# Multiple conditions
top_students = df[(df["Score"] > 85) & (df["Age"] < 30)]
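Conditions can also be combined with `|` (or) and negated with `~`, and `.isin()` tests membership in a list. Each comparison needs its own parentheses because `&` and `|` bind more tightly than comparison operators in Python. The sketch below uses the original four-row student table; the thresholds are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

# OR: a score above 90, or an age below 25
either = df[(df["Score"] > 90) | (df["Age"] < 25)]

# NOT: rows where the score is not above 85
not_high = df[~(df["Score"] > 85)]

# Membership: rows whose name appears in a given list
chosen = df[df["Name"].isin(["Alice", "Diana"])]

print(either["Name"].tolist())
print(not_high["Name"].tolist())
print(chosen["Name"].tolist())
```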

🧠 10. Real-World Example — Employee Data

employees = pd.DataFrame({
    "Name": ["Ali", "Sara", "Reza", "Lina", "Omid"],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
    "Salary": [4800, 6200, 5800, 6700, 5200],
    "Experience": [2, 5, 3, 6, 4]
})

# Add performance bonus (10% of salary)
employees["Bonus"] = employees["Salary"] * 0.1

# Filter IT employees
it_team = employees[employees["Department"] == "IT"]

# Average salary per department
avg_salary = employees.groupby("Department")["Salary"].mean()

print(employees)
print(avg_salary)

Output (summarized):

   Name Department  Salary  Experience  Bonus
0   Ali        HR    4800           2   480.0
1  Sara        IT    6200           5   620.0
2  Reza   Finance    5800           3   580.0
3  Lina        IT    6700           6   670.0
4  Omid        HR    5200           4   520.0

Department
Finance    5800.0
HR         5000.0
IT         6450.0
Name: Salary, dtype: float64

🧾 11. Series vs DataFrame — Detailed Comparison

Feature	Series	DataFrame
Dimensionality	1D	2D
Structure	Values + Index	Rows + Columns
Access	Single label	Row and column labels
Returned by column selection	✅ (single label, `df["Age"]`)	✅ (list of labels, `df[["Age"]]`)
Vectorized operations	✅	✅
Creation	`pd.Series()`	`pd.DataFrame()`
Analogy	Single Excel column	Full Excel sheet

🧭 12. Best Practices

✅ Always inspect .info() and .describe() before analysis.
✅ Use vectorized operations instead of loops.
✅ Call .copy() on a filtered DataFrame before modifying it, so the changes apply to an independent object and leave the original untouched.
✅ Keep column names consistent (avoid spaces).
✅ Use .loc[] for label-based selection and .iloc[] for position-based selection; do not mix labels and integer positions in a single indexer.
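As a small illustration of the `.copy()` advice, the sketch below filters the student table, copies the result, and adds a column to the copy. The `Grade` column and the threshold of 85 are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Score": [88, 92, 79, 95],
})

# Filter, then take an explicit copy before changing anything
top = df[df["Score"] > 85].copy()
top["Grade"] = "A"  # hypothetical column, added only to the copy

print(top)
print("Grade" in df.columns)  # the original DataFrame has no such column
```

Because `top` is an independent copy, the new column appears only there and the original `df` is unchanged.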

🧠 Summary

Concept	Description	Example
Series	1D labeled data	`pd.Series([1,2,3], index=['A','B','C'])`
DataFrame	2D labeled data	`pd.DataFrame({...})`
Access	Rows & columns	`df.loc[1, 'Age']`
Vectorized	Fast operations	`df['Age'] * 2`
Filter	Conditional selection	`df[df['Age']>25]`

Series and DataFrames are the foundation of Pandas. Once you are comfortable with them, the rest of the Pandas API and the wider Python data analysis ecosystem become much easier to learn.