Chapter 9: Data Analysis with Pandas

Sub-Chapter: Series and DataFrames, the Core Data Structures of Pandas

Pandas provides two core data structures that make it a powerhouse for data analysis: Series (1D) and DataFrame (2D).
Both are built on top of NumPy arrays, combining fast vectorized operations with flexible labeling and automatic alignment by label.
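To make "alignment" concrete, here is a minimal sketch using two small Series with made-up values (the names q1 and q2 are illustrative, not from this chapter). It shows that arithmetic matches values by label rather than by position.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data)
q1 = pd.Series([100, 200, 300], index=["A", "B", "C"])
q2 = pd.Series([10, 20, 30], index=["B", "C", "D"])

# Addition pairs values by label, not by position.
# Labels found in only one Series produce NaN in the result.
total = q1 + q2
print(total)
```

Labels B and C are summed, while A and D come back as NaN because each appears in only one of the inputs.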


🧩 1. Understanding Series and DataFrames

| Structure | Dimensionality | Analogy | Example Use |
|-----------|----------------|---------|-------------|
| Series | 1D | A single column | Student scores, stock prices |
| DataFrame | 2D | A table / spreadsheet | CSV or SQL table |

Both structures carry labels: a Series has a row index, and a DataFrame has a row index plus column labels. This lets you select and align data by name rather than by position.


📊 2. Series: One-Dimensional Labeled Data

A Series is a one-dimensional labeled array that can hold any data type: integers, floats, strings, or arbitrary Python objects.

Creating a Series

import pandas as pd

# From list
data = [10, 20, 30, 40, 50]
labels = ["A", "B", "C", "D", "E"]
series = pd.Series(data, index=labels)

print(series)

Output:

A    10
B    20
C    30
D    40
E    50
dtype: int64

From Dictionary

data = {"Math": 85, "Science": 90, "English": 78}
marks = pd.Series(data)

From Scalar

constant = pd.Series(5, index=["x", "y", "z"])
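When building from a dictionary, the keys become the index. As a small sketch (the label "History" is a hypothetical key added for illustration), passing index= alongside the dictionary selects and orders entries by label:

```python
import pandas as pd

data = {"Math": 85, "Science": 90, "English": 78}

# index= picks and orders entries by key.
# "History" is not in the dict, so its value becomes NaN.
subset = pd.Series(data, index=["English", "Math", "History"])
print(subset)
```

Because NaN is a floating-point value, the resulting Series is stored as floats even though the inputs were integers.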

🧠 3. Accessing and Modifying Series Data

# Access by label or by position
print(series["B"])       # 20
print(series.iloc[2])    # 30 (use .iloc for positions; series[2] is deprecated on a labeled index)

# Slicing
print(series["B":"D"])  # Label slices include both endpoints → B, C, D

# Add / Update values
series["F"] = 60

# Apply vectorized operation
doubled = series * 2

# Apply a custom function
squared = series.apply(lambda x: x ** 2)
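The difference between label slicing and position slicing is easy to miss, so here is a short sketch that rebuilds the same series and compares the two:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

by_label = series.loc["B":"D"]    # includes the end label "D"
by_position = series.iloc[1:3]    # excludes position 3, like a Python list

print(by_label.tolist())      # [20, 30, 40]
print(by_position.tolist())   # [20, 30]
```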

Series Attributes

| Attribute | Description | Example |
|-----------|-------------|---------|
| .index | Row labels | series.index |
| .values | Data as NumPy array | series.values |
| .dtype | Data type | series.dtype |
| .name | Optional label | series.name = "Scores" |
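As a quick illustration of these attributes, the sketch below builds a small named Series and prints each one. It uses .to_numpy(), which returns the same data as .values and is the accessor the pandas documentation recommends.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=["A", "B", "C"], name="Scores")

print(series.index.tolist())   # ['A', 'B', 'C']
print(series.to_numpy())       # the underlying data as a NumPy array
print(series.dtype)            # an integer dtype
print(series.name)             # Scores
```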

🧱 4. DataFrame: Two-Dimensional Labeled Data

A DataFrame is a table-like structure with rows and columns. Each column is a Series.

data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95]
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   28     79
3    Diana   22     95
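To see that each column really is a Series, the following sketch pulls one column out of a similar DataFrame and inspects it:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
})

age = df["Age"]
print(type(age).__name__)            # Series
print(age.index.equals(df.index))    # True: the column shares the row index
print(age.name)                      # Age
```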

🔍 5. Accessing Data in DataFrames

Access Columns

df["Name"]       # Returns a Series
df[["Name", "Age"]]  # Returns subset of columns

Access Rows

df.loc[1]        # Label-based → row labeled 1 (the 2nd row here)
df.iloc[2]       # Integer position → 3rd row

Access Specific Value

df.loc[1, "Age"]    # 30
df.iloc[2, 1]       # 28

Slicing

df.loc[1:3, ["Name", "Score"]]   # Row labels 1 through 3 (end label included), specific columns
df.iloc[:, 1:]                   # All rows, all columns except the first
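With the default 0, 1, 2, ... index, labels and positions happen to coincide, which hides the difference between .loc and .iloc. The sketch below uses hypothetical string row labels to make the distinction visible:

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [25, 30, 28], "Score": [88, 92, 79]},
    index=["alice", "bob", "charlie"],  # hypothetical row labels
)

print(df.loc["bob", "Score"])         # 92, selected by row and column label
print(df.iloc[1, 1])                  # 92, selected by row and column position
print(df.loc["alice":"bob"].shape)    # (2, 2): label slices include the end
print(df.iloc[0:1].shape)             # (1, 2): position slices exclude the end
```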

⚙️ 6. DataFrame Attributes

| Attribute | Description | Example |
|-----------|-------------|---------|
| .shape | (rows, columns) | (4, 3) |
| .columns | Column labels | df.columns |
| .index | Row labels | df.index |
| .values | Underlying NumPy array | df.values |
| .dtypes | Data types of columns | df.dtypes |
| .T | Transpose of DataFrame | df.T |
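A brief sketch of these attributes on the four-row DataFrame from the earlier section:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

print(df.shape)              # (4, 3)
print(df.columns.tolist())   # ['Name', 'Age', 'Score']
print(df.index.tolist())     # [0, 1, 2, 3]
print(df.dtypes)             # one dtype per column
print(df.T.shape)            # (3, 4): rows and columns swapped
```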

🧮 7. Manipulating DataFrames

Add or Modify Columns

df["Gender"] = ["F", "M", "M", "F"]
df["Passed"] = df["Score"] > 80

Rename Columns

df.rename(columns={"Score": "ExamScore"}, inplace=True)

Drop Columns or Rows

df.drop("Gender", axis=1, inplace=True)  # Remove column
df.drop(0, axis=0, inplace=True)         # Remove the row labeled 0 (the first row)

Apply Functions

df["AgeSquared"] = df["Age"].apply(lambda x: x ** 2)
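The examples above modify df in place. As an alternative sketch (one common style, not the only correct one), the same steps can be written without inplace=True, since assign, rename, and drop each return a new DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

# Each method returns a new DataFrame, so the calls can be chained
# and the original df is left unchanged.
result = (
    df.assign(Passed=df["Score"] > 80)
      .rename(columns={"Score": "ExamScore"})
      .drop(index=0)
)

print(result.columns.tolist())  # ['Name', 'Age', 'ExamScore', 'Passed']
print(df.columns.tolist())      # ['Name', 'Age', 'Score']
```

Chaining keeps the original data available for comparison, which can make multi-step transformations easier to debug.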

📈 8. Vectorized Operations

Most Pandas arithmetic and comparison operations are vectorized: a single expression applies to an entire column (or DataFrame) at once, without an explicit Python loop.

df["AgePlus10"] = df["Age"] + 10
df["ScoreNormalized"] = df["Score"] / df["Score"].max()

🧠 Vectorization = performance. Prefer vectorized expressions over Python for loops when working with Series or DataFrames.
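For comparison, here is a small sketch that computes the same normalized scores with an explicit loop and with a vectorized expression. On four rows the speed difference is negligible. The point is only that both produce the same values, and the vectorized form is shorter.

```python
import pandas as pd

df = pd.DataFrame({"Score": [88, 92, 79, 95]})
top = df["Score"].max()

# Explicit Python loop: one element at a time
looped = [score / top for score in df["Score"]]

# Vectorized: one expression over the whole column
vectorized = df["Score"] / top

print(looped)
print(vectorized.tolist())
```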


🔎 9. Filtering and Conditional Selection

# Simple filter
adults = df[df["Age"] >= 25]

# Multiple conditions
top_students = df[(df["Score"] > 85) & (df["Age"] < 30)]
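Beyond & (AND), boolean masks can also be combined with | (OR), negated with ~ (NOT), or built with isin. A short sketch on the same student data (the variable names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

# Each condition needs its own parentheses because | and & bind
# more tightly than comparison operators in Python.
young_or_top = df[(df["Age"] < 25) | (df["Score"] > 90)]
under_25 = df[~(df["Age"] >= 25)]
selected = df[df["Name"].isin(["Alice", "Diana"])]

print(young_or_top["Name"].tolist())   # ['Bob', 'Diana']
print(under_25["Name"].tolist())       # ['Diana']
print(selected["Name"].tolist())       # ['Alice', 'Diana']
```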

🧠 10. Real-World Example: Employee Data

employees = pd.DataFrame({
    "Name": ["Ali", "Sara", "Reza", "Lina", "Omid"],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
    "Salary": [4800, 6200, 5800, 6700, 5200],
    "Experience": [2, 5, 3, 6, 4]
})

# Add performance bonus (10% of salary)
employees["Bonus"] = employees["Salary"] * 0.1

# Filter IT employees
it_team = employees[employees["Department"] == "IT"]

# Average salary per department
avg_salary = employees.groupby("Department")["Salary"].mean()

print(employees)
print(avg_salary)

Output (summarized):

   Name Department  Salary  Experience  Bonus
0   Ali        HR    4800           2   480.0
1  Sara        IT    6200           5   620.0
2  Reza   Finance    5800           3   580.0
3  Lina        IT    6700           6   670.0
4  Omid        HR    5200           4   520.0

Department
Finance    5800.0
HR         5000.0
IT         6450.0
Name: Salary, dtype: float64
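As a possible extension of the groupby step, several aggregations can be computed in one call with agg. This sketch reuses the same employee data. The choice of mean, max, and count is just an example.

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Ali", "Sara", "Reza", "Lina", "Omid"],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
    "Salary": [4800, 6200, 5800, 6700, 5200],
})

# One row per department, one column per aggregation
summary = employees.groupby("Department")["Salary"].agg(["mean", "max", "count"])
print(summary)
```

The result is a DataFrame indexed by department, so individual figures can be read with summary.loc["IT", "mean"].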

🧾 11. Series vs DataFrame: Detailed Comparison

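| Feature | Series | DataFrame |
|---------|--------|-----------|
| Dimensionality | 1D | 2D |
| Structure | Values + Index | Rows + Columns |
| Access | Single label | Row and column labels |
| Returned by single-column selection (df["col"]) | ✅ | ❌ (returned only when selecting a list of columns) |
| Vectorized operations | ✅ | ✅ |
| Creation | pd.Series() | pd.DataFrame() |
| Analogy | Single Excel column | Full Excel sheet |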

🧭 12. Best Practices

✅ Always inspect .info() and .describe() before analysis.
✅ Use vectorized operations instead of loops.
✅ Use .copy() when you plan to modify a filtered DataFrame, so changes do not trigger SettingWithCopyWarning or touch the original.
✅ Keep column names consistent (avoid spaces).
✅ Use .loc[] for label-based selection and .iloc[] for position-based selection; don't mix labels and positions in the same indexer.
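The .copy() advice is the one that most often trips people up, so here is a minimal sketch of it (the "Senior" column is a hypothetical addition for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
})

# .copy() makes the filtered result an independent DataFrame,
# so adding a column to it cannot affect df.
adults = df[df["Age"] >= 25].copy()
adults["Senior"] = adults["Age"] >= 30

print(adults)
print("Senior" in df.columns)   # False
```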


🧠 Summary

| Concept | Description | Example |
|---------|-------------|---------|
| Series | 1D labeled data | pd.Series([1,2,3], index=['A','B','C']) |
| DataFrame | 2D labeled data | pd.DataFrame({...}) |
| Access | Rows & columns | df.loc[1, 'Age'] |
| Vectorized | Fast operations | df['Age'] * 2 |
| Filter | Conditional selection | df[df['Age']>25] |

Series and DataFrames are the foundation of Pandas. Once you master them, the rest of the Python data analysis ecosystem becomes much easier to learn.