Chapter 9: Data Analysis with Pandas
Sub-Chapter: Series and DataFrames - Core Data Structures of Pandas
Pandas provides two core data structures that make it a powerhouse for data analysis:
Series (1D) and DataFrame (2D).
These are built on top of NumPy arrays, combining fast vectorized operations with flexible labeling and alignment features.
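To make the alignment feature concrete, here is a minimal sketch. The Series names `a` and `b` and their values are made up for illustration. Arithmetic between two Series matches values by label rather than by position, and a label present on only one side produces NaN unless a fill value is supplied.

```python
import pandas as pd

# Two illustrative Series with partially overlapping labels
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition aligns on labels, not on position
total = a + b
print(total)

# Labels found in only one Series ("x" and "w") become NaN above;
# fill_value=0 treats the missing side as 0 instead
total_filled = a.add(b, fill_value=0)
print(total_filled)
```

The result index is the union of both indexes, which is why mismatched labels show up as missing values instead of raising an error.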
1. Understanding Series and DataFrames
| Structure | Dimensionality | Analogy | Example Use |
|---|---|---|---|
| Series | 1D | A single column | Student scores, stock prices |
| DataFrame | 2D | A table / spreadsheet | CSV or SQL table |
Both structures support labels for indexing rows (and columns for DataFrames), making data operations intuitive and powerful.
2. Series: One-Dimensional Labeled Data
A Series represents a one-dimensional labeled array that can hold any data type: integers, strings, floats, or objects.
Creating a Series
import pandas as pd
# From list
data = [10, 20, 30, 40, 50]
labels = ["A", "B", "C", "D", "E"]
series = pd.Series(data, index=labels)
print(series)
Output:
A 10
B 20
C 30
D 40
E 50
dtype: int64
From Dictionary
data = {"Math": 85, "Science": 90, "English": 78}
marks = pd.Series(data)
From Scalar
constant = pd.Series(5, index=["x", "y", "z"])
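When a Series is built from a dictionary, the keys become the labels. The sketch below also passes an explicit `index=`, which selects and reorders the labels. The label "Art" is a hypothetical one with no matching key, added only to show what happens to a missing entry.

```python
import pandas as pd

data = {"Math": 85, "Science": 90, "English": 78}

# Without index=, the dictionary keys become the labels
marks = pd.Series(data)
print(marks.index.tolist())

# With index=, only the listed labels are kept, in the given order;
# "Art" (hypothetical) has no matching key, so its value is NaN
selected = pd.Series(data, index=["Science", "Math", "Art"])
print(selected)
```

Because NaN is a floating-point value, `selected` is stored as float64 even though the original marks are integers.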
3. Accessing and Modifying Series Data
# Access by label or by position
print(series["B"]) # 20 (label)
print(series.iloc[2]) # 30 (position; plain series[2] is deprecated on a string-labeled index)
# Slicing
print(series["B":"D"]) # Label slices include both endpoints: B, C, D
# Add / Update values
series["F"] = 60
# Apply vectorized operation
doubled = series * 2
# Apply a custom function
squared = series.apply(lambda x: x ** 2)
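The difference between label-based and position-based access is easiest to see with the explicit `.loc` and `.iloc` accessors. This is a small sketch that rebuilds the same `series` as above; the other variable names are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

by_label = series.loc["B"]      # label-based lookup
by_position = series.iloc[2]    # position-based lookup (third element)

label_slice = series.loc["B":"D"]   # end label is included: B, C, D
position_slice = series.iloc[1:3]   # end position is excluded: B, C

# A boolean mask keeps only the elements where the condition holds
large = series[series > 25]

print(by_label, by_position)
print(label_slice.index.tolist(), position_slice.index.tolist())
print(large.tolist())
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index itself contains integers.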
Series Attributes
| Attribute | Description | Example |
|---|---|---|
| .index | Row labels | series.index |
| .values | Data as NumPy array | series.values |
| .dtype | Data type | series.dtype |
| .name | Optional label | series.name = "Scores" |
4. DataFrame: Two-Dimensional Labeled Data
A DataFrame is a table-like structure with rows and columns. Each column is a Series.
data = {
"Name": ["Alice", "Bob", "Charlie", "Diana"],
"Age": [25, 30, 28, 22],
"Score": [88, 92, 79, 95]
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Score
0 Alice 25 88
1 Bob 30 92
2 Charlie 28 79
3 Diana 22 95
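The statement that each column is a Series can be checked directly. The sketch below rebuilds the same `df` and inspects one column; `age` is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

age = df["Age"]

print(type(age).__name__)           # class name of the selected column
print(age.name)                     # the column label becomes the Series name
print(age.index.equals(df.index))   # the column shares the DataFrame's row index

# Series methods work directly on a column
print(age.mean())
```

All columns share the same row index, which is what keeps the rows lined up across the table.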
5. Accessing Data in DataFrames
Access Columns
df["Name"] # Returns a Series
df[["Name", "Age"]] # Returns subset of columns
Access Rows
df.loc[1] # Label-based: the row labeled 1 (the 2nd row here)
df.iloc[2] # Position-based: the 3rd row
Access Specific Value
df.loc[1, "Age"] # 30
df.iloc[2, 1] # 28
Slicing
df.loc[1:3, ["Name", "Score"]] # Rows labeled 1 through 3 (end label included), specific columns
df.iloc[:, 1:] # All rows, all columns except first
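One detail worth spelling out: `.loc` slices include the end label, while `.iloc` slices exclude the end position, as ordinary Python slices do. The following sketch shows this on the same `df`, then re-labels the rows by name (via `set_index`, used here only for illustration) so that labels and positions no longer coincide.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

by_label = df.loc[1:3, ["Name", "Score"]]   # labels 1, 2, 3
by_position = df.iloc[1:3, [0, 2]]          # positions 1 and 2 only

print(len(by_label), len(by_position))

# Use the names as row labels; positions stay the same
relabeled = df.set_index("Name")
row = relabeled.loc["Bob"]      # lookup by label
same_row = relabeled.iloc[1]    # lookup by position

print(row.equals(same_row))
```

With the default 0..n-1 index the two accessors look interchangeable, which is why the difference tends to surface only after filtering, sorting, or re-indexing.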
6. DataFrame Attributes
| Attribute | Description | Example |
|---|---|---|
| .shape | (rows, columns) | (4, 3) |
| .columns | Column labels | df.columns |
| .index | Row labels | df.index |
| .values | Underlying NumPy array | df.values |
| .dtypes | Data types of columns | df.dtypes |
| .T | Transpose of DataFrame | df.T |
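As a quick illustration, the attributes in the table can be read off the four-row `df` built earlier. This is only a sketch; the exact text printed for `df.dtypes` and `df.index` may differ slightly between Pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

print(df.shape)              # (number of rows, number of columns)
print(df.columns.tolist())   # column labels as a plain list
print(df.index)              # row labels
print(df.dtypes)             # one dtype per column
print(df.T.shape)            # transposing swaps rows and columns
```

These are attributes, not methods, so they are written without parentheses.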
7. Manipulating DataFrames
Add or Modify Columns
df["Gender"] = ["F", "M", "M", "F"]
df["Passed"] = df["Score"] > 80
Rename Columns
df.rename(columns={"Score": "ExamScore"}, inplace=True)
Drop Columns or Rows
df.drop("Gender", axis=1, inplace=True) # Remove column
df.drop(0, axis=0, inplace=True) # Remove the row labeled 0 (the first row here)
Apply Functions
df["AgeSquared"] = df["Age"].apply(lambda x: x ** 2)
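The same kinds of changes can also be made without `inplace=True`, since `assign`, `rename`, and `drop` each return a new DataFrame. The sketch below is one possible style, not the only one; `result` is an illustrative name and the original `df` is left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

result = (
    df.assign(Passed=df["Score"] > 80)            # add a column
      .rename(columns={"Score": "ExamScore"})     # rename a column
      .drop(index=0)                              # drop the row labeled 0
      .reset_index(drop=True)                     # renumber the remaining rows from 0
)

print(result)
print(df.columns.tolist())   # the original columns are unchanged
```

Returning new objects makes each step easy to inspect and keeps the source data available for comparison.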
8. Vectorized Operations
Most arithmetic and comparison operations in Pandas are vectorized: a single expression is applied to an entire column (or row) at once, without an explicit Python loop.
df["AgePlus10"] = df["Age"] + 10
df["ScoreNormalized"] = df["Score"] / df["Score"].max()
Vectorization = performance. Avoid for loops when working with Series or DataFrames.
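To show what a vectorized expression replaces, here is a small sketch that computes the normalized score both ways on illustrative data. It compares results only and makes no timing claims; any speed difference depends on data size and environment.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 28, 22], "Score": [88, 92, 79, 95]})

# Loop version: one Python-level division per row
max_score = df["Score"].max()
looped = [score / max_score for score in df["Score"]]

# Vectorized version: one expression over the whole column
vectorized = df["Score"] / df["Score"].max()

print(looped)
print(vectorized.tolist())
```

Both versions produce the same numbers. The vectorized form is shorter and keeps the result as a labeled Series that can be assigned straight back to a column.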
9. Filtering and Conditional Selection
# Simple filter
adults = df[df["Age"] >= 25]
# Multiple conditions
top_students = df[(df["Score"] > 85) & (df["Age"] < 30)]
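A few more filtering patterns, sketched on the original four-row table. Conditions are combined with `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses. `isin` and `query` are shown as alternatives; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

# OR: younger than 25, or scored above 90
young_or_top = df[(df["Age"] < 25) | (df["Score"] > 90)]

# NOT: everyone who did not score above 85
not_top = df[~(df["Score"] > 85)]

# isin() tests membership in a list of values
selected = df[df["Name"].isin(["Alice", "Diana"])]

# query() expresses the same kind of filter as a string
top_students = df.query("Score > 85 and Age < 30")

print(young_or_top["Name"].tolist())
print(not_top["Name"].tolist())
print(selected["Name"].tolist())
print(top_students["Name"].tolist())
```

Python's `and` and `or` keywords do not work on boolean Series outside of `query` strings, which is why the element-wise operators are used.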
10. Real-World Example: Employee Data
employees = pd.DataFrame({
"Name": ["Ali", "Sara", "Reza", "Lina", "Omid"],
"Department": ["HR", "IT", "Finance", "IT", "HR"],
"Salary": [4800, 6200, 5800, 6700, 5200],
"Experience": [2, 5, 3, 6, 4]
})
# Add performance bonus (10% of salary)
employees["Bonus"] = employees["Salary"] * 0.1
# Filter IT employees
it_team = employees[employees["Department"] == "IT"]
# Average salary per department
avg_salary = employees.groupby("Department")["Salary"].mean()
print(employees)
print(avg_salary)
Output (summarized):
Name Department Salary Experience Bonus
0 Ali HR 4800 2 480.0
1 Sara IT 6200 5 620.0
2 Reza Finance 5800 3 580.0
3 Lina IT 6700 6 670.0
4 Omid HR 5200 4 520.0
Department
Finance 5800.0
HR 5000.0
IT 6450.0
Name: Salary, dtype: float64
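The per-department average can be extended to several statistics at once. The sketch below uses named aggregation on the same employee data. The output column names (`avg_salary`, `max_experience`, `headcount`) are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Ali", "Sara", "Reza", "Lina", "Omid"],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
    "Salary": [4800, 6200, 5800, 6700, 5200],
    "Experience": [2, 5, 3, 6, 4],
})

# One row per department, one column per requested statistic
summary = employees.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    max_experience=("Experience", "max"),
    headcount=("Name", "count"),
)

# Highest-paying department first
summary = summary.sort_values("avg_salary", ascending=False)
print(summary)
```

The result is itself a DataFrame indexed by department, so the selection and filtering techniques from the earlier sections apply to it as well.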
11. Series vs DataFrame: Detailed Comparison
| Feature | Series | DataFrame |
|---|---|---|
| Dimensionality | 1D | 2D |
| Structure | Values + Index | Rows + Columns |
| Access | Single label | Row and column labels |
| Returned by column selection | Yes, for a single column: df["Name"] | Yes, for a list of columns: df[["Name", "Age"]] |
| Vectorized operations | Yes | Yes |
| Creation | pd.Series() | pd.DataFrame() |
| Analogy | Single Excel column | Full Excel sheet |
12. Best Practices
- Always inspect .info() and .describe() before analysis.
- Use vectorized operations instead of loops.
- Use .copy() when modifying filtered DataFrames.
- Keep column names consistent (avoid spaces).
- Use .loc[] for labels and .iloc[] for positions; do not mix label-based and position-based arguments in the same indexer call.
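Two of these practices are sketched below on the small table from earlier: inspecting the data first, and taking an explicit copy of a filtered subset before modifying it. The `Senior` column is a hypothetical addition used only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [25, 30, 28, 22],
    "Score": [88, 92, 79, 95],
})

# Inspect structure and summary statistics before doing anything else
df.info()
print(df.describe())

# Copy the filtered rows explicitly, then modify the copy
adults = df[df["Age"] >= 25].copy()
adults["Senior"] = adults["Age"] >= 30   # hypothetical derived column

print(adults)
print("Senior" in df.columns)   # the original DataFrame is not affected
```

The explicit `.copy()` makes it clear that `adults` is an independent object, so later edits cannot be confused with changes to `df`.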
Summary
| Concept | Description | Example |
|---|---|---|
| Series | 1D labeled data | pd.Series([1,2,3], index=['A','B','C']) |
| DataFrame | 2D labeled data | pd.DataFrame({...}) |
| Access | Rows & columns | df.loc[1, 'Age'] |
| Vectorized | Fast operations | df['Age'] * 2 |
| Filter | Conditional selection | df[df['Age']>25] |
Series and DataFrames are the foundation of Pandas. Once you are comfortable with them, the rest of the library builds on the same indexing, selection, and vectorized operations covered here.