Topic 5: What is pandas?

 What is Pandas?


Pandas is an open-source Python library designed for data manipulation, analysis, and cleaning.
It offers powerful data structures such as Series for 1-dimensional data and DataFrame for 2-dimensional tabular data.
Because it is built on top of NumPy, Pandas runs column-wise (vectorized) operations much faster than equivalent pure-Python loops, which makes it practical for large datasets that fit in memory. It simplifies tasks such as filtering, transformation, aggregation, and visualization (through its Matplotlib-based plotting methods). Pandas can read and write data in many formats, including CSV, Excel, and SQL databases. Its intuitive syntax makes it easy to handle missing data and to perform complex operations in a few lines of code. Overall, Pandas is an essential tool for data science, analytics, and machine learning workflows.

Diagram: Pandas Workflow

Why Do We Need Pandas?

Without Pandas, working with large tabular datasets in plain Python means writing loops over lists and dictionaries by hand, which is slow and error-prone.

Pandas helps by providing:

1. Efficient data handling – Load, modify, and analyze data with a few lines of code.

2. Flexible data structures – Handle missing values, duplicates, and mixed data types.

3. Integration – Works well with NumPy, Matplotlib, Scikit-learn, and other libraries.

4. Data import/export – Read and write CSV, Excel, JSON, SQL, and other formats.

5. Faster data processing – Vectorized operations on NumPy arrays run much faster than Python loops.


Core Data Structures in Pandas:

Pandas has two core data structures: Series, a one-dimensional labeled array, and DataFrame, a two-dimensional labeled table whose columns can hold different data types.

Diagram: Core Data Structures in Pandas
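A short sketch can make the relationship between the two structures concrete: a Series carries values plus an index, and each column of a DataFrame is itself a Series that shares the table's row index. The names `ages`, `people`, and `city_col` and the data are illustrative assumptions.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of row labels
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')

# A DataFrame: a two-dimensional table whose columns share one row index
people = pd.DataFrame({'Age': [25, 30, 35],
                       'City': ['Delhi', 'Mumbai', 'Chennai']},
                      index=['Alice', 'Bob', 'Charlie'])

# Selecting a single column from a DataFrame returns a Series
city_col = people['City']

print(ages.ndim, people.ndim)    # 1 2
print(type(city_col).__name__)   # Series
print(ages['Bob'])               # 30 (lookup by label)
```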


Common Pandas Functions and Attributes with Examples


1️⃣ Creating Series and DataFrame

```python
import pandas as pd

# Series: a one-dimensional labeled array
s = pd.Series([10, 20, 30, 40])
print(s)

# DataFrame: a two-dimensional table built from a dict of columns
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['Delhi', 'Mumbai', 'Chennai']}
df = pd.DataFrame(data)
print(df)
```

2️⃣ head() and tail()

Display the first few rows and the last few rows (five by default).

```python
print(df.head(2))   # First 2 rows
print(df.tail(1))   # Last row
```

3️⃣ info()

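Prints a concise summary of the DataFrame: the index range, column names, non-null counts, data types, and memory usage.

```python
df.info()   # Prints directly; no print() needed
```

Output (first lines):

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
...
```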

4️⃣ describe()

Provides a statistical summary of the numeric columns (here, only Age).

```python
print(df.describe())
```

Output:

|       | Age  |
| ----- | ---- |
| count | 3.0  |
| mean  | 30.0 |
| std   | 5.0  |
| min   | 25.0 |
| 25%   | 27.5 |
| 50%   | 30.0 |
| 75%   | 32.5 |
| max   | 35.0 |
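Each row of the describe() summary corresponds to an ordinary Series method, which this sketch checks on the same example data; note that `std` is the sample standard deviation (ddof=1). The variable name `summary` is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['Delhi', 'Mumbai', 'Chennai']})

summary = df.describe()   # numeric columns only, by default

# Each row of the summary matches a dedicated Series method
print(summary.loc['mean', 'Age'], df['Age'].mean())    # 30.0 30.0
print(summary.loc['std', 'Age'], df['Age'].std())      # 5.0 5.0 (sample std, ddof=1)
print(summary.loc['50%', 'Age'], df['Age'].median())   # 30.0 30.0

# include='all' also summarizes the text columns (count, unique, top, freq)
print(df.describe(include='all'))
```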


5️⃣ shape

Returns the dimensions of the DataFrame as a (rows, columns) tuple. It is an attribute, so it is written without parentheses.

```python
print(df.shape)
```

Output: (3, 3)

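6️⃣ columns and index

Return the column labels and the row labels (the index).

```python
print(df.columns)   # Column labels: Name, Age, City
print(df.index)     # Row labels: 0, 1, 2 (a RangeIndex)
```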

7️⃣ sort_values()

Sort rows by the values in one or more columns.

```python
print(df.sort_values(by='Age', ascending=False))   # Oldest first
```
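sort_values() also accepts a list of columns with a matching list of sort directions. This sketch uses hypothetical staff data (the `staff` and `ordered` names are assumptions) and also shows that the call returns a new DataFrame rather than reordering the original.

```python
import pandas as pd

# Hypothetical staff data with repeated departments
staff = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena', 'John'],
                      'Dept': ['HR', 'IT', 'IT', 'HR'],
                      'Salary': [40000, 55000, 60000, 45000]})

# Sort by Dept (A to Z), then by Salary (highest first) within each Dept
ordered = staff.sort_values(by=['Dept', 'Salary'], ascending=[True, False])
print(ordered)

# The original DataFrame keeps its order unless the result is assigned back
print(staff['Name'].tolist())   # ['Asha', 'Ravi', 'Meena', 'John']
```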

8️⃣ iloc[] and loc[]

Access specific rows and columns: iloc[] selects by integer position, loc[] by label.

```python
print(df.iloc[0])         # First row, by integer position
print(df.loc[1, 'City'])  # Value at row label 1 and column 'City' (Mumbai)
```
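The position-versus-label distinction matters most when slicing: an iloc slice excludes its end position, while a loc slice includes its end label. A minimal sketch on the same example data, using the default 0, 1, 2 index:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['Delhi', 'Mumbai', 'Chennai']})   # default index: 0, 1, 2

# iloc slices by position and EXCLUDES the end, like Python lists
print(df.iloc[0:2])   # rows 0 and 1

# loc slices by label and INCLUDES the end label
print(df.loc[0:2])    # rows 0, 1 and 2

# loc also accepts a boolean condition plus a list of columns
print(df.loc[df['Age'] > 28, ['Name', 'City']])   # Bob and Charlie
```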

9️⃣ isnull() and dropna()

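Detect and remove missing values (NaN or None). Both methods return a new object and leave df unchanged.

```python
print(df.isnull())   # True wherever a value is missing
print(df.dropna())   # Only the rows without missing values
```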
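The example DataFrame above has no gaps, so these calls change nothing. This sketch uses a hypothetical DataFrame, `raw`, with one missing Age and one missing City to show what the methods actually report and remove; the `subset` parameter limits dropna() to the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps: np.nan and None both count as missing
raw = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                    'Age': [25, np.nan, 35],
                    'City': ['Delhi', 'Mumbai', None]})

print(raw.isnull().sum())           # Missing values per column: Name 0, Age 1, City 1
print(raw.dropna())                 # Only Alice's row is complete
print(raw.dropna(subset=['Age']))   # Drop a row only if its Age is missing
```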

🔟 fillna()

Replace missing values with a specified value.

```python
print(df.fillna(value=0))   # Every missing value becomes 0; returns a new DataFrame
```
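A single fill value rarely suits every column. fillna() also accepts a dict that maps column names to their own replacement values; in this sketch the data, the `raw` and `filled` names, and the placeholder 'Unknown' are illustrative assumptions, and Age is filled with the column mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing number and one missing text value
raw = pd.DataFrame({'Age': [25, np.nan, 35],
                    'City': ['Delhi', 'Mumbai', None]})

# A dict gives each column its own replacement value
filled = raw.fillna({'Age': raw['Age'].mean(),   # mean skips NaN: (25 + 35) / 2 = 30.0
                     'City': 'Unknown'})
print(filled)
```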


1️⃣1️⃣ groupby()

Group rows by the values of one or more columns, then apply an aggregate function to each group.

```python
group = df.groupby('City')['Age'].mean()   # Average age per city
print(group)
```
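In the example above every city appears once, so each "average" is just one person's age. This sketch uses hypothetical data in which cities repeat, and agg() to compute several statistics per group at once; the `people` and `stats` names are assumptions.

```python
import pandas as pd

# Hypothetical data in which cities repeat, so groups have several rows
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Deepa', 'Esha'],
                       'Age': [25, 30, 35, 40, 28],
                       'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Chennai']})

# Mean age per city (group keys are sorted: Chennai, Delhi, Mumbai)
print(people.groupby('City')['Age'].mean())

# Several aggregations at once; each becomes a column of the result
stats = people.groupby('City')['Age'].agg(['count', 'mean', 'max'])
print(stats)
```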

1️⃣2️⃣ merge(), concat(), join()

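Combine multiple DataFrames: merge() matches rows on common columns (like a SQL join), concat() stacks DataFrames along an axis, and join() combines them on their index.

```python
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['A', 'B']})
df2 = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})

result = pd.merge(df1, df2, on='ID')   # Inner join on the ID column
print(result)
```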
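concat() and join() do not appear in the example above. This sketch reuses `df1` and `df2` and adds a hypothetical `df3` to show vertical stacking with concat(), an index-based join(), and a left merge that keeps an unmatched row.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['A', 'B']})
df2 = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})
df3 = pd.DataFrame({'ID': [3], 'Name': ['C']})   # hypothetical extra row

# concat(): stack vertically (axis=0); ignore_index renumbers the rows
stacked = pd.concat([df1, df3], ignore_index=True)
print(stacked)

# join(): combine side by side on the index, so make ID the index first
joined = df1.set_index('ID').join(df2.set_index('ID'))
print(joined)

# merge() with how='left' keeps every row of the left DataFrame
left = pd.merge(stacked, df2, on='ID', how='left')
print(left)   # ID 3 has no matching salary, so it shows NaN
```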

1️⃣3️⃣ read_csv() & to_csv()

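Import data from a CSV file and export a DataFrame back to CSV.

```python
df = pd.read_csv('data.csv')           # Load a CSV file into a DataFrame
df.to_csv('output.csv', index=False)   # Save without writing the row index
```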
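To try both functions without creating files, a sketch can round-trip a small DataFrame through an in-memory text buffer (`io.StringIO`), which both to_csv() and read_csv() accept in place of a file path. The data and the `buffer` and `loaded` names are assumptions for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV text into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False: do not write the row numbers
print(buffer.getvalue())         # header line "Name,Age", then one line per row

# Rewind the buffer and read it back, as read_csv() would read a file
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded)
```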

1️⃣4️⃣ value_counts()

Count how often each unique value appears in a column.

```python
print(df['City'].value_counts())
```
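Every city in the example data appears exactly once, so each count is 1. With repeated values (hypothetical here), value_counts() sorts results from most to least frequent, and `normalize=True` returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical column with repeated values
cities = pd.Series(['Delhi', 'Mumbai', 'Delhi', 'Chennai', 'Delhi', 'Mumbai'])

counts = cities.value_counts()                 # Delhi 3, Mumbai 2, Chennai 1
shares = cities.value_counts(normalize=True)   # Delhi 0.5, Mumbai ~0.33, Chennai ~0.17
print(counts)
print(shares)
```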

Pandas Quiz

Summary

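| Function / attribute               | Purpose                       |
| ---------------------------------- | ----------------------------- |
| `pd.Series()`, `pd.DataFrame()`    | Create data structures        |
| `head()` / `tail()`                | Show first/last rows          |
| `info()`                           | Data info summary             |
| `describe()`                       | Statistics summary            |
| `shape`                            | Dimensions of DataFrame       |
| `columns`, `index`                 | Column and row labels         |
| `sort_values()`                    | Sort data                     |
| `iloc[]`, `loc[]`                  | Select by position / by label |
| `isnull()`, `dropna()`, `fillna()` | Handle missing data           |
| `groupby()`                        | Aggregate data                |
| `merge()`, `concat()`, `join()`    | Combine DataFrames            |
| `read_csv()`, `to_csv()`           | Data import/export            |
| `value_counts()`                   | Count unique values           |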

