Topic 2: Machine Learning Workflow
Workflow
The Machine Learning (ML) workflow is a step-by-step process for building, training, testing, and evaluating an ML model. It helps ensure that the data is properly prepared, the model is correctly trained, and its performance is accurately measured.
Main Stages of the ML Workflow
ML Overview (step-by-step)
1. Data Collection
- Gather data from various sources such as CSV files, databases, APIs, sensors, or online datasets.
- Example: Collecting house price data (area, location, price).
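Collected data is often loaded with pandas. The sketch below is a minimal example that assumes pandas is installed. It uses a small CSV held in a string, with the column names from the house price example and made-up values, in place of a real file such as a hypothetical "houses.csv".

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as "houses.csv".
# The values are made up purely for illustration.
csv_text = """area,location,price
1200,north,250000
850,south,180000
1500,north,310000
2000,east,420000
950,south,200000
"""

# pd.read_csv accepts a file path, a URL, or a file-like object;
# io.StringIO lets the string above behave like a file.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (5, 3)
print(df.head())
```

With real data, replace `io.StringIO(csv_text)` with the path or URL of the source file.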
2. Data Preprocessing
- Clean and prepare the data before training.
- Handle missing values and outliers, and encode categorical variables.
- Apply feature scaling (e.g., standardization or normalization).
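The preprocessing steps above can be sketched with pandas and scikit-learn. This is one possible setup, not the only one: it assumes both libraries are installed and uses hypothetical house data. The choices of median imputation, most-frequent imputation, 1.5 × IQR clipping for outliers, one-hot encoding, and standardization are common defaults picked for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical house data with a missing area, a missing location,
# and one unusually large area value (an outlier).
df = pd.DataFrame({
    "area": [1200, 850, np.nan, 2000, 9000],
    "location": ["north", "south", "north", np.nan, "east"],
})

# Outliers: clip 'area' to the 1.5 * IQR fences (one common rule of thumb).
q1, q3 = df["area"].quantile([0.25, 0.75])
iqr = q3 - q1
df["area"] = df["area"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Numeric column: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical column: fill missing values with the most frequent category,
# then one-hot encode it.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 makes the transformer always return a dense array.
preprocess = ColumnTransformer(
    [("num", numeric, ["area"]), ("cat", categorical, ["location"])],
    sparse_threshold=0,
)

X = preprocess.fit_transform(df)
print(X.shape)  # (5, 4): scaled area + 3 one-hot location columns
```

Wrapping the steps in a `ColumnTransformer` means the same transformations can be fitted on training data and then reapplied unchanged to test data.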
3. Train/Test Split
- Split the dataset into two parts:
  - Training Set: Used to train the model (70–80% of data)
  - Testing Set: Used to test the model (20–30% of data)
Purpose: To check how well the model performs on unseen data (generalization).
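An 80/20 split like the one described above can be done with scikit-learn's `train_test_split`. This minimal sketch assumes scikit-learn and NumPy are installed, and it uses randomly generated placeholder data in place of a real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples with 2 features each, plus a target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.normal(size=100)

# test_size=0.2 holds out 20% of the rows for testing (an 80/20 split);
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```

For classification problems, passing `stratify=y` keeps the class proportions similar in the training and testing sets.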
4. Model Training
- Choose an algorithm (e.g., Linear Regression, Decision Tree).
- Feed the training data to the model so it can learn patterns from it.
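In scikit-learn, training means calling `fit()` on the training data. This sketch assumes scikit-learn is installed. It uses made-up data that follows y = 3x + 5 exactly, which lets Linear Regression recover those numbers, and it fits a Decision Tree through the same interface.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data following y = 3x + 5 exactly.
X_train = np.arange(10, dtype=float).reshape(-1, 1)  # shape (10, 1)
y_train = 3 * X_train.ravel() + 5

# fit() is where the model learns its parameters from the training data.
lin = LinearRegression().fit(X_train, y_train)
print(lin.coef_, lin.intercept_)  # approximately [3.] and 5.0

# Other algorithms share the same fit() interface.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
```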
5. Model Testing / Prediction
- Use the trained model to make predictions on the test data, then compare those predictions with the actual values.
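Testing amounts to calling `predict()` on the held-out features and comparing the output with the true targets. This minimal sketch assumes scikit-learn and uses hypothetical noisy data generated from y = 2x + 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=0.5, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# predict() only sees the test features; y_test is kept aside for comparison.
y_pred = model.predict(X_test)
for actual, predicted in list(zip(y_test, y_pred))[:3]:
    print(f"actual={actual:.2f}  predicted={predicted:.2f}")
```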
6. Evaluation Metrics
Evaluation metrics measure how well the model performs.
The right metrics depend on the type of ML problem.
For Regression Problems:
| Metric | Description | Function (sklearn.metrics) |
| MAE (Mean Absolute Error) | Average of the absolute errors | mean_absolute_error() |
| MSE (Mean Squared Error) | Average of the squared errors (penalizes large errors more) | mean_squared_error() |
| R² Score | Proportion of the variance in the target explained by the model (1.0 is a perfect fit) | r2_score() |
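The regression metrics in the table can be checked by hand on a tiny example. This sketch assumes scikit-learn is installed. The four values are chosen only so the arithmetic is easy to follow, and the comments show that arithmetic.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Small hand-checkable example (values chosen for illustration).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
r2 = r2_score(y_true, y_pred)              # 1 - 1.5 / 29.1875 ≈ 0.949

print(mae, mse, round(r2, 3))
```

For R², 1.5 is the sum of squared errors and 29.1875 is the sum of squared deviations of `y_true` from its mean (2.875).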
For Classification Problems:
| Metric | Description | Function (sklearn.metrics) |
| Accuracy | Fraction (or percentage) of correct predictions | accuracy_score() |
| Precision | Share of predicted positives that are actually positive: TP / (TP + FP) | precision_score() |
| Recall | Share of actual positives that are correctly predicted: TP / (TP + FN) | recall_score() |
| F1-Score | Harmonic mean of Precision and Recall | f1_score() |
| Confusion Matrix | Table of TP, FP, TN, and FN counts | confusion_matrix() |
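The classification metrics can be traced back to the confusion matrix counts. This minimal sketch assumes scikit-learn and a binary problem with hypothetical labels, where 1 is the positive class. The comments give each formula with the resulting numbers.

```python
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score,
)

# Hypothetical binary labels (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]

# For binary labels [0, 1], the matrix is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # 2, 2, 1, 3

print(accuracy_score(y_true, y_pred))   # (TP + TN) / total = 5/8 = 0.625
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/5 = 0.6
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # 2PR / (P + R) ≈ 0.667
```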
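7. Model Optimization
- Tune hyperparameters (e.g., learning rate, max_depth).
- Use Grid Search (tries every combination) or Random Search (samples a fixed number of combinations) for tuning.
- Use cross-validation for a more reliable performance estimate than a single train/test split.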
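The tuning options above map onto scikit-learn's `GridSearchCV`, `RandomizedSearchCV`, and `cross_val_score`. This sketch assumes scikit-learn is installed and uses synthetic data from `make_regression`. The Decision Tree and its parameter ranges are illustrative choices, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Grid Search: tries every max_depth value, scoring each with 5-fold CV.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6, None]},
    cv=5,
    scoring="r2",
)
grid.fit(X, y)
print("grid best:", grid.best_params_, round(grid.best_score_, 3))

# Random Search: samples n_iter combinations instead of trying all 40.
rand = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_distributions={"max_depth": list(range(1, 11)),
                         "min_samples_leaf": [1, 2, 5, 10]},
    n_iter=10,
    cv=5,
    scoring="r2",
    random_state=0,
)
rand.fit(X, y)
print("random best:", rand.best_params_)

# Cross-validation on its own: 5 train/validate rounds, one score per fold.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```

Random Search is usually cheaper when the grid is large, because `n_iter` caps the number of models trained per CV fold.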
8. Deployment
- Deploy your trained model using Flask, Streamlit, or FastAPI for real-world use.
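Whichever framework is used, a common first step is to save the trained model and load it in the serving process. This minimal sketch assumes joblib for persistence (joblib is installed alongside scikit-learn). `predict_price` is a hypothetical helper that a Flask, FastAPI, or Streamlit handler could call, and the web framework code itself is left out.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a tiny stand-in model (hypothetical data: y = 2x + 1).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
model = LinearRegression().fit(X, y)

# Save the trained model to disk once, after training.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

# In the serving process, load the model once at startup...
loaded = joblib.load(path)

# ...and call it from a request handler. predict_price is a hypothetical
# helper; a Flask, FastAPI, or Streamlit app would call something like it.
def predict_price(area: float) -> float:
    return float(loaded.predict([[area]])[0])

print(predict_price(20.0))  # approximately 41.0
```

Because joblib files are based on pickle, only load model files from trusted sources.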
Summary
The Machine Learning workflow is a structured pipeline of data collection, preprocessing, train/test splitting, model training and testing, evaluation, optimization, and deployment. Following these steps in order helps keep model performance reliable and accurately measured.
