Introduction to Machine Learning with scikit-learn: A Simple Regression Example

Python Machine Learning Basics

Get started with supervised learning using scikit-learn!

1. What is Machine Learning?

Machine Learning (ML) trains models to make predictions or decisions from data, rather than relying on explicitly programmed rules.

Types of ML:

Supervised Learning (labeled data): Regression, Classification.
Unsupervised Learning (unlabeled data): Clustering, Dimensionality Reduction.
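To make the difference concrete, here is a minimal sketch on made-up toy data (the numbers are invented for illustration). The supervised model is given both features and labels, while the clustering model only sees the features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Made-up toy data: six samples with a single feature
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])

# Supervised: labels y come with the data (here y = 2x exactly)
y = np.array([2.0, 2.4, 1.6, 10.0, 10.4, 9.6])
reg = LinearRegression().fit(X, y)
slope = reg.coef_[0]

# Unsupervised: only X is passed; the algorithm groups similar samples
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_

print("Learned slope:", slope)
print("Cluster assignments:", cluster_labels)
```

Note that the cluster ids (0 or 1) are arbitrary; only the grouping itself is meaningful.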

2. scikit-learn Overview

scikit-learn is Python’s go-to library for ML. Key features:

Simple API for models like regression, SVM, decision trees.
Tools for data splitting, preprocessing, and evaluation.
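The "simple API" point is easiest to see in code. The sketch below, on made-up toy data, trains three different regressors with the same fit and predict calls. The choice of estimators and their parameters is just for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data following y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

estimators = [
    LinearRegression(),
    SVR(kernel="linear"),
    DecisionTreeRegressor(random_state=0),
]

predictions = {}
for est in estimators:
    est.fit(X, y)  # same training call for every estimator
    predictions[type(est).__name__] = est.predict([[4.0]])[0]

for name, value in predictions.items():
    print(f"{name}: {value:.2f}")
```

Swapping one model for another usually means changing a single line, which makes it easy to compare algorithms.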

Installation:


pip install scikit-learn

3. Simple Linear Regression Example

Goal: Predict a continuous value (e.g., house price) based on a feature (e.g., size).

Step 1: Import Libraries


import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.linear_model import LinearRegression  
from sklearn.metrics import mean_squared_error, r2_score  
import matplotlib.pyplot as plt

Step 2: Generate Sample Data


# Create synthetic data: y = 2x + 3 + noise  
np.random.seed(0)  
X = 2 * np.random.rand(100, 1)  # Feature (house size)  
y = 3 + 2 * X + np.random.randn(100, 1)  # Target (price)

Step 3: Split Data into Train/Test Sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
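If you want to see what the split actually does, the following sketch uses simple made-up arrays (0 to 99) so the result is easy to inspect. It checks the 80/20 sizes and shows that a fixed random_state gives the same split every time.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: X and y both hold 0..99 so rows are easy to track
X = np.arange(100).reshape(100, 1)
y = np.arange(100).reshape(100, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)

# Same random_state, same split: useful for reproducible experiments
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test2))  # True
```

The rows are shuffled before splitting, and each X row stays paired with its y row.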

Step 4: Train the Model


model = LinearRegression()  
model.fit(X_train, y_train)  # Train on training data

Step 5: Make Predictions


y_pred = model.predict(X_test)
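You can also predict for values the model has never seen. The sketch below uses noise-free toy data (an assumption made so the answer is easy to check by hand) and shows that predict expects a 2-D array of shape (n_samples, n_features).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free toy data following y = 3 + 2x, so the expected answer is known
X = np.array([[0.0], [1.0], [2.0]])
y = 3 + 2 * X

model = LinearRegression().fit(X, y)

# One new sample with one feature: note the double brackets (2-D input).
# A plain 1-D array is rejected by scikit-learn's input validation.
new_sizes = np.array([[1.5]])
pred = model.predict(new_sizes)
print(pred)  # approximately [[6.]] because 3 + 2 * 1.5 = 6
```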

Step 6: Evaluate the Model


print("Coefficient (slope):", model.coef_[0][0])  
print("Intercept (bias):", model.intercept_[0])  
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_pred))  
print("R-squared:", r2_score(y_test, y_pred))

Output (rounded):


Coefficient (slope): 2.006  
Intercept (bias): 2.862  
Mean Squared Error (MSE): 0.98  
R-squared: 0.78
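A note on the double index in model.coef_[0][0]: it is needed because y was created as a 2-D column of shape (100, 1). The sketch below rebuilds the same synthetic data and compares it with a flattened 1-D target. The names y_2d, y_1d, model_2d, and model_1d are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y_2d = 3 + 2 * X + np.random.randn(100, 1)  # shape (100, 1), as in the tutorial
y_1d = y_2d.ravel()                          # shape (100,)

model_2d = LinearRegression().fit(X, y_2d)
model_1d = LinearRegression().fit(X, y_1d)

# 2-D target: coef_ is (n_targets, n_features), intercept_ is (n_targets,)
print(model_2d.coef_.shape, model_2d.intercept_.shape)
# 1-D target: coef_ is (n_features,), intercept_ is a scalar
print(model_1d.coef_.shape, np.ndim(model_1d.intercept_))
```

Both models learn the same line; only the shapes of the stored parameters differ.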

Step 7: Visualize the Regression Line


plt.scatter(X_test, y_test, color="blue", label="Actual")  
plt.plot(X_test, y_pred, color="red", linewidth=2, label="Predicted")  
plt.xlabel("House Size")  
plt.ylabel("Price")  
plt.legend()  
plt.show()

4. Key Concepts

How Linear Regression Works

The model learns the equation:


y = β0 + β1x + ϵ

β0: Intercept (bias).
β1: Coefficient (slope).
ϵ: Error term.
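LinearRegression estimates β0 and β1 by ordinary least squares, that is, by minimizing the sum of squared errors. As a rough sketch, you can reproduce the fit with NumPy's least-squares solver and compare. The design matrix A and the variable names here are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic recipe as the tutorial, with 1-D arrays for simplicity
np.random.seed(0)
x = 2 * np.random.rand(100)
y = 3 + 2 * x + np.random.randn(100)

# Design matrix: a column of ones (for the intercept β0) next to x (for β1)
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
beta0, beta1 = beta

model = LinearRegression().fit(x.reshape(-1, 1), y)

print("NumPy least squares:", beta0, beta1)
print("scikit-learn:       ", model.intercept_, model.coef_[0])
```

The two approaches should agree up to floating-point precision.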

Evaluation Metrics:

Mean Squared Error (MSE): Average squared difference between actual and predicted values (lower = better).
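R-squared (R²): Proportion of variance in the target explained by the model (at most 1, higher = better; it can be negative on test data when the model does worse than always predicting the mean).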
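Both metrics are simple enough to compute by hand. The sketch below uses a few made-up numbers and checks the manual formulas against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R²: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))  # both 0.375
print(r2_manual, r2_score(y_true, y_hat))             # both 0.925

# Predictions that are worse than the mean give a negative score
y_bad = np.array([9.0, 7.0, 5.0, 3.0])
r2_bad = r2_score(y_true, y_bad)
print(r2_bad)  # -3.0
```

The last line shows that on poor predictions the score can drop below zero.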

5. Real-World Applications

Predict Sales: Based on advertising spend.
Estimate Stock Prices: Using historical trends.
Forecast Weather: Predict temperature from humidity.

6. Next Steps

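Try Real Datasets: Use Kaggle datasets (e.g., Boston Housing, Diabetes). Note that scikit-learn removed its built-in Boston Housing loader (load_boston) in version 1.2; its built-in Diabetes and California Housing datasets are ready-made alternatives.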
Explore Other Algorithms:
- Classification: Logistic Regression, Decision Trees.
- Clustering: K-Means.
Preprocess Data: Handle missing values, scale features.
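As a starting point for preprocessing, here is a minimal sketch that chains imputation, scaling, and regression in one pipeline. The data is made up, and mean imputation plus standard scaling is just one reasonable choice among several.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Made-up data: two features on very different scales, one missing value
X = np.array([
    [1.0, 100.0],
    [2.0, np.nan],
    [3.0, 300.0],
    [4.0, 400.0],
])
y = np.array([1.0, 2.0, 3.0, 4.0])

pipe = make_pipeline(
    SimpleImputer(strategy="mean"),  # fill NaN with the column mean
    StandardScaler(),                # zero mean, unit variance per feature
    LinearRegression(),
)
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds)
```

Wrapping the steps in a pipeline means the same preprocessing is applied automatically at prediction time.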

Key Takeaways

✅ Linear Regression: Predicts continuous values using a linear relationship.
✅ Train/Test Split: Essential to evaluate model performance.
✅ scikit-learn: Simplifies implementing ML models in Python.

What’s Next?

Learn classification with the Iris dataset or dive into neural networks with TensorFlow!


Ethical circuits - AI News and Tips