Introduction to Machine Learning with scikit-learn: A Simple Regression Example

Python Machine Learning Basics

Get started with supervised learning using scikit-learn!

1. What is Machine Learning?

Machine Learning (ML) trains models to make predictions or decisions from data, rather than relying on explicitly programmed rules.

Types of ML:

  • Supervised Learning (labeled data): Regression, Classification.
  • Unsupervised Learning (unlabeled data): Clustering, Dimensionality Reduction.

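As a quick illustration of the difference, supervised estimators in scikit-learn are fitted with both features and labels, while unsupervised ones receive only the features. A minimal sketch with toy data (not part of the main example below):

# Supervised vs. unsupervised: fit(X, y) vs. fit(X)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # features
y = np.array([0, 0, 1, 1])                   # labels (needed only for supervised learning)

clf = LogisticRegression().fit(X, y)                     # supervised: features + labels
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)  # unsupervised: features only
print(clf.predict([[1.5]]), labels)
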
2. scikit-learn Overview

scikit-learn is Python’s go-to library for ML. Key features:

  • Simple API for models like regression, SVM, decision trees.
  • Tools for data splitting, preprocessing, and evaluation.
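
Every estimator shares the same fit/predict interface, so switching between models is usually a one-line change. A minimal sketch with toy data (the model choices here are only for illustration):

# The same fit/predict calls work across different scikit-learn models
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

for model in (DecisionTreeRegressor(), SVR()):
    model.fit(X, y)                                  # identical training call
    print(type(model).__name__, model.predict([[2.5]]))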

Installation:


pip install scikit-learn  

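To verify the installation, you can print the installed version:

python -c "import sklearn; print(sklearn.__version__)"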

3. Simple Linear Regression Example

Goal: Predict a continuous value (e.g., house price) based on a feature (e.g., size).

Step 1: Import Libraries


import numpy as np  
from sklearn.model_selection import train_test_split  
from sklearn.linear_model import LinearRegression  
from sklearn.metrics import mean_squared_error, r2_score  
import matplotlib.pyplot as plt  


Step 2: Generate Sample Data


# Create synthetic data: y = 2x + 3 + noise  
np.random.seed(0)  
X = 2 * np.random.rand(100, 1)  # Feature (house size)  
y = 3 + 2 * X + np.random.randn(100, 1)  # Target (price)  


Step 3: Split Data into Train/Test Sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  

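With test_size=0.2, 80 of the 100 samples go to training and the remaining 20 to testing. You can confirm the split like this:

print(X_train.shape, X_test.shape)   # (80, 1) (20, 1)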

Step 4: Train the Model


model = LinearRegression()  
model.fit(X_train, y_train)  # Train on training data  


Step 5: Make Predictions


y_pred = model.predict(X_test)  

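To get a feel for the result, compare a few predictions against the actual test values:

# Peek at the first few predictions next to the actual values
for actual, predicted in zip(y_test[:3].ravel(), y_pred[:3].ravel()):
    print(f"actual: {actual:.2f}, predicted: {predicted:.2f}")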

Step 6: Evaluate the Model


print("Coefficient (slope):", model.coef_[0][0])  
print("Intercept (bias):", model.intercept_[0])  
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_pred))  
print("R-squared:", r2_score(y_test, y_pred))  


Output:


Coefficient (slope): 2.006
Intercept (bias): 2.862
Mean Squared Error (MSE): 0.98
R-squared: 0.78

Step 7: Visualize the Regression Line


plt.scatter(X_test, y_test, color="blue", label="Actual")  
plt.plot(X_test, y_pred, color="red", linewidth=2, label="Predicted")  
plt.xlabel("House Size")  
plt.ylabel("Price")  
plt.legend()  
plt.show()  


4. Key Concepts

How Linear Regression Works

The model learns the equation:


y = β0 + β1x + ϵ

  • β0: Intercept (bias).
  • β1: Coefficient (slope).
  • ϵ: Error term.

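Fitting means choosing β0 and β1 that minimize the squared error on the training data. A minimal sketch of that least-squares solution in plain NumPy (reusing X_train and y_train from the example above):

# Ordinary least squares by hand: solve for [intercept, slope]
X_design = np.hstack([np.ones_like(X_train), X_train])      # prepend a column of ones for the intercept
beta, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)   # beta = [[beta0], [beta1]]
print("intercept:", beta[0][0], "slope:", beta[1][0])       # matches model.intercept_ and model.coef_
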
Evaluation Metrics:

  • Mean Squared Error (MSE): Average squared difference between actual and predicted values (lower = better).
  • R-squared (R2): Proportion of variance explained (0–1, higher = better).
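
Both metrics are easy to reproduce from their definitions, which makes them concrete (reusing y_test and y_pred from the example):

# MSE and R-squared computed directly from their definitions
mse = np.mean((y_test - y_pred) ** 2)               # average squared error
ss_res = np.sum((y_test - y_pred) ** 2)             # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)      # total sum of squares
r2 = 1 - ss_res / ss_tot
print(mse, r2)                                      # same values as mean_squared_error / r2_score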

5. Real-World Applications

  • Predict Sales: Based on advertising spend.
  • Estimate Stock Prices: Using historical trends.
  • Forecast Weather: Temperature vs. humidity.

6. Next Steps

  • Try Real Datasets: Use Kaggle datasets (e.g., Boston Housing, Diabetes).
  • Explore Other Algorithms (see the sketch after this list):
    • Classification: Logistic Regression, Decision Trees.
    • Clustering: K-Means.
  • Preprocess Data: Handle missing values, scale features.
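
As a starting point for classification, here is a minimal sketch on the built-in Iris dataset with Logistic Regression, following the same split/fit/predict/evaluate pattern used above:

# Classification with the same workflow: split, fit, predict, evaluate
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=200)   # max_iter raised so the solver converges
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))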

Key Takeaways

  • ✅ Linear Regression: Predicts continuous values using a linear relationship.
  • ✅ Train/Test Split: Essential to evaluate model performance.
  • ✅ scikit-learn: Simplifies implementing ML models in Python.

What’s Next?

Learn classification with the Iris dataset or dive into neural networks with TensorFlow!
