Python Machine Learning Basics
Get started with supervised learning using scikit-learn!
1. What is Machine Learning?
In Machine Learning (ML), models learn to make predictions or decisions from data rather than following explicitly programmed rules.
Types of ML:
- Supervised Learning (labeled data): Regression, Classification.
- Unsupervised Learning (unlabeled data): Clustering, Dimensionality Reduction.
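The difference shows up directly in code: a supervised estimator is fit on features and labels, while an unsupervised one sees only features. Below is a minimal sketch, assuming scikit-learn is installed. The toy dataset and the choice of KMeans with two clusters are illustrative assumptions, not part of the example later in this tutorial.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Toy data (assumed for illustration): two well-separated groups of points
X = np.array([[0.8], [1.0], [1.2], [4.8], [5.0], [5.2]])
y = 2 * X.ravel() + 1  # labels, needed only for supervised learning

# Supervised: fit on features AND labels, then predict a value for new input
reg = LinearRegression().fit(X, y)
prediction = reg.predict(np.array([[3.0]]))[0]

# Unsupervised: fit on features only; the model groups the points itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_

print("Predicted value for x=3:", round(float(prediction), 2))
print("Cluster assignments:", cluster_labels)
```

The cluster ids (0 or 1) are arbitrary; what matters is that the first three points share one id and the last three share the other.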
2. scikit-learn Overview
scikit-learn is Python’s go-to library for ML. Key features:
- Simple API for models like regression, SVM, decision trees.
- Tools for data splitting, preprocessing, and evaluation.
Installation:
pip install scikit-learn
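To confirm the install worked, one quick check is to import the package and print its version. Note that the import name is sklearn, not scikit-learn.

```python
import sklearn

# A successful import plus a version string means the install is usable
print("scikit-learn version:", sklearn.__version__)
```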
3. Simple Linear Regression Example
Goal: Predict a continuous value (e.g., house price) based on a feature (e.g., size).
Step 1: Import Libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
Step 2: Generate Sample Data
# Create synthetic data: y = 2x + 3 + noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1) # Feature (house size)
y = 3 + 2 * X + np.random.randn(100, 1) # Target (price)
Step 3: Split Data into Train/Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
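With test_size=0.2 and 100 samples, the split should leave 80 rows for training and 20 for testing. The sketch below repeats the data generation from Step 2 so it runs on its own, then prints the shapes to verify this.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Recreate the synthetic data from Step 2
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 3 + 2 * X + np.random.randn(100, 1)

# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Train:", X_train.shape, y_train.shape)  # (80, 1) (80, 1)
print("Test: ", X_test.shape, y_test.shape)    # (20, 1) (20, 1)
```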
Step 4: Train the Model
model = LinearRegression()
model.fit(X_train, y_train) # Train on training data
Step 5: Make Predictions
y_pred = model.predict(X_test)
Step 6: Evaluate the Model
print("Coefficient (slope):", model.coef_[0][0])  # coef_ has shape (1, 1) because y is a column vector
print("Intercept (bias):", model.intercept_[0])  # intercept_ has shape (1,)
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))
Output (rounded):
Coefficient (slope): 2.006
Intercept (bias): 2.862
Mean Squared Error (MSE): 0.98
R-squared: 0.78
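The slope and intercept are all the model needs to make a prediction: intercept + slope * x. The sketch below repeats the earlier steps so it runs on its own, then checks that the manual calculation matches model.predict. The input value x_new = 1.5 is a hypothetical example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data and split as in the steps above
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 3 + 2 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
slope = model.coef_[0][0]
intercept = model.intercept_[0]

# Hypothetical new input (not from the original data)
x_new = 1.5
manual = intercept + slope * x_new
from_model = model.predict(np.array([[x_new]]))[0][0]

print("Manual calculation:", manual)
print("model.predict:     ", from_model)
```

The two printed numbers should agree up to floating-point rounding.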
Step 7: Visualize the Regression Line
plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test, y_pred, color="red", linewidth=2, label="Predicted")
plt.xlabel("House Size")
plt.ylabel("Price")
plt.legend()
plt.show()
4. Key Concepts
How Linear Regression Works
The model learns the equation:
y = β0 + β1x + ϵ
- β0: Intercept (bias).
- β1: Coefficient (slope).
- ϵ: Error term.
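scikit-learn's LinearRegression fits by ordinary least squares, and with a single feature the solution has a closed form: β1 = Σ(x - mean_x)(y - mean_y) / Σ(x - mean_x)², and β0 = mean_y - β1 * mean_x. The pure-Python sketch below applies those formulas to noise-free data generated from y = 3 + 2x. The function name fit_simple_linear_regression is a hypothetical helper, not a library function.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing squared error for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Noise-free points on the line y = 3 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)  # 3.0 2.0
```

With noisy data, as in the example above, the recovered values land near the true intercept and slope rather than exactly on them.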
Evaluation Metrics:
- Mean Squared Error (MSE): Average squared difference between actual and predicted values (lower = better).
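- R-squared (R2): Proportion of variance in the target explained by the model (higher = better). It is at most 1 and usually falls between 0 and 1, but it can be negative on test data when the model predicts worse than always guessing the mean.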
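Both metrics are short enough to compute by hand, which makes their definitions concrete. The sketch below uses pure Python and small made-up lists. The function names are hypothetical stand-ins for scikit-learn's mean_squared_error and r2_score.

```python
def mean_squared_error_manual(y_true, y_pred):
    # Average of squared differences between actual and predicted values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares around the mean)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
good_pred = [2.5, 5.0, 7.5, 9.0]  # close to the actual values
bad_pred = [9.0, 7.0, 5.0, 3.0]   # trend reversed, worse than the mean

print(mean_squared_error_manual(y_true, good_pred))   # 0.125
print(round(r2_score_manual(y_true, good_pred), 3))   # 0.975
print(r2_score_manual(y_true, bad_pred))              # -3.0
```

The last line shows R-squared going negative when the predictions are worse than simply predicting the mean of y_true.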
5. Real-World Applications
- Predict Sales: Based on advertising spend.
- Estimate Stock Prices: Using historical trends.
- Forecast Weather: Estimate temperature from humidity.
6. Next Steps
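- Try Real Datasets: Use Kaggle datasets (e.g., Boston Housing, Diabetes). Note that scikit-learn removed its built-in Boston Housing loader (load_boston) in version 1.2; load_diabetes and fetch_california_housing are still available.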
- Explore Other Algorithms:
  - Classification: Logistic Regression, Decision Trees.
  - Clustering: K-Means.
- Preprocess Data: Handle missing values, scale features.
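As a taste of the preprocessing step, here is a minimal sketch that fills a missing value and scales features before fitting a regression. The toy data, the mean imputation strategy, and the use of a pipeline are illustrative assumptions rather than the only reasonable choices.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumed): two features on different scales, one missing value
X = np.array([
    [50.0, 1.0],
    [80.0, 2.0],
    [np.nan, 3.0],
    [120.0, 3.0],
    [150.0, 4.0],
])
y = np.array([150.0, 220.0, 260.0, 330.0, 400.0])

# Each step is fit on the training data, then applied in order at predict time
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),  # replace NaN with the column mean
    StandardScaler(),                # rescale to zero mean, unit variance
    LinearRegression(),
)
pipeline.fit(X, y)

# A new row with a missing second feature is handled by the same imputer
predictions = pipeline.predict(np.array([[100.0, np.nan]]))
print(predictions)
```

Wrapping the steps in a pipeline keeps the imputation and scaling statistics tied to the training data, so the same transformations are reused for new inputs.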
Key Takeaways
- ✅ Linear Regression: Predicts continuous values using a linear relationship.
- ✅ Train/Test Split: Essential to evaluate model performance.
- ✅ scikit-learn: Simplifies implementing ML models in Python.
What’s Next?
Learn classification with the Iris dataset or dive into neural networks with TensorFlow!
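As a preview of what classification looks like, here is a minimal sketch on the Iris dataset that ships with scikit-learn. The choice of LogisticRegression, the max_iter value, and the stratified 80/20 split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 150 flowers, 4 measurements each, 3 species as class labels
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_iter is raised from the default to give the solver room to converge
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Accuracy:", accuracy)
```

The workflow mirrors the regression example: split, fit, predict, evaluate. Only the estimator and the metric change.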