Gabriel Chan

Supervised Learning

Last updated Jun 22, 2022

# What is supervised learning?

# Linear Regression

# Cost Function

# Gradient Descent

# Learning rate

# Gradient Descent for linear regression

Calculating the derivative for $w$, $$\frac{d}{dw}J(w,b)= \frac{d}{dw}\frac{1}{2m}\sum^{m}_{i=1}(\hat{y}^{(i)}-y^{(i)})^2=\frac{d}{dw}\frac{1}{2m}\sum^{m}_{i=1}(wx^{(i)}+b-y^{(i)})^2$$ which is equal to $$\frac{1}{m}\sum^{m}_{i=1}(f_{w,b}(x^{(i)})-y^{(i)})x^{(i)}$$

and the derivative for $b$, $$\frac{d}{db}J(w,b)= \frac{d}{db}\frac{1}{2m}\sum^{m}_{i=1}(\hat{y}^{(i)}-y^{(i)})^2=\frac{d}{db}\frac{1}{2m}\sum^{m}_{i=1}(wx^{(i)}+b-y^{(i)})^2$$ which is equal to $$\frac{1}{m}\sum^{m}_{i=1}(f_{w,b}(x^{(i)})-y^{(i)})$$
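As a quick sketch of these two formulas in code (NumPy; the arrays `x`, `y` and the function name are illustrative, not from the notes):

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Gradients of the squared-error cost for f_{w,b}(x) = w*x + b."""
    err = (w * x + b) - y        # f_{w,b}(x^(i)) - y^(i) for each example
    dJdw = np.mean(err * x)      # (1/m) * sum over i of err_i * x^(i)
    dJdb = np.mean(err)          # (1/m) * sum over i of err_i
    return dJdw, dJdb
```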

Pseudocode for gradient descent:

```
while J not converged:
	w = w - a * dJdW    # compute dJdW and dJdb before updating either parameter
	b = b - a * dJdb
```

where dJdW = $\frac{d}{dw}J(w,b)$, dJdb = $\frac{d}{db}J(w,b)$, and $a$ is the learning rate.
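Putting the pieces together, a minimal runnable sketch (NumPy, one feature, a fixed iteration count standing in for the convergence check, and made-up toy data; `alpha` plays the role of the learning rate $a$):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        err = (w * x + b) - y
        dJdw = np.mean(err * x)   # d/dw J(w,b)
        dJdb = np.mean(err)       # d/db J(w,b)
        w = w - alpha * dJdw      # both derivatives are computed before
        b = b - alpha * dJdb      # either parameter is updated
    return w, b

# toy data generated from y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(gradient_descent(x, y))  # should approach (2.0, 1.0)
```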

# Multiple features

What if you have multiple features (variables)?

We can then express the linear regression model as:

$$f_{\vec{w},b}(\vec{x})=w_1x_1+w_2x_2+\cdots+w_nx_n+b$$ Define $\vec{w} = [w_1,\ldots,w_n]$ and $\vec{x}=[x_1,\ldots,x_n]^T$, where $T$ represents the transpose. Then $$f_{\vec{w},b}(\vec{x})=\vec{w}\cdot \vec{x}+b$$ where $(\cdot)$ represents the dot product.
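A small sketch of the vectorized form (NumPy; the weights and the example $\vec{x}$ are made up for illustration):

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])    # one weight per feature
b = 4.0
x = np.array([10.0, 20.0, 30.0])  # a single example with n = 3 features

f_wb = np.dot(w, x) + b           # f_{w,b}(x) = w . x + b
print(f_wb)                       # 1.0*10 + 2.5*20 + (-3.3)*30 + 4 = -35.0
```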

# Feature Scaling

When the ranges of values that your features can take differ greatly, e.g. one feature spans thousands of units while another spans only a few, gradient descent may run slowly.

Some examples of feature scaling:

# Max scaling

# Mean normalization

# Z-score normalization
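The notes only name these three rescalings, so here is a hedged sketch assuming their usual definitions (divide by the max; subtract the mean and divide by the range; subtract the mean and divide by the standard deviation):

```python
import numpy as np

x = np.array([300.0, 500.0, 1200.0, 2000.0])        # a feature with a wide range

x_max      = x / x.max()                             # max scaling: values end up in [0, 1]
x_meannorm = (x - x.mean()) / (x.max() - x.min())    # mean normalization: roughly [-1, 1]
x_zscore   = (x - x.mean()) / x.std()                # Z-score normalization: mean 0, std 1
```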