Supervised Learning

Last updated Jun 22, 2022

# What is supervised learning?

# Linear Regression
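
With a single feature, the model used throughout this page is a straight line with parameters $w$ (weight) and $b$ (bias):

$$f_{w,b}(x) = wx + b$$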

# Cost Function
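
The gradient derivation further down uses the squared-error cost over the $m$ training examples, with $\hat{y}^{(i)} = f_{w,b}(x^{(i)})$:

$$J(w,b)=\frac{1}{2m}\sum^{m}_{i=1}(\hat{y}^{(i)}-y^{(i)})^2$$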

# Gradient Descent

# Learning rate

# Gradient Descent for linear regression

Calculating the derivative with respect to $w$:

$$\frac{d}{dw}J(w,b)= \frac{d}{dw}\frac{1}{2m}\sum^{m}_{i=1}(\hat{y}^{(i)}-y^{(i)})^2=\frac{d}{dw}\frac{1}{2m}\sum^{m}_{i=1}(wx^{(i)}+b-y^{(i)})^2=\frac{1}{m}\sum^{m}_{i=1}(f_{w,b}(x^{(i)})-y^{(i)})x^{(i)}$$

and the derivative with respect to $b$:

$$\frac{d}{db}J(w,b)= \frac{d}{db}\frac{1}{2m}\sum^{m}_{i=1}(\hat{y}^{(i)}-y^{(i)})^2=\frac{d}{db}\frac{1}{2m}\sum^{m}_{i=1}(wx^{(i)}+b-y^{(i)})^2=\frac{1}{m}\sum^{m}_{i=1}(f_{w,b}(x^{(i)})-y^{(i)})$$
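
A minimal NumPy sketch of these two gradients, assuming `x` and `y` are 1-D arrays of length $m$ (the function name `compute_gradients` is just for illustration):

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Return dJ/dw and dJ/db for the squared-error cost on 1-D inputs."""
    m = x.shape[0]
    err = (w * x + b) - y          # f_{w,b}(x^(i)) - y^(i) for every i
    dJdw = (err * x).sum() / m     # (1/m) * sum(err_i * x_i)
    dJdb = err.sum() / m           # (1/m) * sum(err_i)
    return dJdw, dJdb
```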

Pseudocode for gradient descent:

```
while J not converged:
    w = w - a * dJdW
    b = b - a * dJdb
```

where `dJdW` $= \frac{d}{dw}J(w,b)$, `dJdb` $= \frac{d}{db}J(w,b)$, and `a` is the learning rate.
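
Putting the loop and the gradients together, here is a runnable sketch using `compute_gradients` from above; the stopping test on the change in cost and the default values for `a`, `tol`, and `max_iters` are just illustrative choices:

```python
def gradient_descent(x, y, w=0.0, b=0.0, a=0.01, tol=1e-8, max_iters=10_000):
    """Gradient descent for single-feature linear regression."""
    def cost(w, b):
        return ((w * x + b - y) ** 2).sum() / (2 * x.shape[0])

    prev = cost(w, b)
    for _ in range(max_iters):
        dJdw, dJdb = compute_gradients(x, y, w, b)
        w, b = w - a * dJdw, b - a * dJdb   # update w and b simultaneously
        cur = cost(w, b)
        if abs(prev - cur) < tol:           # "J not converged" check
            break
        prev = cur
    return w, b
```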

# Multiple features

What if you have multiple features (variables)?

We can then express the linear regression model as:

$$f_{\vec{w},b}(\vec{x})=w_1x_1+w_2x_2+\cdots+w_nx_n+b$$

Define $\vec{w} = [w_1,\ldots,w_n]$ and $\vec{x}=[x_1,\ldots,x_n]^T$, where $T$ represents the transpose. Then

$$f_{\vec{w},b}(\vec{x})=\vec{w}\cdot \vec{x}+b$$

where $(\cdot)$ represents the dot product.
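
In NumPy the vectorized model is a single dot product; the shapes here are an assumption (`w` and `x` both of length $n$ for one training example):

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])    # example weights, n = 3
b = 4.0
x = np.array([10.0, 20.0, 30.0])  # one example with 3 features

f = np.dot(w, x) + b              # f_{w,b}(x) = w . x + b
```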

# Feature Scaling

When the ranges of values your features take differ greatly (for example, one feature spans 0–1 while another spans 0–10,000), gradient descent may run slowly.

Some examples of feature scaling methods:

# Max scaling
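
Max scaling typically divides each feature $x_j$ by its maximum observed value, so each scaled value lies roughly in $[0, 1]$:

$$x_j' = \frac{x_j}{\max(x_j)}$$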

# Mean normalization
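
Mean normalization typically centres each feature on its mean $\mu_j$ and divides by the feature's range:

$$x_j' = \frac{x_j-\mu_j}{\max(x_j)-\min(x_j)}$$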

# Z-score normalization
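
Z-score normalization centres each feature on its mean $\mu_j$ and divides by its standard deviation $\sigma_j$:

$$x_j' = \frac{x_j-\mu_j}{\sigma_j}$$

A minimal NumPy sketch of all three scalings, assuming `X` is an `(m, n)` array with one row per training example (column-wise statistics; the function names are just illustrative):

```python
import numpy as np

def max_scale(X):
    """Divide each feature (column) by its maximum value."""
    return X / X.max(axis=0)

def mean_normalize(X):
    """Centre each feature at 0 and divide by its range (max - min)."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def zscore_normalize(X):
    """Centre each feature at 0 and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```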