Stanford ML - Lecture 6 - Advice for applying machine learning: notes shared for reference.

1. Deciding what to try next

  • Debugging a learning algorithm
    • Suppose you have implemented regularized linear regression to predict housing prices, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
      1. Get more training examples
      2. Try smaller sets of features
      3. Try getting additional features
      4. Try adding polynomial features
      5. Try decreasing the regularization parameter λ
      6. Try increasing the regularization parameter λ

2. Evaluating a hypothesis

  • separate the data set into a training set (70%) and a test set (30%)
  • Training/Testing procedure for logistic regression
    • learn parameters θ from the training data (70%)
    • compute the test set error: J_test(θ) = −(1/m_test) Σ_{i=1}^{m_test} [ y_test^(i) log h_θ(x_test^(i)) + (1 − y_test^(i)) log(1 − h_θ(x_test^(i))) ]
    • misclassification error (0/1 misclassification error): err(h_θ(x), y) = 1 if the prediction is wrong, 0 otherwise; test error = (1/m_test) Σ_{i=1}^{m_test} err(h_θ(x_test^(i)), y_test^(i))
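The train/test procedure above can be sketched in plain Python. This is a minimal illustration; the helper names are my own, not from the lecture:

```python
import random

def train_test_split(data, test_frac=0.3, seed=0):
    """Shuffle the examples and split them into a 70% training set
    and a 30% test set (fractions follow the lecture's suggestion)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def misclassification_error(predictions, labels):
    """0/1 error: the fraction of test examples the hypothesis gets wrong."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)
```

With predictions [1, 0, 1, 1] against labels [1, 1, 1, 0], two of four are wrong, giving a misclassification error of 0.5.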

3. Model selection and training/validation/test sets

  • overfitting example
    • the training error is likely to be lower than the actual generalization error
  • model selection
    • select the model with the lowest cross-validation error (keeping the test set aside to estimate generalization error)
  • training set - 60%
  • cross validation set (cv) - 20%
  • test set - 20%
  • training error: J_train(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
  • cross validation error: J_cv(θ) = (1/2m_cv) Σ_{i=1}^{m_cv} (h_θ(x_cv^(i)) − y_cv^(i))²
  • test error: J_test(θ) = (1/2m_test) Σ_{i=1}^{m_test} (h_θ(x_test^(i)) − y_test^(i))²
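For regularized linear regression the three errors share the same squared-error form and differ only in which split they are evaluated on. A minimal sketch, assuming a univariate hypothesis h(x) = θ₀ + θ₁x (the function name is my own):

```python
def squared_error(theta, xs, ys):
    """J(theta) = (1/2m) * sum((h(x) - y)^2) for a linear
    hypothesis h(x) = theta0 + theta1 * x.
    Call it on the training, cross-validation, or test split to
    get J_train, J_cv, or J_test respectively."""
    theta0, theta1 = theta
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
```

A perfect fit gives zero error; a constant-zero hypothesis on targets that are all 2 gives (1/2m) · m · 4 = 2.0.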
4. Diagnosing bias vs. variance

  • bias (underfit): J_train(θ) is high, and J_cv(θ) ≈ J_train(θ)

  • variance (overfit): J_train(θ) is low, and J_cv(θ) >> J_train(θ)
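The bias/variance diagnostic can be phrased as a tiny rule of thumb. The numeric thresholds below are illustrative assumptions of mine, not values from the lecture:

```python
def diagnose(j_train, j_cv, baseline=0.1):
    """Rough diagnostic: a high training error signals bias (underfit);
    a low training error with a much higher CV error signals variance
    (overfit). `baseline` is an assumed acceptable-error threshold."""
    if j_train > baseline:
        return "high bias (underfit)"
    if j_cv > 2 * j_train:  # assumed gap ratio, for illustration only
        return "high variance (overfit)"
    return "ok"
```

For example, J_train = 0.5 is flagged as bias regardless of J_cv, while J_train = 0.01 with J_cv = 0.4 is flagged as variance.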

5. Regularization and bias/variance

  • choosing the regularization parameter λ
    • try a range of values, e.g. λ = 0, 0.01, 0.02, 0.04, 0.08, ..., 10 (roughly doubling each step)
    • fit θ for each λ and pick the λ whose hypothesis has the lowest cross-validation error J_cv(θ)
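The selection loop can be sketched as follows. Here `fit` and `j_cv` are hypothetical callables standing in for the training routine and the cross-validation-error computation; only the doubling grid of λ values follows the lecture:

```python
def select_lambda(fit, j_cv, train, cv):
    """Try a doubling grid of regularization strengths and keep the one
    whose fitted parameters give the lowest cross-validation error.
    fit(train, lam) -> theta and j_cv(theta, cv) -> error are assumed
    interfaces, not part of the original notes."""
    lambdas = [0] + [0.01 * 2 ** k for k in range(11)]  # 0, 0.01, 0.02, ..., 10.24
    return min(lambdas, key=lambda lam: j_cv(fit(train, lam), cv))
```

With stub callables whose CV error is minimized near λ = 0.08, the grid point 0.08 is returned.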

6. Learning curves

  • If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
  • If a learning algorithm is suffering from high variance, getting more training data is likely to help.
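A learning curve tabulates training and cross-validation error as the training set grows; flattening at a high error indicates bias, a persistent train/CV gap indicates variance. A sketch with hypothetical `fit` and `cost` callables (interfaces assumed, not from the lecture):

```python
def learning_curve(fit, cost, train, cv):
    """For each training-set size m, fit on the first m examples and
    record (m, J_train on those m examples, J_cv on the full CV set).
    fit(data) -> theta and cost(theta, data) -> error are assumed."""
    points = []
    for m in range(1, len(train) + 1):
        subset = train[:m]
        theta = fit(subset)
        points.append((m, cost(theta, subset), cost(theta, cv)))
    return points
```

With a trivial mean-predictor `fit` and squared-error `cost` on constant data, every point on both curves is zero.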
7. Deciding what to try next (revisited)

  • "small" neural network (fewer parameters, more prone to underfitting)
    • computationally cheaper
  • "large" neural network (more parameters, more prone to overfitting)
    • computationally more expensive
    • use regularization to address overfitting

Definitions:

Variance: measures the extent to which the solutions for individual data sets vary around their average; hence it measures how sensitive the function f(x) is to the particular choice of data set.

Bias: represents the extent to which the average prediction over all data sets differs from the desired regression function.

Variance: the variance of the estimate itself.

Bias: the difference between the expectation of the estimate and the regression function that the sample data is meant to recover.

From : http://blog.csdn.net/abcjennifer/article/details/7797502
