Chapter 6 Machine Learning System Design
1 Recommended approach
- Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
- Plot learning curves to decide if more data, more features, etc. are likely to help.
- Error analysis: Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you can spot any systematic trend in the types of examples it gets wrong.
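The learning-curve step above can be sketched as follows. This is a minimal illustration, assuming NumPy, a plain least-squares linear model, and synthetic data; the helper names (`fit_linear`, `half_mse`, `learning_curve`) are hypothetical and not part of the course material.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit; returns weights with the bias term first."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta

def half_mse(theta, X, y):
    """Cost J = 1/(2m) * sum((h(x) - y)^2)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    residual = Xb @ theta - y
    return float(residual @ residual) / (2 * len(y))

def learning_curve(X_train, y_train, X_cv, y_cv, sizes):
    """Return (m, J_train, J_cv) for each training-set size m."""
    rows = []
    for m in sizes:
        theta = fit_linear(X_train[:m], y_train[:m])
        j_train = half_mse(theta, X_train[:m], y_train[:m])
        j_cv = half_mse(theta, X_cv, y_cv)
        rows.append((m, j_train, j_cv))
    return rows

# Synthetic data (an assumption for illustration): y = 2x + 1 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=300)

curve = learning_curve(X[:200], y[:200], X[200:], y[200:],
                       sizes=[5, 20, 50, 100, 200])
for m, j_train, j_cv in curve:
    print(f"m={m:4d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
```

If J_train and J_cv converge to a similar, high value, more data is unlikely to help (high bias); a persistent gap between them suggests more data may help (high variance).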
2 Error metrics for skewed classes
Case | Predicted | Actual |
---|---|---|
True Positive (TP) | true | true |
True Negative (TN) | false | false |
False Positive (FP) | true | false |
False Negative (FN) | false | true |
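The four cases in the table can be counted directly from label lists. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1      # predicted true, actually true
        elif predicted == 0 and actual == 0:
            tn += 1      # predicted false, actually false
        elif predicted == 1 and actual == 0:
            fp += 1      # predicted true, actually false
        else:
            fn += 1      # predicted false, actually true
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels for illustration
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'TN': 4, 'FP': 1, 'FN': 1}
```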
2.1 Precision
- $Precision=\frac{TP}{TP+FP}$
2.2 Recall
- $Recall=\frac{TP}{TP+FN}$
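A small worked example shows why precision and recall matter more than accuracy on skewed classes. The counts below are hypothetical (1000 examples, 5 of them positive). Returning 0.0 when a denominator is zero is a convention assumed here, not something the definitions prescribe.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def accuracy(tp, tn, fp, fn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Classifier that always predicts "negative" on the skewed set
always_negative = dict(tp=0, tn=995, fp=0, fn=5)
# A classifier that actually tries to detect positives
detector = dict(tp=4, tn=989, fp=6, fn=1)

for name, c in [("always_negative", always_negative), ("detector", detector)]:
    acc = accuracy(c["tp"], c["tn"], c["fp"], c["fn"])
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name:16s} accuracy={acc:.3f} precision={p:.2f} recall={r:.2f}")
```

The trivial classifier has the higher accuracy (0.995 versus 0.993), yet its recall is 0, while the detector reaches precision 0.4 and recall 0.8.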
2.3 Trading Off Precision and Recall
- F1 Score: $2\frac{PR}{P+R}$
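To see why the F1 score is preferred over a plain average of P and R when comparing classifiers, consider the sketch below. The three (P, R) pairs are hypothetical values chosen for illustration, and treating F1 as 0 when P + R = 0 is an assumed convention.

```python
def f1_score(p, r):
    """F1 = 2PR / (P + R); defined as 0.0 when P + R == 0 (assumed convention)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Hypothetical (precision, recall) pairs for three candidate classifiers
candidates = {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}

for name, (p, r) in candidates.items():
    print(f"{name}: mean={(p + r) / 2:.3f}  F1={f1_score(p, r):.4f}")

best_by_mean = max(candidates, key=lambda k: sum(candidates[k]) / 2)
best_by_f1 = max(candidates, key=lambda k: f1_score(*candidates[k]))
print("best by mean:", best_by_mean)  # C
print("best by F1:", best_by_f1)      # A
```

Classifier C predicts positive almost always (recall 1.0, precision 0.02); the simple average rewards it, while F1 penalizes the very low precision and selects A.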
3 Data for Machine Learning
- Algorithms:
(1) Perceptron ( Logistic regression )
(2) Winnow
(3) Memory-based
(4) Naive Bayes
- It’s not who has the best algorithm that wins. It’s who has the most data.
- Large data rationale:
(1) Use a learning algorithm with many parameters (low bias) → $J_{train}(\theta)$ will be small
(2) Use a very large training set (unlikely to overfit) → $J_{train}(\theta) \approx J_{test}(\theta)$
(3) From (1) + (2) → $J_{test}(\theta)$ will be small
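One way to read the three steps is as a decomposition of the test error. This is an informal sketch of the argument, not a formal bound:

$$
J_{test}(\theta) = \underbrace{J_{train}(\theta)}_{\text{small: many parameters}} + \underbrace{\big(J_{test}(\theta) - J_{train}(\theta)\big)}_{\approx 0:\ \text{very large training set}}
$$

Both terms on the right are small under (1) and (2), so $J_{test}(\theta)$ is small as well.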
4 Designing a high accuracy learning system
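- Can the target be predicted from the feature values, i.e. do the features carry enough information?
- A large amount of data + an algorithm with many parameters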
5 Reference
Andrew Ng (吴恩达), Machine Learning, Coursera
Huang Haiguang (黄海广), Machine Learning Notes