[CS229] 11: Machine Learning System Design
Tag: python, machine learning
2018-11-20
11: Machine Learning System Design
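Spam detection:
Supervised learning: use words as features
Collect data, extract features from the email header, extract features from the message body, detect misspellings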
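As a minimal sketch of the "words as features" idea, the snippet below turns an email into a 0/1 vector over a small vocabulary. The vocabulary `VOCAB` and the helper `email_to_features` are hypothetical names used only for illustration; in practice the vocabulary would likely be built from the most frequent words in the training emails.

```python
import re

# Hypothetical vocabulary (an assumption for this sketch); a real one would
# be derived from the training data.
VOCAB = ["buy", "deal", "discount", "meeting", "now"]


def email_to_features(text, vocab=VOCAB):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the email."""
    # Lowercase the text and keep alphabetic tokens only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if w in words else 0 for w in vocab]


features = email_to_features("Buy now! Huge DISCOUNT, deal ends today.")
print(features)
```

The resulting vector can be fed to any supervised classifier, with the label being spam (1) or not spam (0).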
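Error analysis:
Implement a simple model and test its performance on the validation set
Plot learning curves to see whether more data or additional features would improve model performance
Error analysis: focus on the misclassified examples and look for any obvious trends or shared characteristics
Do the analysis on the validation set, not the test set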
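The error-analysis step above can be sketched as a simple tally: after labeling each validation example with a category by hand, count how many misclassified examples fall into each one. The function `tally_errors`, the category names, and the toy labels are all assumptions made up for this illustration.

```python
from collections import Counter


def tally_errors(y_true, y_pred, categories):
    """Count misclassified validation examples per hand-assigned category."""
    return Counter(
        cat for t, p, cat in zip(y_true, y_pred, categories) if t != p
    )


# Toy validation data (made up for illustration).
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0]
categories = ["pharma", "phishing", "normal", "pharma", "normal", "phishing"]

errors = tally_errors(y_true, y_pred, categories)
# Categories with the most errors are the most promising to work on first.
print(errors.most_common())
```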
Error analysis for skewed classes:
precision: # true positive / # predicted positive = # true positive / (# true positive + # false positive)
recall: # true positive / # actual positive = # true positive / (# true positive + # false negative)
F1 score = 2 * (Precision * Recall) / (Precision + Recall)
Compute the F1 score on the validation set and maximize it; the setting with the highest F1 score is taken as the best model
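The precision, recall and F1 definitions above can be sketched in plain Python as follows. Picking a decision threshold is one possible way to "maximize F1" on the validation set; that choice, the helper names `precision_recall_f1` and `best_threshold`, and the toy scores are assumptions for this example, not something the notes specify.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted positive
    # or there are no actual positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores, thresholds):
    """Return the candidate threshold with the highest validation F1."""
    def f1_at(th):
        y_pred = [1 if s >= th else 0 for s in scores]
        return precision_recall_f1(y_true, y_pred)[2]
    return max(thresholds, key=f1_at)


# Toy validation labels and predicted probabilities (made up).
y_val = [1, 0, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
chosen = best_threshold(y_val, scores, [0.25, 0.5, 0.75])
print(chosen)
```

Note that a classifier that always predicts the majority class can reach high accuracy on a skewed dataset, but it has zero recall and therefore an F1 score of 0, which is why F1 is the more informative metric here.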
Large data rationale: with a large training set, a model with more parameters can be built (low bias) with less risk of overfitting
If you link this blog, please refer to this page, thanks!
Post link:
https://tsinghua-gongjing.github.io/posts/CS229-11-ML-system.html