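A linear combination of perceptrons can compute a nonlinear decision boundary.
Are more perceptrons always better?
Limitations of linearly combining single-layer perceptrons:
Extending from a single layer to multiple layers solves the XOR problem.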
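The multilayer idea can be illustrated with a minimal sketch. It assumes a step activation and hand-picked weights (they are not from the original notes, and many other choices work): two hidden perceptrons each draw one linear boundary (OR and NAND), and an output perceptron combines them into XOR, which no single perceptron can represent.

```python
def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    """A single perceptron: linear combination of the inputs, then a step."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer: two perceptrons, each defining one linear boundary.
    # The weights and biases below are illustrative, hand-picked values.
    h_or = perceptron((x1, x2), (1, 1), -0.5)     # x1 OR x2
    h_nand = perceptron((x1, x2), (-1, -1), 1.5)  # NOT (x1 AND x2)
    # Output layer: logical AND of the two hidden units.
    return perceptron((h_or, h_nand), (1, 1), -1.5)


truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

Each hidden unit is still a linear classifier; the nonlinearity of the overall boundary comes from composing them through the output layer.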
Multilayer neural network:
Output layer:
Hidden (intermediate) layer:
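Confusion matrix (also called an error matrix): a table that describes how a classification model performs on a test set.
For example, the results of a binary classification problem can be laid out in a table whose columns are the predicted classes and whose rows are the actual classes. From it you can see which classes are predicted well and which are often mispredicted, which is especially useful for multiclass problems.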
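Once a model has produced predicted labels, the actual labels are already known, so the confusion matrix can be computed and visualized directly from these values. sklearn provides code for this, which can serve as a reference:
Computing the confusion matrix: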
from sklearn.metrics import confusion_matrix
# Numeric class labels
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
# String class labels; labels= fixes the row and column order
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
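To make the row and column convention concrete, here is a minimal sketch that rebuilds the same counts in plain Python and derives per-class recall from them. The helper name confusion_counts is hypothetical and is not part of sklearn; it only illustrates how the table is filled in.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count matrix: rows index the true class, columns the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
labels = ["ant", "bird", "cat"]

cm = confusion_counts(y_true, y_pred, labels)
# Per-class recall: the diagonal count divided by the row total.
# This assumes every label occurs at least once in y_true.
recall = {label: cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}
print(cm)
print(recall)
```

Reading along a row shows where the samples of one true class ended up: here every "ant" is recovered, while the single "bird" is predicted as "cat".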
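Visualization: sklearn also provides code for this. Note that if the class labels are numbers, classes must be converted with np.array and given a string data type (classes=np.array([0, 1], dtype='<U10')):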
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels


def plot_confusion_matrix(y_true, y_pred, classes, normalize=False, title=None, cmap=plt.cm.Blues):
    """Print and plot the confusion matrix.

    Example:
        plot_confusion_matrix(list(df_labels['cell_label']),
                              list(df_prediction_label['label']),
                              classes=np.array([0, 1], dtype='<U10'),  # use np to convert and set data type
                              title='Without normalization')
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute the confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    # Keep only the classes that appear in the data; this indexing assumes
    # classes is a NumPy array and the labels are integer indices into it
    classes = classes[unique_labels(y_true, y_pred)]
    if normalize:
        # Divide each row by its total so that every row sums to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # Show all ticks and label them with the class names
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the x tick labels so they do not overlap
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    # Write the value of every cell, in white on dark cells
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax
# y_test, y_pred and class_names are assumed to come from a fitted classifier and its test split
# Plot non-normalized confusion matrix
plot_confusion_matrix(y_test, y_pred, classes=class_names,
                      title='Confusion matrix, without normalization')
# Plot normalized confusion matrix
plot_confusion_matrix(y_test, y_pred, classes=class_names, normalize=True,
                      title='Normalized confusion matrix')
plt.show()
sklearn.datasets
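Three kinds of API:
loaders: load small standard datasets
fetchers: download large real-world datasets
generate functions: generate controlled synthetic datasets
Example feature descriptions (these are the columns of the California housing dataset):
MedInc median income in block
HouseAge median house age in block
AveRooms average number of rooms
AveBedrms average number of bedrooms
Population block population
AveOccup average house occupancy
Latitude house block latitude
Longitude house block longitude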
Function: make_blobs
Function: make_classification
Function: make_multilabel_classification
Function: make_biclusters, generates an array with constant block diagonal structure for biclustering.
Function: make_checkerboard, generates an array with block checkerboard structure for biclustering.
Function: make_regression, produces regression targets as an optionally sparse random linear combination of random features, with added noise.
Function: make_s_curve, generates the S-curve dataset.
Function: make_swiss_roll, generates the swiss roll dataset.
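As a brief illustration of the generator functions, the sketch below calls make_blobs and make_regression. The sample counts, feature counts, noise level and random_state values are arbitrary choices for the example, not values taken from these notes.

```python
from sklearn.datasets import make_blobs, make_regression

# Three Gaussian clusters in two dimensions; random_state makes the draw reproducible.
X, y = make_blobs(n_samples=150, n_features=2, centers=3, random_state=0)
print(X.shape, y.shape)

# A regression problem in which only 2 of the 5 features influence the target.
X_reg, y_reg = make_regression(n_samples=100, n_features=5, n_informative=2,
                               noise=0.1, random_state=0)
print(X_reg.shape, y_reg.shape)
```

Because the generating process is fully controlled, datasets like these are convenient for checking that a clustering or regression method behaves as expected before it is applied to real data.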
sklearn.datasets.fetch_openml
>>> from sklearn.datasets import fetch_openml
>>> mice = fetch_openml(name='miceprotein', version=4)
>>>
# Inspect the dataset's information and attributes
# DESCR: free-text description of the data
# details: metadata as a dictionary
>>> print(mice.DESCR)
**Author**: Clara Higuera, Katheleen J. Gardiner, Krzysztof J. Cios
**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression) - 2015
**Please cite**: Higuera C, Gardiner KJ, Cios KJ (2015) Self-Organizing
Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down
Syndrome. PLoS ONE 10(6): e0129126...
>>> mice.details
{'id': '40966', 'name': 'MiceProtein', 'version': '4', 'format': 'ARFF',
'upload_date': '2017-11-08T16:00:15', 'licence': 'Public',
'url': 'https://www.openml.org/data/v1/download/17928620/MiceProtein.arff',
'file_id': '17928620', 'default_target_attribute': 'class',
'row_id_attribute': 'MouseID',
'ignore_attribute': ['Genotype', 'Treatment', 'Behavior'],
'tag': ['OpenML-CC18', 'study_135', 'study_98', 'study_99'],
'visibility': 'public', 'status': 'active',
'md5_checksum': '3c479a6885bfa0438971388283a1ce32'}
pandas.io, scipy.io, numpy
skimage.io, Imageio, scipy.misc.imread (removed from newer SciPy releases), scipy.io.wavfile.read
1{·} is called the indicator function: it equals 1 if the statement inside the braces is true and 0 if it is false.
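One common use of the indicator function is to write classification accuracy as an average of indicators, accuracy = (1/n) Σ 1{ŷᵢ = yᵢ}. The sketch below is only an illustration of that idea; the helper name indicator is hypothetical, and the labels are reused from the confusion matrix example.

```python
def indicator(condition):
    """1{condition}: 1 if the condition holds, otherwise 0."""
    return 1 if condition else 0


y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# Count the correct predictions by summing 1{y_pred_i == y_true_i}.
correct = sum(indicator(p == t) for p, t in zip(y_pred, y_true))
accuracy = correct / len(y_true)
print(correct, accuracy)
```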