sklearn.datasets
Three kinds of API:
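loaders: load small standard datasets
fetchers: download large real-world datasets
generate functions: generate controlled synthetic datasets
California housing dataset (fetch_california_housing), features:
MedInc median income in block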
HouseAge median house age in block
AveRooms average number of rooms
AveBedrms average number of bedrooms
Population block population
AveOccup average house occupancy
Latitude house block latitude
Longitude house block longitude
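The difference between the three API kinds can be seen in a short sketch. It uses `load_iris` as the loader and `make_classification` as the generator, both picked here only for illustration. Fetchers such as `fetch_california_housing` download data on the first call and cache it (by default under `~/scikit_learn_data`), so they are left out to keep the example offline.

```python
# Minimal sketch contrasting a loader and a generator (assumes scikit-learn is installed).
from sklearn.datasets import load_iris, make_classification

# Loader: small dataset bundled with scikit-learn, no download; returns a Bunch.
iris = load_iris()
print(iris.data.shape)        # (150, 4)
print(iris.feature_names[:2])

# return_X_y=True skips the Bunch and returns the (X, y) arrays directly.
X_iris, y_iris = load_iris(return_X_y=True)

# Generator: synthetic data whose properties are controlled by parameters.
X_syn, y_syn = make_classification(n_samples=100, n_features=5, random_state=0)
print(X_syn.shape)            # (100, 5)
```

Passing a fixed `random_state` makes the generated data reproducible across runs.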
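Function: make_blobs, generate isotropic Gaussian blobs for clustering.
Function: make_classification, generate a random n-class classification problem.
Function: make_multilabel_classification, generate a random multilabel classification problem.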
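A sketch of the three classification/clustering generators follows. All parameter values are arbitrary choices for illustration. Note that `make_multilabel_classification` returns its targets as a dense 0/1 indicator matrix by default, not as a 1-D label vector.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_classification, make_multilabel_classification

# make_blobs: isotropic Gaussian clusters, handy for clustering demos.
X_b, y_b = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)

# make_classification: n-class problem with informative and redundant features.
X_c, y_c = make_classification(
    n_samples=200, n_features=10, n_informative=4, n_redundant=2,
    n_classes=3, random_state=42,
)

# make_multilabel_classification: Y_m has shape (n_samples, n_classes) with 0/1 entries.
X_m, Y_m = make_multilabel_classification(
    n_samples=100, n_features=8, n_classes=4, n_labels=2, random_state=42,
)

print(X_b.shape, np.unique(y_b))   # (300, 2) [0 1 2]
print(X_c.shape, np.unique(y_c))   # (200, 10) [0 1 2]
print(X_m.shape, Y_m.shape)        # (100, 8) (100, 4)
```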
Function: make_biclusters, generate an array with constant block diagonal structure for biclustering.
Function: make_checkerboard, generate an array with block checkerboard structure for biclustering.
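Both bicluster generators return the data matrix along with boolean row and column membership indicators for each planted bicluster. The sketch below assumes illustrative shapes and noise levels. It pairs `make_biclusters` with `SpectralCoclustering` and `consensus_score`, which is one common way to check recovery, not the only one.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters, make_checkerboard
from sklearn.metrics import consensus_score

# make_biclusters: block-diagonal structure; rows[i] / cols[i] flag the members of bicluster i.
X_bi, rows, cols = make_biclusters(shape=(60, 40), n_clusters=3, noise=1.0, random_state=0)
print(X_bi.shape, rows.shape, cols.shape)   # (60, 40) (3, 60) (3, 40)

# Recover the planted biclusters and compare them with the ground truth (1.0 = perfect match).
model = SpectralCoclustering(n_clusters=3, random_state=0).fit(X_bi)
score = consensus_score(model.biclusters_, (rows, cols))
print(f"consensus score: {score:.2f}")

# make_checkerboard: 2 row clusters x 3 column clusters give 6 biclusters.
X_cb, rows_cb, cols_cb = make_checkerboard(
    shape=(60, 40), n_clusters=(2, 3), noise=1.0, random_state=0,
)
print(rows_cb.shape, cols_cb.shape)         # (6, 60) (6, 40)
```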
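Function: make_regression, generate a random regression problem whose target is an optionally sparse, noisy random linear combination of random features.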
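The "sparse linear combination" can be checked directly. With `coef=True`, `make_regression` also returns the true coefficients, and only `n_informative` of them are non-zero. The sizes and noise level below are arbitrary, and fitting `LinearRegression` is just one illustrative way to see the coefficients recovered.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# coef=True returns the ground-truth coefficients; only n_informative are non-zero.
X, y, coef = make_regression(n_samples=200, n_features=10, n_informative=3,
                             noise=0.0, bias=0.0, coef=True, random_state=0)
print(np.count_nonzero(coef))        # 3

# Without noise or bias, the target is exactly the linear combination X @ coef.
print(np.allclose(y, X @ coef))      # True

# With Gaussian noise added, a linear model should still land close to the true coefficients.
X_n, y_n, coef_n = make_regression(n_samples=200, n_features=10, n_informative=3,
                                   noise=5.0, coef=True, random_state=0)
est = LinearRegression().fit(X_n, y_n)
print(np.abs(est.coef_ - coef_n).max())
```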
Function: make_s_curve, generate an S-curve dataset.
Function: make_swiss_roll, generate a Swiss roll dataset.
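Both manifold generators return 3-D points `X` and a vector `t` giving each point's position along the main dimension of the manifold. That is why they are typical test beds for manifold learning. In the sketch below, `Isomap` and its settings are an illustrative choice, not something the datasets require.

```python
from sklearn.datasets import make_s_curve, make_swiss_roll
from sklearn.manifold import Isomap

# X: (n_samples, 3) points on the surface; t: position along the manifold, often used for coloring.
X_s, t_s = make_s_curve(n_samples=500, noise=0.05, random_state=0)
X_r, t_r = make_swiss_roll(n_samples=500, noise=0.05, random_state=0)
print(X_s.shape, t_s.shape)   # (500, 3) (500,)

# Typical use: see whether a manifold learner "unrolls" the 3-D surface into 2-D.
emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X_r)
print(emb.shape)              # (500, 2)
```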
sklearn.datasets.fetch_openml
>>> from sklearn.datasets import fetch_openml
>>> mice = fetch_openml(name='miceprotein', version=4)
>>> # Inspect the dataset's description and metadata
>>> # DESCR: free-text description of the data
>>> # details: metadata as a dictionary
>>> print(mice.DESCR)
**Author**: Clara Higuera, Katheleen J. Gardiner, Krzysztof J. Cios
**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression) - 2015
**Please cite**: Higuera C, Gardiner KJ, Cios KJ (2015) Self-Organizing
Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down
Syndrome. PLoS ONE 10(6): e0129126...
>>> mice.details
{'id': '40966', 'name': 'MiceProtein', 'version': '4', 'format': 'ARFF',
'upload_date': '2017-11-08T16:00:15', 'licence': 'Public',
'url': 'https://www.openml.org/data/v1/download/17928620/MiceProtein.arff',
'file_id': '17928620', 'default_target_attribute': 'class',
'row_id_attribute': 'MouseID',
'ignore_attribute': ['Genotype', 'Treatment', 'Behavior'],
'tag': ['OpenML-CC18', 'study_135', 'study_98', 'study_99'],
'visibility': 'public', 'status': 'active',
'md5_checksum': '3c479a6885bfa0438971388283a1ce32'}
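`fetch_openml` returns the same dict-like `Bunch` container as the bundled loaders, so the attribute inspection above can be tried offline. The sketch uses `load_iris` as a stand-in, which is an assumption made only to avoid a network call (it has no `details` field). It also assumes pandas is installed for `as_frame=True`.

```python
from sklearn.datasets import load_iris

# A Bunch is a dict subclass: keys are also reachable as attributes (iris.DESCR == iris["DESCR"]).
iris = load_iris(as_frame=True)
print(sorted(iris.keys()))
print(iris.DESCR[:200])            # start of the free-text description

# as_frame=True adds a pandas DataFrame "frame" holding the features and the target together.
print(iris.frame.shape)            # (150, 5)
print(list(iris.target_names))
```

`fetch_openml` accepts `as_frame` too. Passing `data_id=40966` (the `id` shown in `mice.details`) instead of `name`/`version` pins exactly that OpenML dataset.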
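Loading data from external formats:
Tabular data: pandas.io (CSV, Excel, JSON, SQL), scipy.io (binary formats such as .mat and .arff), numpy (columnar text data)
Images, video, and audio: skimage.io, Imageio, scipy.misc.imread (removed in SciPy 1.2; use imageio.imread instead), scipy.io.wavfile.read (WAV files)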
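The following sketch shows a few of these readers turning external data into NumPy arrays that scikit-learn can consume. The inline CSV and the generated 440 Hz tone are made-up toy data, and the column names `x1`, `x2`, `label` are hypothetical. Image readers (`skimage.io`, Imageio) are left out here to avoid extra dependencies.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.io import wavfile

# Tabular text via pandas.read_csv, then split into feature matrix X and target y.
csv_text = "x1,x2,label\n0.1,1.2,0\n0.4,0.9,1\n0.3,1.5,0\n"   # hypothetical toy data
df = pd.read_csv(io.StringIO(csv_text))
X = df[["x1", "x2"]].to_numpy()
y = df["label"].to_numpy()

# Plain numeric columns: numpy.loadtxt works without pandas (skip the header row).
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Audio: write one second of a 440 Hz int16 tone, then read it back with scipy.io.wavfile.read.
rate = 8000
t = np.arange(rate) / rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tone.wav")
    wavfile.write(path, rate, tone)
    rate_read, samples = wavfile.read(path)

print(X.shape, y.tolist())                      # (3, 2) [0, 1, 0]
print(arr.shape)                                # (3, 3)
print(rate_read, samples.shape, samples.dtype)  # 8000 (8000,) int16
```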