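Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.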
German Credit Data
1. Download the file first and read it with numpy. This approach only works for numeric data; the file must not contain str-like values.
import pandas as pd
import numpy as np
# The default delimiter splits on any run of whitespace, so padded columns are handled too
german = np.loadtxt(r"D:\data\dataset\german\german.data-numeric")
data = pd.DataFrame(german)
2. Without downloading the file first, fetch it from the web with urllib
import urllib.request
# np.loadtxt can only parse numeric fields, so fetch the numeric version of the file
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data-numeric"
raw_data = urllib.request.urlopen(url)
dataset_raw = np.loadtxt(raw_data)
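np.loadtxt cannot parse categorical codes such as A11, so the string-valued german.data needs a different reader when it is fetched straight from the web. Below is a minimal sketch of one option, using pandas on the downloaded text. The helper name `fetch_table`, the `timeout` default, and the UTF-8 decoding are assumptions made for illustration, not part of the original post.

```python
import io
import urllib.request

import pandas as pd


def fetch_table(url, timeout=30):
    """Download a whitespace-separated text file and parse it into a DataFrame.

    Hypothetical helper: columns are left unnamed (0, 1, 2, ...) and pandas
    infers int or str for each column.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")  # assumes UTF-8/ASCII content
    return pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)


# Usage (requires network access), with the URL of the string-valued file:
# df = fetch_table("http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data")
```

The `with` block closes the connection as soon as the body has been read, and parsing from an in-memory buffer keeps the download and the parsing step separate.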
3. The downloaded file contains str values, so read it line by line; the data type of each column then has to be handled afterwards.
line_arr = []
with open(r"D:\data\dataset\german\german.data", 'r') as f:
    for line in f:  # read the file line by line
        line_arr.append(line.split())  # split on whitespace; every field is still a str
For example, the first 5 rows of the dataset look like this:
A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1
A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2
A14 12 A34 A46 2096 A61 A74 2 A93 A101 3 A121 49 A143 A152 1 A172 2 A191 A201 1
A11 42 A32 A42 7882 A61 A74 2 A93 A103 4 A122 45 A143 A153 1 A173 2 A191 A201 1
A11 24 A33 A40 4870 A61 A73 3 A93 A101 4 A124 53 A143 A153 2 A173 2 A191 A201 2
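Reading line by line leaves every field as a str, so the per-column type handling still has to be done. The following is a rough sketch of one way to do it, using the five sample rows above in place of the file: build a DataFrame from the token lists and convert each column to numbers where that is possible. The variable names are illustrative assumptions.

```python
import pandas as pd

# The five sample rows, standing in for the contents of german.data.
sample = """A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1
A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2
A14 12 A34 A46 2096 A61 A74 2 A93 A101 3 A121 49 A143 A152 1 A172 2 A191 A201 1
A11 42 A32 A42 7882 A61 A74 2 A93 A103 4 A122 45 A143 A153 1 A173 2 A191 A201 1
A11 24 A33 A40 4870 A61 A73 3 A93 A101 4 A124 53 A143 A153 2 A173 2 A191 A201 2
"""
line_arr = [line.split() for line in sample.splitlines()]

df = pd.DataFrame(line_arr)  # all columns hold strings at this point

# Convert a column to numbers when every value parses; otherwise keep the strings.
for col in df.columns:
    try:
        df[col] = pd.to_numeric(df[col])
    except (ValueError, TypeError):
        pass  # categorical codes such as "A11" stay as str

numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Columns that stay as strings are the categorical attributes; they can be encoded later, depending on the model that will consume them.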
4. Read data that contains str columns directly with pd.read_csv
# Each row of german.data has 21 fields: 20 attributes followed by the class label
columns = ['Status of existing checking account', 'Duration in month', 'Credit history',
           'Purpose', 'Credit amount', 'Savings account/bonds', 'Present employment since',
           'Installment rate in percentage of disposable income', 'Personal status and sex',
           'Other debtors / guarantors', 'Present residence since', 'Property', 'Age in years',
           'Other installment plans', 'Housing', 'Number of existing credits at this bank',
           'Job', 'Number of people being liable to provide maintenance for',
           'Telephone', 'foreign worker',
           'Class']  # assumed name for the 21st field, the label (1 = good, 2 = bad)
data = pd.read_csv(r"D:\data\dataset\german\german.data", sep=' ', names=columns)
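One detail worth checking with `names`: each row of german.data carries 21 fields (20 attributes plus the class label). When pandas is given fewer names than there are fields, it uses the leading field as the row index instead of as a column. The sketch below illustrates this on the first two sample rows; the short names `attr_1` ... `attr_20` and `Class` are hypothetical placeholders, not the dataset's official names.

```python
import io

import pandas as pd

# First two rows of german.data, 21 space-separated fields each.
rows = (
    "A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1\n"
    "A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2\n"
)

names_20 = [f"attr_{i}" for i in range(1, 21)]  # hypothetical placeholder names

# 20 names for 21 fields: the first field ends up as the index.
short = pd.read_csv(io.StringIO(rows), sep=" ", names=names_20)

# 21 names for 21 fields: every field becomes a regular column.
full = pd.read_csv(io.StringIO(rows), sep=" ", names=names_20 + ["Class"])

print(short.index.tolist())     # the checking-account codes, used as the index
print(full["attr_1"].tolist())  # the same codes, now an ordinary column
```

Comparing `data.shape` against the expected 21 columns right after loading is a cheap way to catch this kind of silent shift.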