Reading data in Python

Article link: https://blog.csdn.net/qq_33598125/article/details/102648896

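German Credit dataset
1. Download the file first, then read it with numpy. This approach only works for purely numeric data: the file must not contain string values.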

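import pandas as pd
import numpy as np
# The numeric file separates fields with a variable number of spaces, so keep the
# default delimiter (any run of whitespace) instead of delimiter=" "
german = np.loadtxt(r"D:\data\dataset\german\german.data-numeric")
data = pd.DataFrame(german)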
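The numeric file pads its fields with runs of spaces, which is why np.loadtxt works best on its default whitespace splitting. A minimal sketch of splitting the loaded array into features and label, assuming (as in the UCI description) that the class label, 1 = good and 2 = bad, is the last column; the two rows below are made-up placeholders in the same layout, not real records:

```python
import io

import numpy as np

# Two made-up rows laid out like german.data-numeric:
# fields separated by a variable number of spaces, class label last.
sample = io.StringIO(
    "   1   6   4  12   1\n"
    "   2  48   2  60   2\n"
)

# delimiter is left at its default (None = any run of whitespace),
# so leading spaces and double spaces do not produce empty fields.
arr = np.loadtxt(sample)

X = arr[:, :-1]              # feature columns
y = arr[:, -1].astype(int)   # class label: 1 = good, 2 = bad
print(X.shape, y)            # (2, 4) [1 2]
```

With the real file, replace `sample` with the path to german.data-numeric.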

2. Without downloading the file manually: fetch it from the web with urllib.

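import urllib.request
import numpy as np
# np.loadtxt needs all-numeric data (german.data contains strings such as A11),
# so fetch the numeric version of the file
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data-numeric"
raw_data = urllib.request.urlopen(url)
dataset_raw = np.loadtxt(raw_data)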
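Fetching over the network on every run is slow and fails offline. A minimal sketch of caching the file locally the first time it is fetched; the helper name `fetch_cached` and the cache file name are hypothetical choices, not part of the original post:

```python
import os
import urllib.request


def fetch_cached(url, cache_path, timeout=30):
    """Hypothetical helper: download url to cache_path once, then reuse the local copy."""
    if not os.path.exists(cache_path):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        with open(cache_path, "wb") as f:
            f.write(data)
    return cache_path


# Usage (hits the network only on the first call):
# import numpy as np
# url = ("http://archive.ics.uci.edu/ml/machine-learning-databases/"
#        "statlog/german/german.data-numeric")
# dataset_raw = np.loadtxt(fetch_cached(url, "german.data-numeric"))
```

Writing the raw bytes unchanged keeps the cached copy identical to the remote file, so np.loadtxt reads it exactly as it would read the download.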

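3. The downloaded german.data file contains strings, so read it line by line; the data type of each column then has to be converted afterwards.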

line_arr = []
# np.loadtxt cannot parse this file (it contains strings such as A11), so read it as text
with open(r"D:\data\dataset\german\german.data", 'r') as f:
    for line in f:  # read the file line by line
        line_arr.append(line.split())

For example, the first five lines of the dataset look like this:

A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1
A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2
A14 12 A34 A46 2096 A61 A74 2 A93 A101 3 A121 49 A143 A152 1 A172 2 A191 A201 1
A11 42 A32 A42 7882 A61 A74 2 A93 A103 4 A122 45 A143 A153 1 A173 2 A191 A201 1
A11 24 A33 A40 4870 A61 A73 3 A93 A101 4 A124 53 A143 A153 2 A173 2 A191 A201 2
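Because every field comes back from `line.split()` as a string, the numeric columns still have to be converted. A minimal sketch of that step on the five rows above, assuming a column should become numeric exactly when every value in it parses as a number; the coded categorical columns (A11, A34, ...) stay as strings:

```python
import pandas as pd

# The five sample rows above, split the same way as line.split() does
line_arr = [line.split() for line in """\
A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1
A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2
A14 12 A34 A46 2096 A61 A74 2 A93 A101 3 A121 49 A143 A152 1 A172 2 A191 A201 1
A11 42 A32 A42 7882 A61 A74 2 A93 A103 4 A122 45 A143 A153 1 A173 2 A191 A201 1
A11 24 A33 A40 4870 A61 A73 3 A93 A101 4 A124 53 A143 A153 2 A173 2 A191 A201 2""".splitlines()]

df = pd.DataFrame(line_arr)

# Convert a column only if all of its values parse as numbers;
# errors="coerce" turns unparseable strings into NaN, which we then detect.
for col in df.columns:
    converted = pd.to_numeric(df[col], errors="coerce")
    if converted.notna().all():
        df[col] = converted

print(df.dtypes)
```

Detecting the numeric columns from the data avoids hard-coding their positions; for these rows they are columns 1, 4, 7, 10, 12, 15, 17 and 20.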

4. Read the data, string columns included, directly with pd.read_csv.

# german.data has 21 space-separated fields per row: 20 attributes plus the class
# label (1 = good, 2 = bad). With only 20 names, pandas would silently use the
# first column as the index, so the label needs a name too.
columns = ['Status of existing checking account', 'Duration in month', 'Credit history',
           'Purpose', 'Credit amount', 'Savings account/bonds', 'Present employment since',
           'Installment rate in percentage of disposable income', 'Personal status and sex',
           'Other debtors / guarantors', 'Present residence since', 'Property', 'Age in years',
           'Other installment plans', 'Housing', 'Number of existing credits at this bank',
           'Job', 'Number of people being liable to provide maintenance for',
           'Telephone', 'foreign worker', 'Credit risk']
data = pd.read_csv(r"D:\data\dataset\german\german.data", sep=' ', names=columns)
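A common next step after reading german.data is to separate the label and one-hot encode the coded categorical attributes. A minimal sketch, reading two of the sample rows through io.StringIO with header=None so it runs without the file; it assumes the label is the last of the 21 fields (1 = good, 2 = bad), and mapping it to 0/1 with 1 = bad is a modeling choice, not part of the dataset:

```python
import io

import pandas as pd

# Two of the sample rows shown earlier, standing in for german.data
sample = io.StringIO(
    "A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1\n"
    "A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2\n"
)
data = pd.read_csv(sample, sep=" ", header=None)

# The last field is the class label: 1 = good, 2 = bad.
# Map it to 0/1 so that 1 marks a bad credit risk.
y = (data.iloc[:, -1] == 2).astype(int)
X = data.iloc[:, :-1]

# One-hot encode the string-coded columns; numeric columns pass through unchanged.
X_encoded = pd.get_dummies(X)
print(X_encoded.shape)
```

With the named columns from the file, the same idea applies: drop the label column from the features and pass the rest to pd.get_dummies.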
