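【1】Building a simple table with numpy
1. What a table looks like
index holds the row labels and columns holds the column labels. The Series below is the simplest case: a single column of values with an index.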
import numpy as np
import pandas as pd
s = pd.Series([1,2,3,'lyh',np.nan])
print(s)
'''
0 1
1 2
2 3
3 lyh
4 NaN
dtype: object
'''
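The dtype of object above comes from mixing numbers with a string. As a small sketch (the variable names mixed and numeric are only illustrative), a Series that holds just numbers and NaN gets a float dtype instead:

```python
import numpy as np
import pandas as pd

# Mixed int/str/NaN values: pandas falls back to the generic object dtype.
mixed = pd.Series([1, 2, 3, 'lyh', np.nan])

# Only numbers and NaN: pandas picks float64, because NaN is itself a float.
numeric = pd.Series([1, 2, 3, np.nan])

print(mixed.dtype)           # object
print(numeric.dtype)         # float64
print(numeric.isna().sum())  # 1 missing value
```

Keeping a column purely numeric is usually worth it, since arithmetic and statistics on object columns are slower and can fail on the string entries.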
2. Workflow
import numpy as np
import pandas as pd
datas = pd.date_range('20200520',periods=6)
print(datas)  # these dates become the row labels
'''
DatetimeIndex(['2020-05-20', '2020-05-21', '2020-05-22', '2020-05-23',
'2020-05-24', '2020-05-25'],
dtype='datetime64[ns]', freq='D')
'''
df = pd.DataFrame(np.random.randn(6,4),index=datas,columns=['a','b','c','d'])
print(df)
'''
a b c d
2020-05-20 0.761828 0.955788 0.803339 0.174795
2020-05-21 -0.759550 0.863438 -0.416918 0.174165
2020-05-22 1.316918 -1.477811 -0.729416 0.338009
2020-05-23 -0.671851 -0.747983 -0.894371 -0.544008
2020-05-24 0.512189 0.223148 -0.707499 0.801942
2020-05-25 1.124180 -1.118842 1.185066 -0.539323
'''
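Because np.random.randn draws new numbers on every run, the table above will differ each time. A possible way to make it reproducible is a seeded generator; the seed 0 and the name df_seeded are arbitrary choices for this sketch:

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run.
rng = np.random.default_rng(0)  # seed value chosen arbitrarily

dates = pd.date_range('20200520', periods=6)  # six consecutive days as row labels
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)),  # 6 rows x 4 columns of standard normal values
    index=dates,
    columns=['a', 'b', 'c', 'd'],
)

print(df_seeded.shape)     # (6, 4)
print(df_seeded.index[0])  # first row label
print(df_seeded.index[-1]) # last row label
```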
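【2】Building a DataFrame from a dict
Inspecting the row index, the column index, and the values; summary statistics for the numeric columns; transposing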
import numpy as np
import pandas as pd
df2 = pd.DataFrame({
'A':1,  # a scalar is broadcast to every row
'B':['a','b','c'],
'C':pd.Series(list(np.arange(3)))
})
print(df2)
'''
A B C
0 1 a 0
1 1 b 1
2 1 c 2
'''
print(df2.index)
# RangeIndex(start=0, stop=3, step=1)
# the row index: starts at 0 and stops before 3
print(df2.columns)
# Index(['A', 'B', 'C'], dtype='object')
# the column index
print(df2.values)
'''
[[1 'a' 0]
[1 'b' 1]
[1 'c' 2]]
'''
print(df2.describe())  # summary statistics for the numeric columns only
'''
A C
count 3.0 3.0
mean 1.0 1.0
std 0.0 1.0
min 1.0 0.0
25% 1.0 0.5
50% 1.0 1.0
75% 1.0 1.5
max 1.0 2.0
'''
print(df2.T)  # transpose
'''
0 1 2
A 1 1 1
B a b c
C 0 1 2
'''
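describe() skipped column B above because it is not numeric. The following sketch shows one way to check which columns count as numeric and how to ask describe() for every column; numeric_cols is just an illustrative name:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': 1,                       # scalar, broadcast to all rows
    'B': ['a', 'b', 'c'],         # strings
    'C': pd.Series(np.arange(3))  # integers 0, 1, 2
})

# Each column has its own dtype.
print(df2.dtypes)

# Columns that describe() summarises by default.
numeric_cols = df2.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # ['A', 'C']

# include='all' adds non-numeric columns (count, unique, top, freq).
print(df2.describe(include='all'))
```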
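Sorting
Sort by row index, ascending or descending
Sort by column labels, ascending or descending
Sort by the values of one column, ascending or descending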
import numpy as np
import pandas as pd
df2 = pd.DataFrame({
'A':1,
'B':['a','b','c'],
'C':pd.Series(list(np.arange(3)))
})
print(df2)
df2 = df2.sort_index(axis=1,ascending=False)
# axis=1 sorts by column labels; ascending=False sorts in descending order
print(df2)
'''
C B A
0 0 a 1
1 1 b 1
2 2 c 1
'''
df3 = df2.sort_index()
# defaults: axis=0, ascending=True (row index, smallest first)
print(df3)
'''
C B A
0 0 a 1
1 1 b 1
2 2 c 1
'''
df4 = df2.sort_values(by='B',ascending=False)
# sort by column B in descending order; each row keeps its original index label
print(df4)
'''
C B A
2 2 c 1
1 1 b 1
0 0 a 1
'''
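The examples above cover the column axis in descending order and the row axis in ascending order. A short sketch of the remaining case, the row index in descending order, also shows that the sort methods return a new DataFrame and leave the original untouched (rows_desc and by_c_desc are illustrative names):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': 1,
    'B': ['a', 'b', 'c'],
    'C': pd.Series(np.arange(3))
})

# Row index from largest to smallest.
rows_desc = df2.sort_index(axis=0, ascending=False)
print(rows_desc.index.tolist())  # [2, 1, 0]

# Sort by the values in column C, largest first.
by_c_desc = df2.sort_values(by='C', ascending=False)
print(by_c_desc['B'].tolist())   # ['c', 'b', 'a']

# Both calls return new objects; df2 itself is unchanged.
print(df2.index.tolist())        # [0, 1, 2]
```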
【3】Selecting data
1. Sample data
import numpy as np
import pandas as pd
row = pd.date_range('20200726',periods=6)
dates = np.arange(24).reshape(6,4)
column = ['A','B','C','D']
df = pd.DataFrame(dates,index=row,columns=column)
print(df)
'''
A B C D
2020-07-26 0 1 2 3
2020-07-27 4 5 6 7
2020-07-28 8 9 10 11
2020-07-29 12 13 14 15
2020-07-30 16 17 18 19
2020-07-31 20 21 22 23
'''
2. Selecting by column label, row-label slice, and positional slice
Select a column by label: df['A']
print(df['A'])
# or print(df.A)
'''
2020-07-26 0
2020-07-27 4
2020-07-28 8
2020-07-29 12
2020-07-30 16
2020-07-31 20
Freq: D, Name: A, dtype: int32
'''
Select rows by a slice of row labels: df['20200727':'20200729'] (both endpoints are included)
print(df['20200727':'20200729'])
'''
A B C D
2020-07-27 4 5 6 7
2020-07-28 8 9 10 11
2020-07-29 12 13 14 15
'''
Select rows by a slice of positions: df[0:3] (the end position is excluded)
print(df[0:3])
'''
A B C D
2020-07-26 0 1 2 3
2020-07-27 4 5 6 7
2020-07-28 8 9 10 11
'''
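Plain square brackets therefore do two different jobs: a label or a list of labels picks columns, while a slice picks rows. The sketch below rebuilds the sample data and puts the variants side by side; the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

row = pd.date_range('20200726', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=row, columns=['A', 'B', 'C', 'D'])

one_col = df['A']          # single label -> Series
two_cols = df[['A', 'C']]  # list of labels -> DataFrame with those columns

label_slice = df['20200727':'20200729']  # label slice -> rows, both ends included
pos_slice = df[0:3]                      # integer slice -> rows 0, 1, 2 (end excluded)

print(type(one_col).__name__)  # Series
print(two_cols.shape)          # (6, 2)
print(len(label_slice))        # 3
print(len(pos_slice))          # 3
```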
3. Selecting by row and column labels with df.loc
Pick specific rows and specific columns
print(df)
'''
A B C D
2020-07-26 0 1 2 3
2020-07-27 4 5 6 7
2020-07-28 8 9 10 11
2020-07-29 12 13 14 15
2020-07-30 16 17 18 19
2020-07-31 20 21 22 23
'''
print(df.loc['20200726',['A','B']])
'''
A 0
B 1
'''
print(df.loc[['20200726','20200728'],['A','B']])
'''
A B
2020-07-26 0 1
2020-07-28 8 9
'''
A bare : selects every row
print(df.loc[:,['B','A']])
'''
B A
2020-07-26 1 0
2020-07-27 5 4
2020-07-28 9 8
2020-07-29 13 12
2020-07-30 17 16
2020-07-31 21 20
'''
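Besides lists of labels, df.loc also accepts label slices on both axes. A minimal sketch, assuming the same sample data as above (block is an illustrative name); note that label slices include the end label:

```python
import numpy as np
import pandas as pd

row = pd.date_range('20200726', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=row, columns=['A', 'B', 'C', 'D'])

# Rows 2020-07-27 through 2020-07-29 and columns B through C, ends included.
block = df.loc['20200727':'20200729', 'B':'C']
print(block)

# A single row label plus a single column label returns one scalar.
print(df.loc[pd.Timestamp('2020-07-26'), 'A'])  # 0
```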
4. Selecting by integer position with df.iloc
Select one row by its position
print(df)
'''
A B C D
2020-07-26 0 1 2 3
2020-07-27 4 5 6 7
2020-07-28 8 9 10 11
2020-07-29 12 13 14 15
2020-07-30 16 17 18 19
2020-07-31 20 21 22 23
'''
print(df.iloc[3])
'''
A 12
B 13
C 14
D 15
Name: 2020-07-29 00:00:00, dtype: int32
'''
Select a single value by row and column position
print(df.iloc[1,3])
# 7, i.e. column D of the 2020-07-27 row
Select a contiguous block by row and column position
print(df.iloc[0:3,1:3])
'''
B C
2020-07-26 1 2
2020-07-27 5 6
2020-07-28 9 10
'''
Select non-contiguous rows by position
print(df.iloc[[1,3,5],1:3])
'''
B C
2020-07-27 5 6
2020-07-29 13 14
2020-07-31 21 22
'''
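To make the difference between the two indexers concrete, this sketch selects the same block once by position and once by label, and also uses negative positions, which count from the end as in ordinary Python sequences. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

row = pd.date_range('20200726', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=row, columns=['A', 'B', 'C', 'D'])

# iloc slices exclude the end position; loc slices include the end label.
by_pos = df.iloc[0:3, 1:3]
by_label = df.loc['20200726':'20200728', 'B':'C']
print(by_pos.equals(by_label))  # True

# Negative positions count from the end.
last_row = df.iloc[-1]     # the 2020-07-31 row
corner = df.iloc[-2:, -2:] # last two rows, last two columns
print(corner)
```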
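Filtering rows where column A exceeds a value
Note the difference: approach 1 returns every column of the matching rows, while approach 2 returns only the column named before the brackets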
print(df[df.A >8])
'''
A B C D
2020-07-29 12 13 14 15
2020-07-30 16 17 18 19
2020-07-31 20 21 22 23
'''
print(df.A[df.A >8])
'''
2020-07-29 12
2020-07-30 16
2020-07-31 20
'''
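Conditions can also be combined. The sketch below is one way to do it, using & for "and" and | for "or"; each condition needs its own parentheses because these operators bind more tightly than comparisons. The names mask and either are illustrative:

```python
import numpy as np
import pandas as pd

row = pd.date_range('20200726', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=row, columns=['A', 'B', 'C', 'D'])

# Rows where A > 8 AND D < 23.
mask = (df['A'] > 8) & (df['D'] < 23)

# loc takes the boolean mask for rows and a label list for columns.
print(df.loc[mask, ['A', 'D']])

# Rows where A < 4 OR A > 16.
either = df[(df['A'] < 4) | (df['A'] > 16)]
print(either)
```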
【4】Setting values
1. Set a single cell by position with iloc or by label with loc
df.iloc[1,2]=111
df.loc['20200728','A']=222
print(df)
'''
A B C D
2020-07-26 0 1 2 3
2020-07-27 4 5 111 7
2020-07-28 222 9 10 11
2020-07-29 12 13 14 15
2020-07-30 16 17 18 19
2020-07-31 20 21 22 23
'''
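2. Set values by condition
The condition works as in selection: name a column to change only that column, or leave the column out to change every column of the matching rows.
df.loc[df.B > 3, 'A'] = 0  # a single .loc call avoids chained assignment
print(df)
'''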
A B C D
2020-07-26 0 1 2 3
2020-07-27 0 5 111 7
2020-07-28 0 9 10 11
2020-07-29 0 13 14 15
2020-07-30 0 17 18 19
2020-07-31 0 21 22 23
'''
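As a sketch of both variants described above, the following starts from fresh sample data and sets values through a single .loc call, once for one column and once for whole rows. The copy named whole exists only to keep the two cases apart:

```python
import numpy as np
import pandas as pd

row = pd.date_range('20200726', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=row, columns=['A', 'B', 'C', 'D'])

# Keep an untouched copy for the second variant.
whole = df.copy()

# Variant 1: only column A changes in rows where B > 3.
df.loc[df['B'] > 3, 'A'] = 0
print(df)

# Variant 2: no column given, so every column of the matching rows changes.
whole.loc[whole['B'] > 3] = 0
print(whole)
```

Writing the row condition and the column in one .loc call means pandas assigns into the original object, whereas a two-step form such as df.A[...] = 0 may assign into a temporary copy.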
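3. Adding columns
The output below starts from the original df of section 【3】, without the changes made in steps 1 and 2.
df['E'] = np.nan  # fill with NaN
df['F'] = pd.Series(np.arange(1,7),index=df.index)
# add a Series, aligned on the row index
# df['F'] = pd.Series(np.arange(1,7),index=row)
df['G'] = 1  # same value in every row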
print(df)
'''
A B C D E F G
2020-07-26 0 1 2 3 NaN 1 1
2020-07-27 4 5 6 7 NaN 2 1
2020-07-28 8 9 10 11 NaN 3 1
2020-07-29 12 13 14 15 NaN 4 1
2020-07-30 16 17 18 19 NaN 5 1
2020-07-31 20 21 22 23 NaN 6 1
'''
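The index=df.index argument matters because pandas aligns a Series with the DataFrame by label, not by position. A small sketch of what happens without it; the column names H and I are hypothetical additions for this illustration:

```python
import numpy as np
import pandas as pd

row = pd.date_range('20200726', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=row, columns=['A', 'B', 'C', 'D'])

# Same index as df: values line up row by row.
df['F'] = pd.Series(np.arange(1, 7), index=df.index)

# Default index 0..5 shares no labels with the dates, so every value becomes NaN.
df['H'] = pd.Series(np.arange(1, 7))

# A plain array has no index and is assigned by position.
df['I'] = np.arange(1, 7)

print(df[['F', 'H', 'I']])
```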