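Python data analysis: comparison operators, range tests, null values, string matching, logical operators, random sampling, record merging, field merging, field matching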

Common condition types

Comparison operators

>, <, >=, <=, ==, !=

Example: df[df.comments > 1000]
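A minimal sketch of how the comparison filter works. The sample data is hypothetical; only the column name comments comes from the notes. The comparison produces a boolean Series, and indexing with it keeps the rows marked True.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the notes.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "comments": [50, 1500, 1000, 3200],
})

# The comparison yields a boolean Series, one True/False per row.
mask = df.comments > 1000

# Indexing with the mask keeps only the rows where it is True.
popular = df[mask]
print(popular)
```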

Range test

between(left, right), which includes both endpoints by default

Example: df[df.comments.between(100, 300)]
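The following sketch, on made-up values, shows that between keeps both boundary values by default.

```python
import pandas as pd

# Hypothetical values chosen to sit just outside, on, and inside the range.
df = pd.DataFrame({"comments": [99, 100, 200, 300, 301]})

# Both endpoints (100 and 300) are included by default.
in_range = df[df.comments.between(100, 300)]
print(in_range)
```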

Null-value matching

pandas.isnull(column)

Example: df[pandas.isnull(df.title)]
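As a small illustration with invented data, pandas.isnull flags both None and NaN as missing, and pandas.notnull selects the complementary rows.

```python
import pandas as pd

# Hypothetical data with two kinds of missing title: None and NaN.
df = pd.DataFrame({
    "title": ["A", None, "C", float("nan")],
    "comments": [1, 2, 3, 4],
})

# Rows whose title is missing.
missing = df[pd.isnull(df.title)]

# The complement: rows whose title is present.
present = df[pd.notnull(df.title)]
print(missing)
print(present)
```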

String matching

str.contains(pattern, na=False)

Example: df[df.title.str.contains('content', na=False)]
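A short sketch on hypothetical titles. The pattern is treated as a regular expression by default, so the second call passes regex=False to match parentheses literally. With na=False, rows with a missing title count as non-matches instead of producing NaN in the mask.

```python
import pandas as pd

# Hypothetical titles, including one missing value.
df = pd.DataFrame({
    "title": ["pandas tips", None, "numpy notes", "more pandas (v2)"],
})

# na=False makes the missing title evaluate to False in the mask.
matched = df[df.title.str.contains("pandas", na=False)]

# regex=False treats the pattern as plain text, so "(" and ")" are literal.
literal = df[df.title.str.contains("(v2)", regex=False, na=False)]
print(matched)
print(literal)
```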

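Logical operators

& (and), | (or), ~ (not)

Python's and, or, and not keywords do not work element-wise on a Series, and each comparison must be wrapped in parentheses.

Example:

df[(df.comments >= 100) & (df.comments <= 300)]

which is equivalent to df[df.comments.between(100, 300)]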
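The equivalence stated above can be checked directly. This sketch uses made-up data and also shows | and ~ selecting the rows outside the range.

```python
import pandas as pd

# Hypothetical comment counts.
df = pd.DataFrame({"comments": [50, 100, 200, 300, 400]})

# AND of two comparisons; the parentheses are required.
both = df[(df.comments >= 100) & (df.comments <= 300)]
same = df[df.comments.between(100, 300)]

# OR of two comparisons selects the rows outside the range...
either = df[(df.comments < 100) | (df.comments > 300)]

# ...which is the same as negating the range test with ~.
negated = df[~df.comments.between(100, 300)]

print(both.equals(same))
print(either.equals(negated))
```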

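Random sampling

numpy.random.randint(low, high, size)

numpy.random.randint(start of range, end of range (exclusive), number of samples)

Returns: an array of random integers, used here as row index values. Values may repeat, so this samples with replacement.

Example:

r = numpy.random.randint(0, 10, 3)

df.loc[r, :]  # select the rows whose index labels are in r; assumes the default 0..n-1 index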
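A hedged sketch of the sampling step on a hypothetical ten-row frame. It uses iloc so that the random integers are treated as row positions, which also works when the index is not 0..n-1. DataFrame.sample is shown as an alternative that draws without replacement; it is not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 10 rows and the default index 0..9.
df = pd.DataFrame({"comments": range(100, 110)})

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Upper bound is exclusive, so len(df) is the right value; positions may repeat.
r = np.random.randint(0, len(df), 3)
sample = df.iloc[r, :]

# Alternative (an addition to the notes): sampling without replacement.
sample_unique = df.sample(n=3, random_state=0)

print(sample)
print(sample_unique)
```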

Record merging

Stack several data frames into one

concat([dataFrame1, dataFrame2, ...])

Example:

df1 = pandas.read_csv('path1', sep='|')

df2 = pandas.read_csv('path2', sep='|')

df3 = pandas.read_csv('path3', sep='|')

df = pandas.concat([df1, df2, df3])
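A self-contained sketch of record merging. In-memory strings stand in for the '|'-separated files, and their contents are invented. Note that concat keeps each frame's original index unless ignore_index=True is passed.

```python
import io
import pandas as pd

# In-memory stand-ins for two '|'-separated files (hypothetical contents).
part1 = io.StringIO("id|comments\n1|10\n2|20\n")
part2 = io.StringIO("id|comments\n3|30\n")

df1 = pd.read_csv(part1, sep="|")
df2 = pd.read_csv(part2, sep="|")

# Rows are stacked; each frame's own index labels are kept.
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows of the result from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```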

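Field merging

x = x1 + x2 + ...

merged column = column 1 + column 2 + ...

Example:

df = pandas.read_csv('path.csv', sep=' ', names=['band', 'area', 'num'])

df = df.astype(str)  # convert the numbers to strings; otherwise + adds them arithmetically

tel = df['band'] + df['area'] + df['num']  # concatenate into a phone number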
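The sketch below uses invented phone-number parts to show why the string conversion matters. It also illustrates one caveat: a field such as 0571 loses its leading zero if it is first parsed as a number, so this version reads the columns as text with dtype=str, a variation on the astype(str) step in the notes.

```python
import io
import pandas as pd

# Hypothetical space-separated data: band, area code, number.
data = "138 0571 1234\n139 0010 5678\n"
names = ["band", "area", "num"]

# Parsed as integers, + performs arithmetic addition.
as_numbers = pd.read_csv(io.StringIO(data), sep=" ", names=names)
summed = as_numbers["band"] + as_numbers["area"] + as_numbers["num"]

# Read as text so that leading zeros survive, then + concatenates.
as_text = pd.read_csv(io.StringIO(data), sep=" ", names=names, dtype=str)
tel = as_text["band"] + as_text["area"] + as_text["num"]

print(summed.tolist())
print(tel.tolist())
```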

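Field matching

Combine data frames with different structures by matching rows on a key column

merge(x, y, left_on, right_on)

merge(first data frame, second data frame, matching column in the first data frame, matching column in the second data frame)

Example:

items = pandas.read_csv('path.csv', sep='|', names=['id', 'comments', 'title'])

prices = pandas.read_csv('path.csv', sep='|', names=['id', 'oldPrice', 'newPrice'])

itemPrice = pandas.merge(items, prices, left_on='id', right_on='id')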
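A minimal sketch of the merge on two small hypothetical frames. By default merge performs an inner join, so only ids present in both frames survive. The how="left" variant, which is not in the original notes, keeps every row of the first frame and fills unmatched price columns with NaN.

```python
import pandas as pd

# Hypothetical data; the column names follow the notes.
items = pd.DataFrame({
    "id": [1, 2, 3],
    "comments": [10, 20, 30],
    "title": ["A", "B", "C"],
})
prices = pd.DataFrame({
    "id": [2, 3, 4],
    "oldPrice": [9.9, 19.9, 29.9],
    "newPrice": [8.8, 18.8, 28.8],
})

# Inner join (the default): only ids 2 and 3 appear in both frames.
item_price = pd.merge(items, prices, left_on="id", right_on="id")

# Left join: keep all items; id 1 has no price, so its price columns are NaN.
all_items = pd.merge(items, prices, on="id", how="left")

print(item_price)
print(all_items)
```

When the key column has the same name in both frames, on="id" is a shorter way to write left_on="id", right_on="id".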