Lesson 1: Advanced Pandas for Data Analysis
Section 1: Indexing with Functions (Selection by Callable)
import pandas as pd
import numpy as np
# Confirm that the pandas version is >= 0.18.1 (selection by callable was added in 0.18.1)
pd.__version__
'0.25.1'
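The notebook prints the version and leaves the comparison to the reader. Comparing version strings directly is unreliable, because '0.9.0' >= '0.18.1' is True as plain text. Below is a minimal sketch of a numeric check. It assumes the version string starts with dot-separated integers. The helper `version_at_least` is hypothetical and is not part of pandas.

```python
import re

import pandas as pd


def version_at_least(version: str, minimum: tuple) -> bool:
    """Hypothetical helper: compare the leading numeric parts of a version string."""
    # Take the first three integer groups, e.g. '0.25.1' -> (0, 25, 1)
    parts = tuple(int(p) for p in re.findall(r"\d+", version)[:3])
    # Pad short versions such as '1.0' to three components
    parts += (0,) * (3 - len(parts))
    return parts >= minimum


# Selection by callable in .loc / .iloc / [] needs pandas >= 0.18.1
print(version_at_least(pd.__version__, (0, 18, 1)))
```

If the third-party `packaging` library is installed, `packaging.version.Version` handles pre-release and dev suffixes more thoroughly than this sketch.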
Read the data file
raw_df = pd.read_csv('./datasets/2016_happiness.csv')
raw_df.head()
   Country      Region          Happiness Rank  Happiness Score  Lower Confidence Interval  Upper Confidence Interval  Economy (GDP per Capita)  Family   Health (Life Expectancy)  Freedom  Trust (Government Corruption)  Generosity  Dystopia Residual
0  Denmark      Western Europe  1               7.526            7.460                      7.592                      1.44178                   1.16374  0.79504                   0.57941  0.44453                        0.36171     2.73939
1  Switzerland  Western Europe  2               7.509            7.428                      7.590                      1.52733                   1.14524  0.86303                   0.58557  0.41203                        0.28083     2.69463
2  Iceland      Western Europe  3               7.501            7.333                      7.669                      1.42666                   1.18326  0.86733                   0.56624  0.14975                        0.47678     2.83137
3  Norway       Western Europe  4               7.498            7.421                      7.575                      1.57744                   1.12690  0.79579                   0.59609  0.35776                        0.37895     2.66465
4  Finland      Western Europe  5               7.413            7.351                      7.475                      1.40598                   1.13464  0.81091                   0.57104  0.41004                        0.25492     2.82596
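Readers who do not have `./datasets/2016_happiness.csv` can still try the indexing snippets. Below is a minimal sketch that rebuilds the five rows printed above, keeping only four of the thirteen columns. The variable name `sample_df` is hypothetical. It is a small stand-in and not the full dataset.

```python
import pandas as pd

# The five rows printed by raw_df.head(), reduced to four columns.
# This is a stand-in for experimenting, not the full 2016 dataset.
sample_df = pd.DataFrame(
    {
        "Country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
        "Region": ["Western Europe"] * 5,
        "Happiness Rank": [1, 2, 3, 4, 5],
        "Happiness Score": [7.526, 7.509, 7.501, 7.498, 7.413],
    }
)

print(sample_df.head())
```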
Select the country and rank of the countries ranked 10 to 20
raw_df.loc[lambda df:(df['Happiness Rank']>=10)&(df['Happiness Rank']<=20),['Country','Happiness Rank']]
    Country        Happiness Rank
9   Sweden         10
10  Israel         11
11  Austria        12
12  United States  13
13  Costa Rica     14
14  Puerto Rico    15
15  Germany        16
16  Brazil         17
17  Belgium        18
18  Ireland        19
19  Luxembourg     20
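When you pass a callable to `.loc`, pandas calls it with the object being indexed and uses the return value (here a boolean mask) as the row selector. The payoff comes in method chains, because there the intermediate DataFrame has no variable name you could refer to. Below is a minimal sketch on a small stand-in frame `df` built from the first five rows. It checks that `Series.between`, which includes both ends by default, produces the same selection as the two comparisons. The column `Rank Gap` is hypothetical and exists only to show a column that appears partway through a chain.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
        "Happiness Rank": [1, 2, 3, 4, 5],
    }
)

# The lambda receives the DataFrame itself and returns a boolean mask
by_lambda = df.loc[
    lambda d: (d["Happiness Rank"] >= 2) & (d["Happiness Rank"] <= 4),
    ["Country", "Happiness Rank"],
]

# Same selection with Series.between (both ends inclusive by default)
by_between = df.loc[lambda d: d["Happiness Rank"].between(2, 4), ["Country", "Happiness Rank"]]

# In a chain, the callable sees the intermediate result, including the
# hypothetical "Rank Gap" column that only exists after assign()
chained = (
    df.assign(**{"Rank Gap": lambda d: d["Happiness Rank"] - 1})
    .loc[lambda d: d["Rank Gap"] >= 3, ["Country", "Rank Gap"]]
)

print(by_lambda)
print(chained)
```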
Select the country and rank of the first 5 rows
raw_df.iloc[:5,lambda df:[0,2]]
   Country      Happiness Rank
0  Denmark      1
1  Switzerland  2
2  Iceland      3
3  Norway       4
4  Finland      5
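In `raw_df.iloc[:5, lambda df: [0, 2]]` the lambda ignores its argument and returns fixed positions. A callable earns its place in `.iloc` when the positions are computed from the frame itself. Below is a minimal sketch on a hypothetical stand-in `df`. It looks up the column positions with `Index.get_indexer` instead of hard-coding 0 and 2.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
        "Region": ["Western Europe"] * 5,
        "Happiness Rank": [1, 2, 3, 4, 5],
    }
)

# Hard-coded positions, as in the lesson: the lambda ignores its argument
fixed = df.iloc[:3, lambda d: [0, 2]]

# Positions looked up from the column labels of the frame being indexed.
# Caveat: get_indexer returns -1 for a label it cannot find, and .iloc would
# then silently select the last column.
computed = df.iloc[:3, lambda d: d.columns.get_indexer(["Country", "Happiness Rank"])]

print(computed)
```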
Find the regions whose average happiness score is above 7
raw_df.groupby('Region')['Happiness Score'].mean()
Region
Australia and New Zealand 7.323500
Central and Eastern Europe 5.370690
Eastern Asia 5.624167
Latin America and Caribbean 6.101750
Middle East and Northern Africa 5.386053
North America 7.254000
Southeastern Asia 5.338889
Southern Asia 4.563286
Sub-Saharan Africa 4.136421
Western Europe 6.685667
Name: Happiness Score, dtype: float64
raw_df.groupby('Region')['Happiness Score'].mean().loc[lambda s: s>7]
Region
Australia and New Zealand 7.3235
North America 7.2540
Name: Happiness Score, dtype: float64
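`.mean()` returns a new Series with no variable name, and `.loc[lambda s: s > 7]` lets you filter it in the same expression. Below is a minimal sketch that hard-codes a subset of the region means printed above, standing in for the `groupby` result. The variable name `means` is hypothetical. The sketch shows that plain `[]` on a Series also accepts a callable, and that further steps such as sorting can follow in the same chain.

```python
import pandas as pd

# A subset of the region means printed above, standing in for
# raw_df.groupby('Region')['Happiness Score'].mean()
means = pd.Series(
    {
        "Australia and New Zealand": 7.3235,
        "North America": 7.254,
        "Western Europe": 6.685667,
        "Sub-Saharan Africa": 4.136421,
    },
    name="Happiness Score",
)

# .loc with a callable: the lambda receives the Series and returns a boolean mask
above_7_loc = means.loc[lambda s: s > 7]

# Plain [] on a Series also accepts a callable
above_7_getitem = means[lambda s: s > 7]

# Filter and sort in one chain, highest mean first
ranked = means.loc[lambda s: s > 6].sort_values(ascending=False)

print(above_7_loc)
```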