In pandas, `value_counts` is commonly used to check how often each value occurs in the data.
- Series case

    import pandas as pd
    from pandas import Series

    ss = Series(['Tokyo', 'Nagoya', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'])
    ss.value_counts()  # counts how often each distinct value occurs in the Series

    Tokyo     3
    Nagoya    2
    Osaka     1
    dtype: int64
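As a small extension of the Series case, the sketch below shows relative frequencies next to the absolute counts. It reuses the same city data; the variable names `counts` and `shares` are only illustrative, and `normalize=True` is the standard `value_counts` option that returns fractions instead of counts.

```python
import pandas as pd

ss = pd.Series(['Tokyo', 'Nagoya', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'])

# Absolute counts, sorted from most to least frequent by default.
counts = ss.value_counts()

# Relative frequencies: each count divided by the number of non-null values.
shares = ss.value_counts(normalize=True)

print(counts)
print(shares)
```

Here `shares` sums to 1, which can be handy when comparing Series of different lengths.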
- DataFrame case

    import pandas as pd
    from pandas import DataFrame

    # A DataFrame holding two columns of city names
    df = DataFrame({'a': ['Tokyo', 'Osaka', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'],
                    'b': ['Osaka', 'Osaka', 'Osaka', 'Tokyo', 'Tokyo', 'Tokyo']})
    print(df)

            a      b
    0   Tokyo  Osaka
    1   Osaka  Osaka
    2  Nagoya  Osaka
    3   Osaka  Tokyo
    4   Tokyo  Tokyo
    5   Tokyo  Tokyo

    # Apply value_counts to each column to count how often each value occurs in it.
    # The original post used pd.value_counts, which is deprecated in recent pandas versions.
    df.apply(pd.Series.value_counts)

            a    b
    Nagoya  1  NaN   # column b contains no Nagoya, so it is shown as NaN
    Osaka   2  3.0
    Tokyo   3  3.0
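If the NaN entries are inconvenient, one possible follow-up is to treat a missing value as a count of zero. The sketch below is a minimal example using the same data; the names `freq` and `freq_filled` are hypothetical, and passing `pd.Series.value_counts` as the per-column function is an assumption about the preferred spelling in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['Tokyo', 'Osaka', 'Nagoya', 'Osaka', 'Tokyo', 'Tokyo'],
    'b': ['Osaka', 'Osaka', 'Osaka', 'Tokyo', 'Tokyo', 'Tokyo'],
})

# Count values per column; values absent from a column come out as NaN.
freq = df.apply(pd.Series.value_counts)

# Replace NaN with 0 and cast back to integers so every cell is a plain count.
freq_filled = freq.fillna(0).astype(int)

print(freq_filled)
```

After this step, the cell for Nagoya in column `b` holds 0 rather than NaN, and both columns have an integer dtype.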
http://ailaby.com/dataframe_value_counts/