Simpson index
The measure $\lambda = \sum_{i=1}^{R} p_i^2$ equals the probability that two entities taken at random from the dataset (with replacement) represent the same type, where $p_i$ is the proportion of entities belonging to type $i$ and $R$ is the total number of types in the dataset.
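As a small illustration of this definition, the sketch below computes the Simpson index from a list of per-type counts. The function name `simpson_index` and the example counts are made up for illustration; they do not come from the dataset used later in this post.

```python
def simpson_index(counts):
    """Sum of squared type proportions for a list of per-type counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)


# Hypothetical counts for three types: proportions 0.5, 0.25, 0.25
example_counts = [2, 1, 1]
simpson = simpson_index(example_counts)
print(simpson)  # 0.375
```

With a single type the function returns 1.0, since any two draws are then certain to match.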
Gini–Simpson index
The transformation $1 - \lambda = 1 - \sum_{i=1}^{R} p_i^2$, one minus the sum of squared type proportions, equals the probability that the two entities represent different types.
The more even the distribution, the higher the index; the more concentrated the distribution, the lower the index.
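The sketch below checks this relationship between evenness and the index on two made-up count vectors, one perfectly even and one dominated by a single type. `gini_simpson` is a hypothetical helper written only for this example.

```python
def gini_simpson(counts):
    """One minus the sum of squared type proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


even = gini_simpson([25, 25, 25, 25])       # four equally common types
concentrated = gini_simpson([97, 1, 1, 1])  # one type dominates

print(even)          # 0.75, i.e. 1 - 1/4 for four equally common types
print(concentrated)  # much closer to 0
```

For R equally common types the index takes its largest possible value, 1 - 1/R, so the upper bound depends on how many types are present.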
Code
import pandas as pd


def gini_calc(df2):
    """Return the Gini-Simpson index, 1 - sum(p_i**2), for the counts in df2['cnt']."""
    # Proportion of each type within the group
    cnt_prop = df2['cnt'] / df2['cnt'].sum()
    # One minus the sum of squared proportions
    return 1 - (cnt_prop ** 2).sum()
################################
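# Load the raw counts; the code below uses the columns population, subpopulation, type and cnt
df = pd.read_excel('gini.xlsx')
# Total the counts for each (population, subpopulation, type) combination
df = df.groupby(['population', 'subpopulation', 'type'], as_index=False)['cnt'].sum()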
################################
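populations = []
subpopulations = []
indices = []
# Compute the index for every (population, subpopulation) group
for name, group in df.groupby(['population', 'subpopulation']):
    populations.append(name[0])
    subpopulations.append(name[1])
    indices.append(gini_calc(group))
res = {"population": populations, "subpopulation": subpopulations, "gini_simpson_index": indices}
data = pd.DataFrame(res)
# to_csv returns None when given a path, so there is nothing to assign
data.to_csv('gini_result.csv')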
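
The same per-group computation can also be sketched without an explicit loop. The version below is one possible alternative, not the script above: it builds a small in-memory table with made-up values instead of reading gini.xlsx, and the helper name `gini_simpson_by_group` is hypothetical. It assumes the same column names (population, subpopulation, type, cnt) and one row per type within each group, as is the case after the aggregation step.

```python
import pandas as pd


def gini_simpson_by_group(df, group_cols, count_col="cnt"):
    """Gini-Simpson index of count_col within each group of group_cols."""
    # Share of each row's count within its group
    prop = df[count_col] / df.groupby(group_cols)[count_col].transform("sum")
    # Sum the squared shares per group, then take the complement
    sum_sq = (prop ** 2).groupby([df[c] for c in group_cols]).sum()
    return (1 - sum_sq).rename("gini_simpson_index").reset_index()


# Made-up data for illustration only
toy = pd.DataFrame({
    "population": ["A", "A", "A", "A"],
    "subpopulation": ["x", "x", "y", "y"],
    "type": ["t1", "t2", "t1", "t2"],
    "cnt": [5, 5, 10, 0],
})
result = gini_simpson_by_group(toy, ["population", "subpopulation"])
print(result)
```

In this toy table, subpopulation x is split evenly between two types and gets an index of 0.5, while subpopulation y contains only one type and gets 0.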