Common NumPy utility functions: np.bincount / np.average

1. np.bincount(): counting occurrences


numpy.bincount(x, weights=None, minlength=0)
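It is especially useful for computing the distribution of a dataset's label column (y_train), i.e. the class distribution. The input must contain non-negative integers.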

>>> np.bincount(y_train.astype(np.int32))
>>> np.bincount(np.array([0, 1, 1, 3, 2, 1, 7]))
array([1, 3, 1, 1, 0, 0, 0, 1], dtype=int32)
# counts of the values 0 through 7, in order
# (the integer dtype shown depends on the platform)
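As a minimal sketch of the class-distribution use case, the snippet below counts each label and then divides by the total to get relative frequencies. The `y_train` array here is made-up illustrative data, not a real dataset.

```python
import numpy as np

# Hypothetical label column; any 1-D array of non-negative integers works.
y_train = np.array([0, 1, 1, 3, 2, 1, 7])

counts = np.bincount(y_train)          # occurrences of each label 0..max(y_train)
distribution = counts / counts.sum()   # relative frequency of each label

print(counts)
print(distribution)
```

Dividing by `counts.sum()` turns the raw counts into proportions, which is usually easier to read when checking for class imbalance.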
If weights is specified the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weight[i] instead of out[n] += 1.

>>> w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6])  # weights
>>> x = np.array([0, 1, 1, 3, 2, 2])
>>> np.bincount(x, w)
array([ 0.3,  0.7,  0.4,  0.7])
# 0: 0.3
# 1: 0.5 + 0.2
# 2: 1 + (-0.6)
# 3: 0.7
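To make the `out[n] += weight[i]` rule concrete, here is a small sketch that accumulates the weights with an explicit loop and compares the result with `np.bincount`. The loop is only an illustration of the semantics, not how NumPy implements it.

```python
import numpy as np

w = np.array([0.3, 0.5, 0.2, 0.7, 1.0, -0.6])  # weights
x = np.array([0, 1, 1, 3, 2, 2])

# Explicit accumulation: add w[i] to the bin selected by x[i].
out = np.zeros(x.max() + 1)
for i, n in enumerate(x):
    out[n] += w[i]

# The loop and the vectorized call should agree up to rounding.
same = np.allclose(out, np.bincount(x, weights=w))
print(same)
```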
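np.bincount() always starts counting from zero, so values below the minimum of the input still get a bin (with count 0):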

>>> np.bincount([3, 4, 4, 3, 3, 5])
array([0, 0, 0, 3, 2, 1], dtype=int32)
# number of times 0 occurs,
# number of times 1 occurs,
# number of times 2 occurs,
# ...
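Because the bins start at zero, the output has length `max(x) + 1` by default. The sketch below shows two common follow-ups: padding the result to a fixed length with `minlength`, and keeping only the values that actually occur. The variable names are illustrative.

```python
import numpy as np

x = np.array([3, 4, 4, 3, 3, 5])

default_counts = np.bincount(x)               # length max(x) + 1
padded_counts = np.bincount(x, minlength=10)  # zero-padded to at least 10 bins

# Keep only the values that actually occur, paired with their counts.
present = np.nonzero(default_counts)[0]
pairs = {int(v): int(default_counts[v]) for v in present}
print(pairs)
```

A fixed `minlength` is handy when comparing counts across several arrays that may not all contain the largest label.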
2. np.average()

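np.average(X, axis=0, weights=w) computes the weighted average of the rows of X. When the weights sum to 1, the result equals the matrix product w.dot(X):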


X = np.array([[.9, .1],
              [.8, .2],
              [.4, .6]])
w = np.array([.2, .2, .6])
print(np.average(X, axis=0, weights=w))
# approximately [0.58 0.42]
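The weighted average along axis 0 is the weighted sum of the rows divided by the sum of the weights. The sketch below checks this against a dot product, and also shows that `np.average` normalizes the weights itself, so they need not sum to 1. The names `via_dot` and `w_unnormalized` are introduced here for illustration.

```python
import numpy as np

X = np.array([[.9, .1],
              [.8, .2],
              [.4, .6]])
w = np.array([.2, .2, .6])

avg = np.average(X, axis=0, weights=w)
via_dot = w.dot(X) / w.sum()   # weighted sum of rows, divided by total weight

# Same proportions as w, but summing to 5 instead of 1.
w_unnormalized = np.array([1.0, 1.0, 3.0])
avg_unnormalized = np.average(X, axis=0, weights=w_unnormalized)

print(avg, via_dot, avg_unnormalized)
```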
P = np.asarray([c.predict_proba(X) for c in clfs])
# P is now a 3-D array of shape
# (# of clfs) * (# of samples) * (# of classes)
np.average(P, axis=0, weights=w)
# the result has shape (# of samples) * (# of classes),
# and each row still sums to 1
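A small sketch of the 3-D case, using made-up probability arrays in place of real `predict_proba` outputs (3 classifiers, 2 samples, 2 classes). It checks the shape of the result and that each averaged row is still a valid probability distribution.

```python
import numpy as np

# Made-up predicted probabilities: shape (3 classifiers, 2 samples, 2 classes).
P = np.array([
    [[0.9, 0.1], [0.3, 0.7]],
    [[0.8, 0.2], [0.4, 0.6]],
    [[0.4, 0.6], [0.5, 0.5]],
])
w = np.array([0.2, 0.2, 0.6])   # one weight per classifier

avg = np.average(P, axis=0, weights=w)   # average over the classifier axis

print(avg.shape)         # (samples, classes)
print(avg.sum(axis=1))   # each row should still sum to 1
```

The rows keep summing to 1 because a weighted average of probability vectors is itself a probability vector.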
In some cases only np.average can be used, and dot (matrix multiplication) cannot:

def predict_proba(self, X):
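    probas = np.asarray([clf.predict_proba(X) for clf in self.classifiers_])
    # A dot-based return is not safe here:
    # self.weights may never have been assigned (it can be None),
    # and None certainly does not support dot.
    return np.average(probas, axis=0, weights=self.weights)
    # np.average handles this case: when weights is None,
    # it simply computes the ordinary (unweighted) mean.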
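To illustrate the `weights=None` fallback end to end, here is a minimal, self-contained sketch. Both classes are hypothetical stand-ins written for this example: `FixedProbaClassifier` returns the same probabilities for every sample, and `AveragingEnsemble` wraps the `predict_proba` method shown above.

```python
import numpy as np


class FixedProbaClassifier:
    """Hypothetical stand-in: predicts the same probabilities for every sample."""

    def __init__(self, proba):
        self.proba = np.asarray(proba, dtype=float)

    def predict_proba(self, X):
        # Repeat the fixed probability row once per sample.
        return np.tile(self.proba, (len(X), 1))


class AveragingEnsemble:
    """Hypothetical ensemble that averages its members' probabilities."""

    def __init__(self, classifiers, weights=None):
        self.classifiers_ = classifiers
        self.weights = weights   # may legitimately stay None

    def predict_proba(self, X):
        probas = np.asarray([clf.predict_proba(X) for clf in self.classifiers_])
        # With weights=None this is a plain mean over the classifier axis.
        return np.average(probas, axis=0, weights=self.weights)


X = np.zeros((2, 1))   # two dummy samples; the features are ignored here
clfs = [FixedProbaClassifier([0.9, 0.1]),
        FixedProbaClassifier([0.8, 0.2]),
        FixedProbaClassifier([0.4, 0.6])]

unweighted = AveragingEnsemble(clfs).predict_proba(X)
weighted = AveragingEnsemble(clfs, weights=[0.2, 0.2, 0.6]).predict_proba(X)
print(unweighted)
print(weighted)
```

Passing `self.weights` straight through means the method needs no separate branch for the unweighted case.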

