Undersampling Method

Other 2018-11-30 19:26:18

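.loc[]: inside the brackets, rows come first and columns second, separated by a comma. Both are selected by label (row labels and column labels).

.iloc[]: like .loc, rows come first and columns second, separated by a comma. The difference is that .iloc selects by integer position (row number and column number) rather than by label.

.ix: accepted both styles. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use .loc or .iloc instead.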
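The difference between label-based and position-based selection only becomes visible when the index labels are not 0, 1, 2, and so on. The following is a minimal sketch on a made-up frame; the column names and index labels are assumptions chosen for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical toy frame; the index labels deliberately differ from the row positions.
df = pd.DataFrame(
    {"Amount": [10.0, 20.0, 30.0], "Class": [0, 1, 0]},
    index=[100, 200, 300],
)

by_label = df.loc[200, "Class"]   # row labelled 200, column labelled "Class"
by_position = df.iloc[1, 1]       # second row, second column (0-based positions)

# Slices differ too: .loc includes the end label, .iloc excludes the end position.
label_slice = df.loc[100:200, "Amount"]   # rows labelled 100 and 200
position_slice = df.iloc[0:1, 0]          # only the first row

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Both lookups reach the same cell here, but `df.iloc[200, 1]` would raise an IndexError because the frame has no row at position 200.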

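import numpy as np

# `data` is assumed to be a pandas DataFrame that has a binary 'Class' column (1 = fraud, 0 = normal)
X=data.loc[:,data.columns != 'Class'] # .loc selects by label: all rows, every column except 'Class' (the features)
y=data.loc[:,data.columns == 'Class'] # all rows, only the 'Class' column (the labels)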

#number of data points in the minority class

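number_records_fraud=len(data[data.Class==1]) # number of rows with Class == 1
fraud_indices=np.array(data[data.Class==1].index) # index labels of those rows

normal_indices=np.array(data[data.Class==0].index) # index labels of the rows with Class == 0

# Randomly pick as many normal rows as there are fraud rows.
# replace=False means sampling without replacement, so no label is drawn twice.
random_normal_indices=np.random.choice(normal_indices,number_records_fraud,replace=False)
random_normal_indices=np.array(random_normal_indices) # np.random.choice already returns an ndarray, so this is a no-op

under_sample_indices=np.concatenate([fraud_indices,random_normal_indices]) # all Class == 1 labels plus the sampled Class == 0 labels

# These values are index labels taken from .index, so select with .loc.
# .iloc gives the same rows only when the index is the default 0..n-1 range.
under_sample_data = data.loc[under_sample_indices,:]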

X_undersample=under_sample_data.loc[:,under_sample_data.columns!='Class']
y_undersample=under_sample_data.loc[:,under_sample_data.columns=='Class']
print(X_undersample)
print(y_undersample)

print("Percentage of normal transactions: ", len(under_sample_data[under_sample_data.Class == 0])/len(under_sample_data))
print("Percentage of fraud transactions: ", len(under_sample_data[under_sample_data.Class == 1])/len(under_sample_data))
print("Total number of transactions in resampled data: ", len(under_sample_data))
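The steps above can be wrapped into one reusable function. This is a sketch, not the original author's code: the function name `undersample`, its parameters, and the synthetic frame are assumptions made for illustration, and it assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

def undersample(frame, label_col="Class", minority=1, seed=0):
    """Return all minority rows plus an equal-sized random draw of the other rows."""
    rng = np.random.default_rng(seed)
    is_minority = (frame[label_col] == minority).to_numpy()
    minority_labels = frame.index[is_minority].to_numpy()
    majority_labels = frame.index[~is_minority].to_numpy()
    # Sample without replacement so no majority row is picked twice.
    sampled = rng.choice(majority_labels, size=len(minority_labels), replace=False)
    keep = np.concatenate([minority_labels, sampled])
    # The values in `keep` are index labels, so .loc is the right accessor.
    return frame.loc[keep]

# Synthetic stand-in for the real data: 95 normal rows and 5 fraud rows,
# with index labels that are deliberately not 0..n-1.
toy = pd.DataFrame(
    {"Amount": np.arange(100, dtype=float), "Class": [0] * 95 + [1] * 5},
    index=np.arange(1000, 1100),
)

balanced = undersample(toy)
print(balanced["Class"].value_counts().to_dict())
```

Passing a fixed `seed` makes the draw reproducible, which helps when comparing models trained on the same undersampled subset.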

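Idea: from the large (majority) class A, randomly draw as many samples as the small (minority) class B contains, giving a subset a (A -> a).

Then combine a and B and split the result into train and test sets.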
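The final split can be sketched as follows. A plain NumPy shuffle stands in for a library splitter such as scikit-learn's `train_test_split`; the helper name `split_train_test`, the 30% test fraction, and the small balanced frame are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def split_train_test(frame, test_fraction=0.3, seed=0):
    """Shuffle row positions and cut them into a train part and a test part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(frame))  # shuffled row positions
    n_test = int(round(len(frame) * test_fraction))
    # `order` holds positions, not labels, so .iloc is the right accessor here.
    return frame.iloc[order[n_test:]], frame.iloc[order[:n_test]]

# Hypothetical balanced data standing in for a + B: 5 normal rows and 5 fraud rows.
balanced = pd.DataFrame({
    "Amount": np.arange(10, dtype=float),
    "Class": [0] * 5 + [1] * 5,
})

train, test = split_train_test(balanced)
X_train, y_train = train.drop(columns="Class"), train["Class"]
X_test, y_test = test.drop(columns="Class"), test["Class"]
print(len(train), len(test))
```

With so few rows a purely random split can leave the test set with only one class; on real data a stratified split is a common way to keep the class ratio the same in both parts.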

Reposted from blog.csdn.net/qq_38858247/article/details/83928409
