Splitting train and test datasets: selecting the rows for fixed ids from a DataFrame and saving them to a txt file

Other 2020-01-18 13:16:42

import os
import random

# Create the output directory
test_path = './test_path/'
if not os.path.exists(test_path):
    os.makedirs(test_path)
    print('test_path created')
else:
    print('test_path already exists')


new_train_list_path = './test_path/new_train_list.txt'
old_train_list_path = './train_list.txt'

# Strip the 'train/' prefix from each image path and write the result to new_train_list.txt

with open(old_train_list_path, 'r') as f_old_train, open(new_train_list_path, 'w') as f_new_train:
    for line in f_old_train:
        # Each line looks like 'train/<img> <pid>'
        fields = line.split(' ')
        img = fields[0].split('/')[1]
        pid = fields[1].rstrip()
        f_new_train.write(img + ' ' + pid + '\n')
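
The prefix-stripping step above can be tried on a few in-memory lines before touching real files. This is only a minimal sketch: the helper name strip_train_prefix and the sample lines are made up for illustration, and it assumes every line has the form 'train/<img> <pid>'.

```python
def strip_train_prefix(line):
    """Turn 'train/<img> <pid>' into '<img> <pid>' (hypothetical helper)."""
    path, pid = line.split()
    # Keep only the part after the first directory component, as the script above does
    img = path.split('/')[1]
    return img + ' ' + pid


# Made-up sample lines in the assumed 'train/<img> <pid>' format
sample_lines = ['train/0001.png 17\n', 'train/0002.png 42\n']
converted = [strip_train_prefix(line) for line in sample_lines]
print(converted)
```

A line without a '/' in its path would raise an IndexError here, just as it would in the loop above, so a malformed list file fails early rather than silently.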



## Create the test split (test_split_1200.txt)

import pandas as pd
import numpy as np

# Read new_train_list.txt

new_train_df = pd.read_table(new_train_list_path, sep=' ', header=None)
new_train_df.columns = ['img', 'pid']
# new_train_df.shape

# Output paths
finally_train_list = './test_path/finally_train_list.txt'
test_split_100 = './test_path/test_split_1200.txt'

# Remove output files left over from a previous run
if os.path.exists(finally_train_list):
    os.remove(finally_train_list)

if os.path.exists(test_split_100):
    os.remove(test_split_100)

import random

# Draw 1200 distinct pids at random (the variable names say 100, but 1200 are sampled)
all_random_pid = random.sample(list(new_train_df['pid'].unique()), 1200)

# Rows whose pid was drawn go to the test split; all other rows stay in the training list
is_test = new_train_df['pid'].isin(all_random_pid)
test_split_100_df = new_train_df[is_test]
new_train_df = new_train_df[~is_test]

# to_csv writes comma-separated text, including a header row and the index column
test_split_100_df.to_csv(test_split_100)
new_train_df.to_csv(finally_train_list)
test_split_100_df.head(10)
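
The split by pid can also be sketched on a toy DataFrame to check that no identity ends up on both sides. Everything below is illustrative: the image names, pids, seed, and output file name are made up, and writing with sep=' ', header=False, index=False is one possible choice for keeping the same 'img pid' layout as the input list, not necessarily what the original script intends.

```python
import os
import random
import tempfile

import pandas as pd

# Toy data: 6 images belonging to 4 identities (made-up values)
df = pd.DataFrame({
    'img': ['a.png', 'b.png', 'c.png', 'd.png', 'e.png', 'f.png'],
    'pid': [1, 1, 2, 3, 3, 4],
})

# Fixed seed so the same pids are drawn on every run
rng = random.Random(0)
test_pids = rng.sample(sorted(df['pid'].unique()), 2)

# All rows of a drawn pid go to the test split; the rest stay in training
is_test = df['pid'].isin(test_pids)
test_df = df[is_test]
train_df = df[~is_test]

# Write the test split as space-separated 'img pid' lines, without header or index
out_dir = tempfile.mkdtemp()
test_file = os.path.join(out_dir, 'test_split.txt')
test_df.to_csv(test_file, sep=' ', header=False, index=False)

# Read it back the same way the training list is read
reloaded = pd.read_csv(test_file, sep=' ', header=None, names=['img', 'pid'])
print(reloaded)
```

Sampling pids rather than rows keeps every image of one identity in the same split, which is the property the loop over pids in the script above relies on.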


Reposted from blog.csdn.net/c2250645962/article/details/103253996
