[python3] Batch-remove selected nodes from the XML files of a VOC dataset to get a dataset with only certain classes (clear code, easy to use!)

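For example, take VOC 2007 train+val and keep only the person and car classes, removing all others:
  1. The code below removes the nodes of unwanted classes from the XML files:
    Note: use absolute paths everywhere when running the code.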
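import os
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# VOC 2007 train+val
path_root = "/*/VOCdevkit/VOC2007/Annotations/"
# Filtered copies are written here, so the original annotations stay untouched
path_out = "/*/VOCdevkit/VOC2007/Annotations1/"
os.makedirs(path_out, exist_ok=True)

CLASSES = ["person", "car"]  # classes to keep
xml_list = os.listdir(path_root)
count = 0

for axml in xml_list:
    path_xml = os.path.join(path_root, axml)
    tree = ET.parse(path_xml)
    root = tree.getroot()

    # findall() returns a list, so removing children inside the loop is safe
    for child in root.findall('object'):
        name = child.find('name').text
        if name not in CLASSES:
            root.remove(child)

    tree.write(os.path.join(path_out, axml))
    print(axml)
    count = count + 1

print(count)  # number of XML files processed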
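Before running the loop over the whole Annotations directory, the removal step can be tried on a single in-memory annotation. The following is only a minimal sketch: `SAMPLE_XML` is a made-up, heavily trimmed annotation, and `keep_classes` and `class_counts` are hypothetical helper names that do not appear in the original script.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed annotation used only for this demo.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <object><name>dog</name></object>
  <object><name>person</name></object>
  <object><name>car</name></object>
  <object><name>person</name></object>
</annotation>"""


def keep_classes(root, classes):
    """Remove every <object> whose <name> is not in `classes`.

    Returns the number of removed objects.
    """
    removed = 0
    # findall() returns a list, so removing from root while looping is safe.
    for obj in root.findall("object"):
        if obj.findtext("name") not in classes:
            root.remove(obj)
            removed += 1
    return removed


def class_counts(root):
    """Count the remaining <object> entries per class name."""
    return Counter(obj.findtext("name") for obj in root.findall("object"))


root = ET.fromstring(SAMPLE_XML)
removed = keep_classes(root, {"person", "car"})
counts = class_counts(root)
print(removed, dict(counts))  # prints: 1 {'person': 2, 'car': 1}
```

Running `class_counts` over the files in Annotations1 afterwards is a quick way to confirm that only the wanted classes remain.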
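  2. The code below deletes the XML files that, after the other nodes were removed, contain neither a person nor a car object:
  • If you use the TensorFlow Object Detection API to convert the VOC dataset to TFRecord, remember to update /*/trainval/VOCdevkit/VOC2007/ImageSets/Main/aeroplane_trainval.txt so that it lists the images corresponding to the remaining XML files.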
# 1. Delete the XML files that contain no person or car object
# 2. Print the remaining image IDs, used to update
#    /*/trainval/VOCdevkit/VOC2007/ImageSets/Main/aeroplane_trainval.txt
#    for the conversion to TFRecord
import os
import xml.etree.ElementTree as ET

path_root = "/*/VOCdevkit/VOC2007/Annotations1/"

xml_list = os.listdir(path_root)
count = 0

for axml in xml_list:
    path_xml = os.path.join(path_root, axml)
    tree = ET.parse(path_xml)
    root = tree.getroot()
    size = len(root.findall('object'))
    if size == 0:
        # No object left: delete the annotation file
        os.remove(path_xml)
        print(axml + " deleted")
    else:
        count = count + 1
        print(os.path.splitext(axml)[0])  # image ID, e.g. 000005

print(count)  # number of XML files kept
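Instead of copying the printed IDs by hand, the image list can be written to a file directly. This is a rough sketch, not part of the original post: `write_image_ids` is a hypothetical helper, the demo runs on a temporary directory with two made-up annotations, and it writes one ID per line. The original per-class VOC list files carry an extra flag column after each ID; whether your conversion script needs that column is an assumption to check before replacing aeroplane_trainval.txt.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_image_ids(ann_dir, list_path):
    """Write the ID of every annotation that still has an <object>.

    One ID per line; returns the list of IDs that were written.
    """
    ids = []
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(ann_dir, fname)).getroot()
        if root.find("object") is not None:
            ids.append(os.path.splitext(fname)[0])
    with open(list_path, "w") as f:
        f.write("".join(image_id + "\n" for image_id in ids))
    return ids


# Demo on a throwaway directory with two hypothetical annotations.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "000001.xml"), "w") as f:
        f.write("<annotation><object><name>car</name></object></annotation>")
    with open(os.path.join(tmp, "000002.xml"), "w") as f:
        f.write("<annotation></annotation>")
    list_path = os.path.join(tmp, "trainval.txt")
    kept_ids = write_image_ids(tmp, list_path)
    with open(list_path) as f:
        written = f.read()

print(kept_ids)
```

For real use, `ann_dir` would be the Annotations1 directory and `list_path` the ImageSets/Main file to update, both given as absolute paths.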

Reposted from blog.csdn.net/qq_43348528/article/details/107336676