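What is XML
XML stands for eXtensible Markup Language.
XML is designed for transporting and storing data.
XML is a set of rules for defining semantic tags; the tags split a document into parts and label each part.
It is also a meta-markup language: a syntax for defining other domain-specific, semantic, structured markup languages.
XML syntax rules to keep in mind: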
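1. Tags come in pairs (an empty element may use the self-closing form <tag/>).
2. Names are case-sensitive.
3. Tags must be properly nested.
4. The document normally starts with the XML declaration; if it is present, it must be the very first thing in the file:
<?xml version="1.0" encoding="utf-8"?>
# XML can be written in any editor. If the text comes out garbled, the declared encoding does not match the encoding the file was saved in: re-save the file as UTF-8, or change utf-8 to the encoding actually used, for example gbk.
5. There can be only one root node.
6. Nodes can have attributes.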
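The rules above can be checked mechanically, because a parser rejects any document that breaks them. A minimal sketch using the standard library; the helper name is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; only the first one follows all of them.
samples = {
    "ok": '<students><student id="1"><name>Tom</name></student></students>',
    "unclosed tag": "<students><student></students>",
    "case mismatch": "<students><Name>Tom</name></students>",
    "bad nesting": "<students><a><b></a></b></students>",
    "two roots": "<students/><students/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the "ok" sample is accepted; the other four each break one rule from the list.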
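DTD (Document Type Definition): constrains the nodes of an XML file. Put simply, it specifies how the XML file has to be written.
<!DOCTYPE students [ <!-- students is the root node -->
<!ELEMENT students (student+)> <!-- + means one or more -->
<!ELEMENT student (name+,age,sex)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
<!-- ATTLIST declares attributes; CDATA means plain text -->
<!-- the student node has an attribute named id whose value is plain text -->
]>
#REQUIRED (the attribute must be present), #IMPLIED (the attribute is optional), #FIXED (the attribute value is fixed)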
Example:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE students [
<!ELEMENT students (student+)>
<!ELEMENT student (name+,age,sex)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
]>
<students>
    <student id="1">
        <name>小明</name>
        <name>小李</name>
        <age>18</age>
        <sex>男</sex>
    </student>
    <student id="2">
        <name>小红</name>
        <name>小每</name>
        <age>16</age>
        <sex>女</sex>
    </student>
</students>
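One thing worth knowing when reading such a file from Python: the standard-library parsers are non-validating, so they read the DTD but do not enforce it. The sketch below parses a document that is well-formed yet violates the DTD, then checks the two constraints by hand. The helper check_student and the sample data are hypothetical, and the hand-written check covers only this one DTD.

```python
import xml.etree.ElementTree as ET

# Same DTD as above, but this <student> has no id, no age and no sex.
DOC = """<!DOCTYPE students [
<!ELEMENT students (student+)>
<!ELEMENT student (name+,age,sex)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
]>
<students>
  <student><name>Tom</name></student>
</students>"""

root = ET.fromstring(DOC)  # no error: the DTD is not enforced


def check_student(stu):
    """Hypothetical helper: list the DTD rules one <student> element breaks."""
    problems = []
    if "id" not in stu.attrib:
        problems.append("missing required attribute id")
    tags = [child.tag for child in stu]
    n_names = 0
    while n_names < len(tags) and tags[n_names] == "name":
        n_names += 1
    # Expect one or more <name>, then exactly <age>, <sex>.
    if n_names == 0 or tags[n_names:] != ["age", "sex"]:
        problems.append("children must be name+, age, sex")
    return problems


for stu in root.findall("student"):
    print(check_student(stu))
```

For real validation a validating parser is needed; that is outside what this sketch shows.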
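XML parsing
1. DOM (Document Object Model): loads the entire XML document into memory and organizes it as a tree of objects. This is a commonly used approach.
2. SAX: event-driven parsing. It does not load the whole document into memory; it keeps only the data that the event handlers you write choose to save.
3. ElementTree: works like a lightweight DOM with a convenient, friendly API. The code is easy to use, and it is fast and uses little memory.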
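As a compact side-by-side sketch of the three approaches, the snippet below reads the id attribute of every student from the same in-memory string with each API. The class name IdCollector and the tiny sample document are made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

DOC = '<students><student id="1"/><student id="2"/></students>'

# DOM: the whole tree is built first, then queried.
dom_ids = [s.getAttribute("id")
           for s in parseString(DOC).getElementsByTagName("student")]


# SAX: no tree; the handler sees one event per tag and keeps what it wants.
class IdCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "student":
            self.ids.append(attrs["id"])


collector = IdCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
sax_ids = collector.ids

# ElementTree: also a tree, with a smaller, more Pythonic API.
et_ids = [s.get("id") for s in ET.fromstring(DOC).findall("student")]

print(dom_ids, sax_ids, et_ids)
```

All three lists hold the same ids; what differs is whether a tree is built and how much code the caller writes.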
Examples
Parsing with DOM
from xml.dom.minidom import parse


class Student:
    def __init__(self, name=None, age=None, sex=None, id=None):
        self.name = name
        self.age = age
        self.sex = sex
        self.id = id

    def __repr__(self):
        # A Chinese character takes 3 bytes in UTF-8; short names get an
        # extra tab so that the columns line up.
        sep = '\t\t' if len(self.name.encode("utf-8")) <= 8 else '\t'
        return self.id + '\t' + self.name + sep + self.age + '\t' + self.sex


class DomParse:
    def __init__(self, path):
        # Load the whole document into memory as a DOM tree.
        self.dom = parse(path)
        self.students = []

    def text(self):
        for node in self.dom.getElementsByTagName("student"):
            stu = Student()
            # <name> may repeat (name+ in the DTD); only the first one is read.
            stu.name = node.getElementsByTagName("name")[0].childNodes[0].data
            stu.age = node.getElementsByTagName("age")[0].childNodes[0].data
            stu.sex = node.getElementsByTagName("sex")[0].childNodes[0].data
            stu.id = node.getAttribute("id")
            self.students.append(stu)
        for stu in self.students:
            print(stu)


a = DomParse("lianxi.xml")
a.text()
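The DOM code above reads only the first <name> of each student and assumes every element has a text child. A small sketch of how all repeated names could be collected, and how an empty element could be tolerated; the helper element_text and the sample data are assumptions for illustration:

```python
from xml.dom.minidom import parseString

DOC = """<students>
  <student id="1"><name>Tom</name><name>Tommy</name><age>18</age><sex>M</sex></student>
</students>"""


def element_text(node):
    """Join the text nodes directly under `node`; '' if there are none."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


dom = parseString(DOC)
all_names = {}
for stu in dom.getElementsByTagName("student"):
    # getElementsByTagName returns every matching descendant, not just the first.
    names = [element_text(n) for n in stu.getElementsByTagName("name")]
    all_names[stu.getAttribute("id")] = names
    print(stu.getAttribute("id"), names)
```

Indexing childNodes[0] raises IndexError on an empty element such as <name></name>, while element_text returns an empty string instead.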
Parsing with SAX
from xml.sax import parse
from xml.sax.handler import ContentHandler


class Student:
    def __init__(self, id=None, name=None, age=None, sex=None):
        self.id = id
        self.name = name
        self.age = age
        self.sex = sex

    def __repr__(self):
        # Short names get an extra tab so that the columns line up.
        sep = '\t\t' if len(self.name.encode("utf-8")) <= 8 else '\t'
        return self.id + '\t' + self.name + sep + self.age + '\t' + self.sex


stuList = []


class SaxParser(ContentHandler):
    def __init__(self):
        super().__init__()
        self.name = None  # tag currently being read
        self.stu = None   # student currently being built
        self.buf = []     # text pieces of the current tag

    # def startDocument(self):
    #     print("startDocument.....")

    # def endDocument(self):
    #     print("endDocument.....")

    def startElement(self, name, attrs):
        if name == "student":
            self.stu = Student()
            self.stu.id = attrs["id"]
        self.name = name
        self.buf = []

    def characters(self, content):
        # May be called more than once for one text node, so collect the pieces.
        if self.name in ("name", "age", "sex"):
            self.buf.append(content)

    def endElement(self, name):
        if name in ("name", "age", "sex"):
            # <name> may repeat (name+ in the DTD); the last one wins.
            setattr(self.stu, name, "".join(self.buf))
        elif name == "student":
            stuList.append(self.stu)
        self.name = None


parse("lianxi.xml", SaxParser())
for s in stuList:
    print(s)
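A SAX parser is allowed to deliver one run of text through several characters() calls, so a handler is safer when it buffers the pieces and joins them at the end tag. The sketch below feeds a document in two chunks that split a name in the middle. NameCollector and the sample data are made up for illustration, and how many characters() calls actually happen depends on the underlying parser.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._buf = None  # list of text pieces while inside <name>, else None

    def startElement(self, name, attrs):
        if name == "name":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buf))
            self._buf = None


handler = NameCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
# Feed the document incrementally; the split falls inside the text "Tom".
for chunk in ("<students><student id='1'><name>To",
              "m</name></student></students>"):
    parser.feed(chunk)
parser.close()
print(handler.names)
```

Because the pieces are joined in endElement, the result does not depend on where the input happened to be split.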
Parsing with ElementTree
# Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically.
import xml.etree.ElementTree as ET


class Student:
    def __init__(self, id=None, name=None, age=None, sex=None):
        self.id = id
        self.name = name
        self.age = age
        self.sex = sex

    def __repr__(self):
        # Short names get an extra tab so that the columns line up.
        sep = '\t\t' if len(self.name.encode("utf-8")) <= 8 else '\t'
        return self.id + '\t' + self.name + sep + self.age + '\t' + self.sex


stuList = []


def text1():
    tree = ET.parse("lianxi.xml")
    for stu in tree.findall("student"):
        student = Student()
        student.id = stu.attrib['id']
        # Look children up by tag, not by position: <name> may repeat
        # (name+ in the DTD), so the second child is not always <age>.
        student.name = stu.findtext("name")
        student.age = stu.findtext("age")
        student.sex = stu.findtext("sex")
        stuList.append(student)


text1()
for i in stuList:
    print(i)
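ElementTree also supports a small subset of XPath in find() and findall(), which is often enough to pick out a single node without looping. A minimal sketch on an in-memory document; the sample data is made up for illustration:

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student id="1"><name>Tom</name><age>18</age><sex>M</sex></student>
  <student id="2"><name>Ann</name><age>16</age><sex>F</sex></student>
</students>"""

root = ET.fromstring(DOC)

# Attribute predicate: the student whose id is "2".
ann = root.find("student[@id='2']")
print(ann.findtext("name"), ann.findtext("age"))

# iter() walks the whole subtree in document order.
all_names = [e.text for e in root.iter("name")]
print(all_names)
```

find() returns None when nothing matches, so the result should be checked before use if the id may be absent.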