[python] xml.etree.ElementTree: a summary of usage notes

Posted by the editor, 2026-06-17

Note: all examples in this post come from the section "20.5. xml.etree.ElementTree — The ElementTree XML API" of the documentation bundled with Python.

Thought 1: What is the difference between the ElementTree class and the Element class?

The first paragraph of section 20.5.1.1 explains both classes. I did not pay much attention to them at first, but once I started using the module and noticed that there were two classes, a question came to mind: how do the ElementTree class and the Element class differ?

"ET has two classes for this purpose - ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level."

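I admire the author of the documentation for putting the answer in the very first paragraph. My guess is that they ran into the same question when they first used xml.etree.ElementTree. It is a very basic question, and it is the first thing we need to understand.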
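To make the distinction concrete, here is a minimal sketch. The tiny tree is built by hand, and the tag and attribute values are illustrative only; they are not taken from the documentation's example.

```python
import xml.etree.ElementTree as ET

# An Element is a single node; here it will serve as the root.
root = ET.Element("data")
ET.SubElement(root, "country", name="Panama")

# An ElementTree wraps the whole document around a root Element.
tree = ET.ElementTree(root)

# The document-level object hands back the very same root node.
same_root = tree.getroot()
print(type(tree).__name__)       # ElementTree
print(type(same_root).__name__)  # Element
print(same_root is root)         # True

# Node-level work (attributes, children) happens on Element objects.
names = [country.get("name") for country in root]
print(names)                     # ['Panama']
```

Reading from or writing to a file is done on the tree object (for example with its write() method), while looking at tags, attributes, and children is done on the Element objects.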

Thought 2: A seemingly simple but error-prone pitfall when using xml.etree.ElementTree.fromstring()

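When a string holds XML, you can pass it straight to fromstring(text). One thing to watch out for: if the XML begins with a declaration (<?xml ...?>), the XML must follow the opening triple quote (''') immediately, with no line break in between, because the declaration has to be the very first thing in the string.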

Wrong usage (a line break after the triple quote)

# This is the WRONG usage!!!
country_data_as_string = '''
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
'''
import xml.etree.ElementTree as ET
root = ET.fromstring(country_data_as_string)

This produces the following error, and the message is quite clear: "XML or text declaration not at start of entity":

Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    root = ET.fromstring(country_data_as_string)
  File "C:\Users\cherish\AppData\Local\Programs\Python\Python36-32\lib\xml\etree\ElementTree.py", line 1314, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: XML or text declaration not at start of entity: line 2, column 0

Correct usage (the XML follows the triple quote immediately)

# This is the CORRECT usage!!!
country_data_as_string = '''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
'''
import xml.etree.ElementTree as ET
root = ET.fromstring(country_data_as_string)
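If the leading line break is hard to avoid, for instance because the string comes from somewhere else, one possible workaround is to strip the leading whitespace before parsing. The following is only a sketch of that idea: the shortened XML and the variable names are my own, not part of the documentation's example.

```python
import xml.etree.ElementTree as ET

# A string that starts with a line break before the XML declaration.
xml_text = '''
<?xml version="1.0"?>
<data>
    <country name="Panama"><rank>68</rank></country>
</data>
'''

try:
    ET.fromstring(xml_text)
    parsed_directly = True
except ET.ParseError:
    # The declaration is not the first thing in the string.
    parsed_directly = False

# Removing leading whitespace puts the declaration back at position 0.
root = ET.fromstring(xml_text.lstrip())

print(parsed_directly)  # False
print(root.tag)         # data
```

Another option is to end the opening line with a backslash ('''\), which tells Python not to include that first line break in the string.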

Thought 3: On Element objects, iter(tag=None) walks all children, grandchildren, and so on below the current node

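Among the element-lookup methods of Element objects, iter() is the one that starts at the current node and walks through everything below it: children, grandchildren, great-grandchildren, and so on. When given a plain tag name, find(), findall(), and even iterfind() only look at direct children (subelements); they reach deeper levels only when given a path expression such as ".//neighbor". The most important sentence in the documentation's description of iter() is this one: "The iterator iterates over this element and all elements below it, in document (depth first) order."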

In addition, section 20.5.1.4, "Finding interesting elements", also explains the difference between iter() and findall().

"Element has some useful methods that help iterate recursively over all the sub-tree below it (its children, their children, and so on). For example, Element.iter()."

"Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag."
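A small comparison shows the difference in practice. This sketch uses a shortened version of the country data; the variable names are my own.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('''<data>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>''')

# findall() with a plain tag only checks direct children of root,
# and <neighbor> elements are grandchildren, so nothing matches.
direct = root.findall("neighbor")

# iter() walks the whole subtree in document order.
recursive = [n.get("name") for n in root.iter("neighbor")]

# findall() can also search all levels when given a ".//" path.
via_path = [n.get("name") for n in root.findall(".//neighbor")]

# <country> elements are direct children, so a plain tag finds them.
countries = [c.get("name") for c in root.findall("country")]

print(direct)     # []
print(recursive)  # ['Austria', 'Switzerland', 'Malaysia']
print(via_path)   # ['Austria', 'Switzerland', 'Malaysia']
print(countries)  # ['Liechtenstein', 'Singapore']
```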

Thought 4: How to start processing data quickly once you have an XML document

There are two main steps:

1. Read the data from the XML document with parse(), which returns an ElementTree object.

2. Get the root node with getroot(), which returns an Element object.

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')  # step 1
root = tree.getroot()                # step 2

Suppose country_data.xml contains the following. After step 2, root refers to the data node, and you can then process the data however you need.

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
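As a sketch of what the processing after those two steps might look like, the example below writes a shortened version of the file to a temporary directory so that it runs on its own, then collects each country's rank and GDP per capita. The temporary file handling and the summary dictionary are my own additions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = '''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <gdppc>59900</gdppc>
    </country>
</data>
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a stand-in for country_data.xml.
    path = os.path.join(tmp_dir, "country_data.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)

    tree = ET.parse(path)   # step 1: returns an ElementTree
    root = tree.getroot()   # step 2: returns the root Element

# From here on, work with Element methods on root.
summary = {}
for country in root.findall("country"):
    rank = int(country.find("rank").text)
    gdppc = int(country.find("gdppc").text)
    summary[country.get("name")] = (rank, gdppc)

print(root.tag)  # data
print(summary)   # {'Liechtenstein': (1, 141100), 'Singapore': (4, 59900)}
```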
