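/   Starting from the root element, every step must be a strict parent-child relationship.
//  Matches a node anywhere among the descendants of the current node, at any depth.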
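
The difference between the two separators can be seen in a minimal sketch. It assumes lxml is installed; the tiny document and the variable names `doc`, `direct`, and `anywhere` are made up for illustration and are not part of the original example.

```python
from lxml import etree

# Hypothetical document: one <book> directly under <div>, one nested inside <a>.
doc = etree.fromstring(
    "<div>"
    "<book><title>111111</title></book>"
    "<a><book><title>123</title></book></a>"
    "</div>"
)

# "/" walks one level at a time, so only the <book> that is a direct child
# of the root <div> is reached.
direct = doc.xpath("/div/book/title/text()")

# "//" matches <book> at any depth, including the one nested inside <a>.
anywhere = doc.xpath("//book/title/text()")

print(direct)    # ['111111']
print(anywhere)  # ['111111', '123']
```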

Example

import lxml.html

# Sample document. The element name "title" is assumed; the queries below rely on it.
test_data = """
        <div>
            <ul>
                 <li class="item-0"><a href=" 1.html" id="places_neighbours__row">9,596,960first item</a></li>
                 <li class="item-1"><a href=" 2.html">second item</a></li>
                 <li class="item-inactive"><a href=" 3.html">third item</a></li>
                 <li class="item-1"><a href=" 4.html" id="places_neighbours__row">fourth item</a></li>
                 <li class="item-0"><a href=" 5.html">fifth item</a></li>
                 <li class="good-0"><a href=" 5.html">fifth item</a></li>
             </ul>
             <book>
                    <title lang="aaengbb">111111</title>
                    <price id="places_neighbours__row">29.99</price>
            </book>
            <book>
                <title lang="zh">222222</title>
                <price>39.95</price>
            </book>
            <book>
                <title>33333</title>
                <price>40</price>
            </book>
         </div>
        <a>
            <book>
                <title>123</title>
            </book>

        </a>

        """

"""
XPath notes for the data above:

*  wildcard, selects all elements
//div/book[1]/title  selects the title element of the first book under div
//div/book/title[@lang="zh"]  selects title elements that have a lang attribute equal to "zh"
//div/book/title  //book/title  //title  //div//title  all end at title elements; the first form
    only matches books that are direct children of a div, while //book/title and //title also
    match the book nested inside <a>
//book/title/@*  selects all attribute values of title
//book/title/text()  selects the text content of title, using the built-in text() function
//a[@href=" 1.html" and @id="places_neighbours__row"]  both conditions must hold
//a[@href=" 1.html" or @id="places_neighbours__row"]  either condition is enough
//div/book[last()]/title/text()  selects the title text of the last book under div
//div/book[price > 39]/title  selects the title of every book whose price child is greater than 39
//li[starts-with(@class,'item')]  selects li elements whose class attribute starts with "item"
//title[contains(@lang,'eng')]  selects title elements whose lang attribute contains "eng"
"""


# Parse the HTML fragment into an element tree.
html = lxml.html.fromstring(test_data)

# Uncomment one query at a time to try it.
# html_data = html.xpath('//div/book/title/text()')
# html_data = html.xpath('//div/book[1]/title/text()')
# html_data = html.xpath('//div/book/title[@lang="zh"]/text()')
# html_data = html.xpath('//book/title/text()')
# html_data = html.xpath('//title/text()')
# html_data = html.xpath('//div//title/text()')
# html_data = html.xpath('//book/title/@*')

# html_data = html.xpath('//a[@href=" 1.html" and @id="places_neighbours__row"]/text()')
# html_data = html.xpath('//a[@href=" 2.html"]/text()')
# html_data = html.xpath('//div/ul/li/a[@id]/text()')
# html_data = html.xpath('//a[@href=" 1.html" and @id="places_neighbours__row"]/@*')
# html_data = html.xpath('//a[@href=" 1.html" and @id="places_neighbours__row"]/@href')
# html_data = html.xpath('//a[@href=" 1.html" or @id="places_neighbours__row"]/text()')
# html_data = html.xpath('//div/book[last()]/title/text()')
# html_data = html.xpath('//div/book[price > 39]/title/text()')
# html_data = html.xpath('//li[starts-with(@class,"item")]/a/text()')
html_data = html.xpath('//title[contains(@lang,"eng")]/text()')

# xpath() returns a list; print each match on its own line.
for i in html_data:
    print(i)
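
Rather than commenting queries in and out, several expressions can be run in one pass. The following is only a sketch under stated assumptions: it uses a trimmed-down, well-formed copy of the sample data (parsed with `lxml.etree` instead of `lxml.html`), and the names `SAMPLE`, `QUERIES`, and `results` as well as the shortened `href` and `id` values are hypothetical.

```python
from lxml import etree

# Hypothetical, shortened sample data; well-formed XML so etree parses it directly.
SAMPLE = """
<div>
  <ul>
    <li class="item-0"><a href="1.html" id="row">first item</a></li>
    <li class="item-1"><a href="2.html">second item</a></li>
    <li class="good-0"><a href="3.html">third item</a></li>
  </ul>
  <book><title lang="aaengbb">111111</title><price>29.99</price></book>
  <book><title lang="zh">222222</title><price>39.95</price></book>
  <book><title>33333</title><price>40</price></book>
</div>
""".strip()

root = etree.fromstring(SAMPLE)

QUERIES = [
    '//book[1]/title/text()',                     # positional predicate
    '//book[last()]/title/text()',                # last() function
    '//book/title[@lang="zh"]/text()',            # attribute equality
    '//book/title/@*',                            # all attribute values
    '//book[price > 39]/title/text()',            # numeric comparison on a child
    '//li[starts-with(@class,"item")]/a/text()',  # prefix match
    '//title[contains(@lang,"eng")]/text()',      # substring match
    '//a[@href="1.html" and @id="row"]/text()',   # both conditions
    '//a[@href="2.html" or @id="row"]/text()',    # either condition
]

# Evaluate every expression against the same tree and keep the results by query.
results = {query: root.xpath(query) for query in QUERIES}

for query, matches in results.items():
    print(f"{query:45} -> {matches}")
```

With this data, `//book[price > 39]/title/text()` yields `['222222', '33333']`, because XPath converts the text of each `price` element to a number before comparing.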

Web scraping: XPath matching

Views: 1452  2026-05-09
