python - Why can't my XPath select the text?

Reposted. Author: 行者123. Updated: 2023-12-04 08:07:37

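How do you access a text in an XPath if it doesn't have a node?
The text is in quotation marks and sits on its own line inside another node.
I'm having trouble selecting the right element in XPath: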

 <span>
<a href="www.imagine_a_link_here.org">
"
This is the text I need to access
"
</a>
</span>
I would normally do this by writing:
import requests
from lxml import html,etree
from lxml.html import document_fromstring

page = requests.get('https://www.the_link_im_trying_to_webscrape.org')
tree = html.fromstring(page.content)
the_text_i_need_to_access_xpath = '/span/a/text()'
the_text_i_need_to_access = tree.xpath(the_text_i_need_to_access_xpath)
Unfortunately, this only returns an empty list. Does anyone know how I have to modify the XPath to get the string I'm looking for?

Best answer

How do you access a text in an XPath if it doesn't have a node?


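Text in an XML or HTML document is always associated with a node, so that is not the problem here. The " delimiters are only there to show you the surrounding whitespace.
As presented, your XPath should select the text within the a element. Here are some reasons that may not be happening: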
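  • As @MadsHansen mentioned in the comments, the root element of the actual HTML may not be a span as shown. See:
    • Difference between "//" and "/" in XPath?

  • The text may not be loaded when your XPath is executed, either because the document has not finished loading or because JavaScript changes the DOM dynamically later (requests only fetches the served HTML and does not run JavaScript). See:
    • Selenium wait until document is ready
    • Selenium WebDriver: Wait for complex page with JavaScript to load

  • fromstring() can use a bit more magic than might be expected:

    fromstring(string): Returns document_fromstring or fragment_fromstring, based on whether the string looks like a full document, or just a fragment.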


Given this, here is an update to your code that will select the targeted text as expected:
    from lxml import html

    htmlstr = """
    <span>
    <a href="www.imagine_a_link_here.org">
    "
    This is the text I need to access
    "
    </a>
    </span>
    """

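    # fromstring() hands back the <span>, but the HTML parser still wraps it in an implicit <html><body>
    tree = html.fromstring(htmlstr)
    print(html.tostring(tree))
    # Use // so the match does not depend on <span> being the document root
    the_text_i_need_to_access_xpath = '//span/a/text()'
    the_text_i_need_to_access = tree.xpath(the_text_i_need_to_access_xpath)
    print(the_text_i_need_to_access)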
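To see why the absolute path comes back empty, it can help to inspect what lxml actually built. The following is a minimal sketch, assuming lxml.html.fromstring() parses a fragment inside an implicit html/body wrapper and returns the single child element. The one-line fragment and the variable names are illustrative and not from the original post.

```python
from lxml import html

# Illustrative fragment: same shape as the question's markup, on one line.
fragment = (
    '<span><a href="www.imagine_a_link_here.org">'
    'This is the text I need to access</a></span>'
)

tree = html.fromstring(fragment)

# The element handed back is the <span>, but the root of the underlying
# document is the <html> element that the HTML parser added around it.
returned_tag = tree.tag
root_tag = tree.getroottree().getroot().tag

# "/" starts at the document root, so /span looks for a root named span.
absolute = tree.xpath('/span/a/text()')
# "//" searches the whole document, so the <span> is found wherever it sits.
anywhere = tree.xpath('//span/a/text()')

print(returned_tag, root_tag)
print(absolute)
print(anywhere)
```

If the absolute and the descendant query disagree like this on a real page, the element being targeted is most likely not the document root.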
Alternatively, if you don't need or want HTML surprises, this will also select the text:
    import lxml.etree as ET

    xmlstr = """
    <span>
    <a href="www.imagine_a_link_here.org">
    "
    This is the text I need to access
    "
    </a>
    </span>
    """

    # Parsed as XML, <span> really is the document root, so the absolute path matches
    root = ET.fromstring(xmlstr)
    print(root.xpath('/span/a/text()'))
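The selected text node still carries the surrounding newlines. If only the trimmed string is wanted, one option is XPath's normalize-space() function; another is str.strip() on the Python side. This is a minimal sketch, assuming the same markup as above without the literal quote marks (which only illustrate the whitespace). The variable names are illustrative.

```python
import lxml.etree as ET

xmlstr = """
<span>
<a href="www.imagine_a_link_here.org">
This is the text I need to access
</a>
</span>
"""

root = ET.fromstring(xmlstr)

# text() returns the raw text node, including leading and trailing newlines.
raw = root.xpath('/span/a/text()')

# normalize-space() trims both ends and collapses internal whitespace runs;
# it returns a single string rather than a list of nodes.
normalized = root.xpath('normalize-space(/span/a)')

# Similar clean-up in Python; strip() trims the ends only.
stripped = [t.strip() for t in raw]

print(repr(raw))
print(normalized)
print(stripped)
```

normalize-space() is convenient when a single value is expected, while the list comprehension keeps one entry per matched text node.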
Credit: Thanks to @ThomasWeller for pointing out additional complications and helping to resolve them.

Regarding "python - Why can't my XPath select the text?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66145117/
