python - Getting text with an XPath expression in Python 3.6

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 21:36:50

<div class="card-block cms">
<p>and then have a tea or coffee on the balcony of the cafeteria.</p>
<p>&nbsp;</p>
</div>

I am trying to check whether the text I scraped from the website contains a non-breaking space (&nbsp;):

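# `driver` is assumed to be an already-initialized Selenium WebDriver with the page loaded.
# find_element_by_xpath is the Selenium 3 API; Selenium 4 uses driver.find_element(By.XPATH, ...).
texts = driver.find_element_by_xpath("//div[@class='card-block cms']")
textInDivTag = texts.text
print(textInDivTag)
# Check for a non-breaking space (U+00A0) in the extracted text.
if u"\xa0" in textInDivTag:
    print("yes")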

My output is as follows:

and then have a tea or coffee on the balcony of the cafeteria.

As you can see, "yes" is never printed: the check does not detect the non-breaking space.

Best answer

The character is recognized, but it is being converted to a regular space (u"\x20").
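To illustrate why the check in the question never prints "yes", here is a minimal sketch that needs no browser. It parses the HTML from the question with the standard library, then applies a hypothetical `simulate_rendered_text` helper that stands in for the conversion described above. That helper is an assumption made for illustration and is not Selenium's actual code.

```python
from html.parser import HTMLParser

# The markup from the question.
HTML = """<div class="card-block cms">
<p>and then have a tea or coffee on the balcony of the cafeteria.</p>
<p>&nbsp;</p>
</div>"""


class TextCollector(HTMLParser):
    """Collects all text nodes; &nbsp; is decoded to U+00A0."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def raw_text(markup):
    """Return the concatenated text nodes of the markup, unnormalized."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def simulate_rendered_text(text):
    """Hypothetical stand-in for the conversion: U+00A0 becomes U+0020."""
    return text.replace("\xa0", "\x20")


raw = raw_text(HTML)
rendered = simulate_rendered_text(raw)

print("\xa0" in raw)       # True: the raw text still holds the non-breaking space
print("\xa0" in rendered)  # False: after conversion only a regular space is left
```

The second check mirrors the `if u"\xa0" in textInDivTag` test from the question, which fails for the same reason.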

According to a comment in the Java Selenium source code, .text / .getText() returns the visible text, and the comment refers to the W3C WebDriver specification, section "11.3.5 Get Element Text" (emphasis mine):

The Get Element Text command intends to return an element’s text “as rendered”. An element’s rendered text is also used for locating a elements by their link text and partial link text.

One of the major inputs to this specification was the open source Selenium project. This was in wide-spread use before this specification written, and so had set user expectations of how the Get Element Text command should work. As such, the approach presented here is known to be flawed, but provides the best compatibility with existing users.

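So this behavior may well conform to the specification, but I have not yet been able to find the source code that specifically replaces non-breaking spaces with regular spaces. I also could not find an issue about it in the Selenium repository, but perhaps you could open one and see what comes of it.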
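If the goal is only to detect the non-breaking space, one possible workaround (not part of the original answer) is to read the element's raw DOM text through `get_attribute("textContent")` instead of `.text`, since `textContent` is not subject to the "as rendered" processing. The sketch below assumes that approach. `contains_nbsp` is a hypothetical helper name, and `FakeElement` is a stand-in used only so the example runs without a browser.

```python
def contains_nbsp(element):
    """Return True if the element's raw DOM text contains U+00A0.

    `element` is expected to behave like a Selenium WebElement, i.e. to
    provide get_attribute(name).
    """
    raw = element.get_attribute("textContent") or ""
    return "\xa0" in raw


class FakeElement:
    """Hypothetical stand-in for a WebElement, for demonstration only."""

    def __init__(self, text_content, rendered_text):
        self._text_content = text_content
        self.text = rendered_text  # mimics the normalized .text value

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


sentence = "and then have a tea or coffee on the balcony of the cafeteria."
element = FakeElement(
    text_content="\n" + sentence + "\n\xa0\n",
    rendered_text=sentence + "\n ",
)

print("\xa0" in element.text)   # False: same result as in the question
print(contains_nbsp(element))   # True: the raw text keeps the character
```

With a real driver, the same helper would be called on the element returned by the XPath lookup from the question.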

This post on "python - Getting text with an XPath expression in Python 3.6" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/53195177/
