
python - Trying to get the text "Mobilya" from HTML with Scrapy via XPath

Reposted. Author: 行者123. Updated: 2023-12-02 01:27:05

Below is the HTML I'm working with. I'm trying to get the "Gardıroplar" text, but the query returns an empty result.

<ol class="nav align-items-center flex-nowrap text-nowrap overflow-auto hide-scrollbar">
  <li>
    <a href="/">
      <svg class="icon-home m-0">
        <use
          xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-home"
        ></use>
      </svg>
    </a>
  </li>
  <li>
    <svg class="icon-arrow2 m-0">
      <use
        xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
      ></use>
    </svg>
    <a href="/mobilya/c/109">Mobilya</a>
  </li>
  <li>
    <svg class="icon-arrow1 m-0">
      <use
        xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
      ></use>
    </svg>
    <span class="top-breadcrumb">
      <a
        class="d-inline-flex align-items-center border pl-10 rounded-sm"
        href="/mobilya/gardiroplar/c/109011"
        data-toggle="dropdown"
        aria-expanded="false"
        >Gardıroplar<svg class="icon-arrow7 m-0 rotate-top">
          <use
            xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow7"
          ></use>
        </svg>
      </a>
      <ul class="dropdown-menu px-15 py-0 border-0 text-c2">
        <li class="border-bottom py-10">
          <a
            class="d-flex align-items-center justify-content-between pl-5 py-5 reverse font-weight-bold"
            href="/mobilya/gardiroplar/c/109011"
            >Gardıroplar</a
          >
        </li>
        <li class="px-10 border-bottom">
          <a
            class="d-flex align-items-center justify-content-between pl-5 py-5 reverse"
            href="/gardiroplar/kapakli-gardiroplar/c/109011002"
            >Kapaklı Gardıroplar<svg class="icon-arrow1 ml-5">
              <use
                xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
              ></use>
            </svg>
          </a>
        </li>
        <li class="px-10 border-bottom">
          <a
            class="d-flex align-items-center justify-content-between pl-5 py-5 reverse"
            href="/gardiroplar/surgulu-gardiroplar/c/109011003"
            >Sürgülü Gardıroplar<svg class="icon-arrow1 ml-5">
              <use
                xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
              ></use>
            </svg>
          </a>
        </li>
        <li class="px-10 border-bottom">
          <a
            class="d-flex align-items-center justify-content-between pl-5 py-5 reverse"
            href="/gardiroplar/bez-dolaplar/c/109011001"
            >Bez Dolaplar<svg class="icon-arrow1 ml-5">
              <use
                xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
              ></use>
            </svg>
          </a>
        </li>
      </ul>
    </span>
  </li>
  <li>
    <svg class="icon-arrow1 m-0">
      <use
        xlink:href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
      ></use>
    </svg>
    <a href="/gardiroplar/kapakli-gardiroplar/c/109011002">Kapaklı Gardıroplar</a>
  </li>
</ol>

My code:

response.xpath('//ol[@class="nav.align-items-center.flex-nowrap.text-nowrap.overflow-auto.hide-scrollbar.tab-title"]//li[svg[contains(@class,"icon-arrow2")]]/text()').getall()

Best answer

If the icon-arrow2 class is a fixed value, you can use the following XPath:

"//li[./*[contains(@class,'icon-arrow2')]]//a"

The full command is:

response.xpath("//li[./*[contains(@class,'icon-arrow2')]]//a/text()").getall()

UPD
Now that you have shared the actual link to the page, I can give you a better locator.
This will work:

response.xpath("//li[./*[contains(@class,'icon-arrow')]]/a[contains(@href,'mob')]/text()").getall()

UPD2
This will give you the Kapaklı Gardıroplar text:

response.xpath("//li[./*[contains(@class,'icon-arrow')]]/a[contains(@href,'gar')]/text()").getall()

UPD3
This will give you the Gardıroplar text you asked for:

response.xpath("//li[@class='border-bottom py-10']/a/text()").getall()

Regarding "python - Trying to get the text "Mobilya" from HTML with Scrapy via XPath", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74231883/
