
python - Scrapy: How to extract content from nested divs (XPath selectors)?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:53:09

Please see the HTML markup below. How can I use an XPath selector in Scrapy to extract the content of the div with the class name col-sm-7?

I want to extract this text:

Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE

HTML:

<div class="pricing panel panel-primary">
<div class="panel-heading">Infortrend Products</div>
<div class="body">
<div class="panel-subheading"><strong>EonNAS Pro Models</strong></div>
<div class="row">
<div class="col-sm-7"><strong>Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE</strong><br />
<small>Intel Core i3 Dual-Core 3.3GHz Processor, 8GB DDR3 RAM (Drives Not Included)</small></div>
<div class="col-sm-3">#ENP8502MD-0030<br />
<strong> Our Price: $2,873.00</strong></div>
<div class="col-sm-2">
<form action="/addcart.asp" method="get">
<input type="hidden" name="item" value="ENP8502MD-0030 - Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE (Drives Not Included)">
<input type="hidden" name="price" value="$2873.00">
<input type="hidden" name="custID" value="">
<input type="hidden" name="quantity" value="1">
<button type="submit" class="btn btn-primary center-block"><i class="fa fa-shopping-cart"></i> Add to Cart</button>
</form>
</div>
</div>
</div>
</div>

I tried this command, but it did not work:

response.xpath('//div[@class="pricing panel panel-primary"]/div[@class="panel-heading"]/text()/div[@class="body"]//div[@class="panel-subheading" and contains(@style,'font-weight:bold')]/text()').extract_first()

Best answer

You can get the text inside the <strong> element like this:

print(response.xpath('//div[@class="col-sm-7"]//text()').extract()[0].strip())

print(response.xpath('//div[@class="col-sm-7"]/strong/text()').extract()[0].strip())

Both statements above print:

Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE

You can use //text() to get the text of every element inside this div, including the text within the <strong> and <small> tags, like this:

elem_text = ' '.join([txt.strip() for txt in response.xpath('//div[@class="col-sm-7"]//text()').extract()])
print(elem_text)

This prints:

Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE  Intel Core i3 Dual-Core 3.3GHz Processor, 8GB DDR3 RAM (Drives Not Included)

Regarding "python - Scrapy: How to extract content from nested divs (XPath selectors)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43173253/
