
python - Why are all my items the same in Scrapy?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:21:29

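I'm new to Scrapy and I've run into a problem. There is a website where I want to create a unique item for each element of a table, but every item comes out the same and I don't know why. Here is my code: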

from scrapy import Spider
from scrapy.selector import Selector

from petroleo.items import PetroleoItem


class PetroleoSpider(Spider):
    name = "petroleo"
    site = "http://www.glossary.oilfield.slb.com/"
    allowed_domains = [site]
    start_urls = [site + 'en/Terms.aspx?filter=sym&LookIn=term%20name&searchtype=starts%20with',]

    def parse(self, response):

        words = Selector(response).xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td")

        for word in words:
            item = PetroleoItem()

            if word.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/em").extract():

                item['title'] = word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/em/text()").extract()[0]
                item['title'] += word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/sub/text()").extract()[0]

            if word.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/i").extract():
                item['title'] = {'en': word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/i/text()").extract()}
                item['title']['en'][0] += word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/i/sub/text()").extract()[0]

            if word.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/text()").extract():
                item['title'] = {'en': word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/text()").extract()}

            yield item

Best answer

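An XPath expression that starts with // is evaluated from the document root even when it is called on a nested selector, so every iteration of your loop matches the same cells. Make the expressions relative to the current context by prepending a dot, and don't repeat the //table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td part: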

words = response.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td")

for word in words:
    item = PetroleoItem()

    if word.xpath("./a/em").extract():
        item['title'] = word.xpath("./a/em/text()").extract()[0]
        item['title'] += word.xpath("./a/sub/text()").extract()[0]

    if word.xpath("./a/i").extract():
        item['title'] = {'en': word.xpath("./a/i/text()").extract()}
        item['title']['en'][0] += word.xpath("./a/i/sub/text()").extract()[0]

    if word.xpath("./a/text()").extract():
        item['title'] = {'en': word.xpath("./a/text()").extract()}

    yield item

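I don't particularly like, or fully understand, what you're trying to do in the loop, but the relative expressions above should at least fix the problem described in the question.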

Regarding "python - Why are all my items the same in Scrapy?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33093412/
