
python - How do I map multiple attributes from multiple sections of a web page to Scrapy items?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:08:51

I'm very new to Python and Scrapy. Here is sample code for a problem I ran into while collecting a dataset from a product page on Amazon.

from scrapy.selector import HtmlXPathSelector
from amazoncrawler.items import AmazoncrawlerItem
import scrapy

class startcrawler(scrapy.Spider):
    name = "amazone"
    allowed_domains = ["www.amazon.co.uk"]

    start_urls = [
        "http://www.amazon.co.uk/product-reviews/B005KP74BI",
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = AmazoncrawlerItem()
        reviewText = hxs.xpath('//table[@id="productReviews"]/*/*/*/*/div/div' and '//div[@class="reviewText"]/text()').extract()
        ratings = hxs.xpath('//table[@id="productReviews"]/*/*/*/*/div/div' and '//span[contains(@class, "s_star")]/span/text()').extract()

        for text in reviewText:
            item['comment'] = text
            yield item
        for rating in ratings:
            item['rating'] = rating
            yield item

The resulting CSV file:

comment,rating
And they do last quite some time too.,
"Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose.",
Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.,
Nearly didnt buy these based on two bad reviews - glad I ignored them. Its the Genuine thing with 4 batteries in the pack sold by amazon themselves.,
"They say you only get what you pay for and I am a firm believer of that and certainly in this case it is without doubt, the price of these batteries however in the high street is quite extortionate, hence this is very good value from Amazon. These batteries outlast normal batteries by at least 5-7 times as I have proved to myself several times as I use batteries for my business to power test meters and I can confirm that if you put a run of the mill relatively cheap battery in some of my meters you will be lucky to get 3 days to a week out of them, that is depending on the use of the meter.",
"I still use cheap batteries but only for the likes of wall clocks and the like that do not have a high power drain and they last a reasonable length of time, sometimes up to 2 years. A classic example of how long a cheap battery last is for example my Gillette Fusion ProGlide powered razor, a cheap battery last about a week, but a Duracell lasts at least 5-6 weeks, as I say you only get what you pay for, highly rated batteries and at this price you cannot loose.",
great value for money and its why my wee town is loosing money as their selling one for the same price.,
Great Value for Duracell batteries. I need new ones for our 4 smoke alarms in our house. We normal go for cheap ones from pound shops but they don't last more then a week. When I came across these on Amazon at this price I brought them straight away. They came as describe no problems with them all in our smoke alarms and all tested and work that's what I brought them for to do and they do the job. Ignore the negative comments previous to stop you buying. There is no problems with these batteries,
"Put these into my smoke alarms, worked fine for 18 months before the alarms started the usual chirping at 3am to let you know the battery was dying. They were replaced, but the old ones still had enough power to run one of our baby's toys more a few more months.",
Good price and good shelf life too.,
"Bought 2 packs of these batteries in March 2014 to use in PIR sensors for a wireless alarm. Batteries in the sensors generally needed to be changed annually. These batteries lasted barely 5 months, very disappointing.",
"Arrived smartly Thanks and as stated fresh cells 2016 expiry, good for my smoke and CO2 alarms, postman had to ring bell as square box shape did not fit through letter box.",
"I purchased these because I needed one for a smoke alarm - but I knew it wouldn't be long before I needed others because all my alarms were purchased at the same time. Sure enough 5 weeks later I had to change another one. When the alarm instantly gave the ""low battery"" beeps I took it out and tested it - it was well down in the ""weak"" section. Was this a factory fault? or do employees swap their flat batteries for a new one in the box? There is no seal on the box to alert anyone to such a fiddle.",
"They're batteries. They fit well the bastard smoke detectors when they start bleeping bleeping away. They still won't shut up with the new batteries, but that's the bastard smoke detector's fault, and not the battery, which works fine.",
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",4.7 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",4.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",1.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",4.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",1.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",1.0 out of 5 stars
"Whoever thought of compulsory smoke detectors, and of their general ""safety"" features, would also benefit from having a batchload of these batteries inserted in him.",5.0 out of 5 stars
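The pattern in the last rows above (the final comment repeated next to every rating) is consistent with the spider reusing a single item across both loops: after the first loop, `item['comment']` still holds the last review text, and the second loop only overwrites `rating`. Below is a minimal plain-Python simulation. It assumes each yielded item is written out as soon as it is yielded (the `dict(item)` copy stands in for the exporter writing a row), and it uses plain dicts and hypothetical placeholder texts instead of `AmazoncrawlerItem` and real reviews.

```python
# Hypothetical placeholder data: texts and ratings as two independent flat
# lists, the way the question's spider extracts them.
texts = ["first review", "second review"]
ratings = ["4.0 out of 5 stars", "5.0 out of 5 stars"]

def shared_item_rows():
    item = {}  # ONE item object shared by both loops, as in the question
    for text in texts:
        item["comment"] = text
        yield dict(item)  # snapshot = the row an exporter would write now
    for rating in ratings:
        item["rating"] = rating  # "comment" still holds the LAST text
        yield dict(item)

rows = list(shared_item_rows())
for row in rows:
    print(row.get("comment"), "|", row.get("rating", ""))
```

The first rows have no rating and every later row repeats the last comment, which mirrors the shape of the CSV above.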

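My first problem is that the crawler picks up 3 review ratings from outside the table with id "productReviews" and outputs them as the first 3 ratings, and it does this consistently when I crawl other products too. I could ignore it, but it would be nice to know how to fix it.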
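This first problem is consistent with how Python evaluates the argument passed to `hxs.xpath()` in the spider above. `'A' and 'B'` is a Python boolean expression, not an XPath operator, and because any non-empty string is truthy it evaluates to `'B'`. The table-restricting path is therefore discarded before Scrapy ever sees it, and the document-wide `//span[...]` query can match ratings anywhere on the page. A plain-Python check using the two path strings from the question:

```python
# The two path strings from the question's review-text query.
table_path = '//table[@id="productReviews"]/*/*/*/*/div/div'
text_path = '//div[@class="reviewText"]/text()'

# `and` returns its second operand when the first is truthy, and any
# non-empty string is truthy, so the table restriction is silently dropped.
combined = table_path and text_path

print(combined)               # //div[@class="reviewText"]/text()
print(combined == text_path)  # True
```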

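Second, I want all the paragraphs of a review merged into a single field, followed by its corresponding rating after the delimiter, like this: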

comment,rating
"And they do last quite some time too.
Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose.
Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.",4.0 out of 5 stars
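The question does not say how the CSV is written (Scrapy's feed export is one possibility). As a standalone sketch using Python's `csv` module, the format above is achievable because a field containing newlines is quoted, so one review with several paragraphs still occupies a single CSV record. The `reviews` list is a hypothetical in-memory structure holding the example text and rating shown above.

```python
import csv
import io

# Hypothetical structure: each review keeps its own paragraphs and rating.
reviews = [
    {
        "paragraphs": [
            "And they do last quite some time too.",
            "Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose.",
            "Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.",
        ],
        "rating": "4.0 out of 5 stars",
    },
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["comment", "rating"])
for review in reviews:
    # Join the paragraphs of ONE review; the writer quotes the field
    # because it contains newlines (and, here, a comma).
    writer.writerow(["\n".join(review["paragraphs"]), review["rating"]])

output = buffer.getvalue()
print(output)
```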

Best answer

Iterate over the reviews inside the table, instantiate a new item inside the loop, and yield it:

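def parse(self, response):
    # One selector per review container, restricted to the reviews table.
    reviews = response.xpath('//table[@id="productReviews"]//td/div')
    for review in reviews:
        # A fresh item per review, so no field carries over between reviews.
        item = AmazoncrawlerItem()
        # The leading "." makes each XPath relative to the current review.
        item['comment'] = ' '.join(review.xpath('.//div[@class="reviewText"]/text()').extract())
        # Note: .extract()[0] raises IndexError if a review has no star rating.
        item['rating'] = review.xpath('.//span[contains(@class, "s_star")]/span/text()').extract()[0]
        yield item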

Output:

{
    'comment': u"And they do last quite some time too. Not a lot to say about a pair of 9v batteries, but I've not had any problems with Duracell for this purpose. Whilst there are quite a few rechargeable 9v ones around you are better off with these as the rechargeable types are not suggested for use in devices such as this.",
    'rating': u'4.0 out of 5 stars'
}
...
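The answer's approach (loop over review containers, then search relative to each one) can be mirrored without Scrapy as a rough sketch using only the standard library's `xml.etree.ElementTree`. The markup in `PAGE`, class names such as `s_star_4_0`, and the `<br/>` between paragraphs are hypothetical simplifications, not Amazon's real page structure. The sketch shows that a document-wide search also returns the summary rating outside the table, while a per-review search inside `productReviews` pairs each comment with its own rating.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one summary rating OUTSIDE the reviews
# table and two reviews inside it.
PAGE = """
<html><body>
  <div id="summary"><span class="s_star_4_5"><span>4.7 out of 5 stars</span></span></div>
  <table id="productReviews"><tr><td>
    <div>
      <span class="s_star_4_0"><span>4.0 out of 5 stars</span></span>
      <div class="reviewText">First paragraph.<br/>Second paragraph.</div>
    </div>
    <div>
      <span class="s_star_5_0"><span>5.0 out of 5 stars</span></span>
      <div class="reviewText">Another review.</div>
    </div>
  </td></tr></table>
</body></html>
"""

def star_ratings(node):
    # ElementTree's XPath subset has no contains(), so the class prefix is
    # matched in Python instead.
    return [inner.text
            for outer in node.iter("span")
            if outer.get("class", "").startswith("s_star")
            for inner in outer.findall("span")]

root = ET.fromstring(PAGE.strip())

# Document-wide search: includes the summary rating outside the table.
document_wide = star_ratings(root)

# Per-review search, scoped to the reviews table only.
items = []
table = root.find(".//table[@id='productReviews']")
for review in table.findall("./tr/td/div"):
    texts = [t.strip()
             for div in review.findall(".//div[@class='reviewText']")
             for t in div.itertext() if t.strip()]
    ratings = star_ratings(review)
    items.append({"comment": " ".join(texts),
                  "rating": ratings[0] if ratings else None})

print(document_wide)
print(items)
```

Returning `None` for a missing rating is a design choice for the sketch; it avoids the `IndexError` that indexing an empty list would raise.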

Regarding "python - How do I map multiple attributes from multiple sections of a web page to Scrapy items?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/28924377/
