python - scrapy: output titles and their related links

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:50:21

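My scrapy spider shows me the titles of all the web pages. Please tell me how to display each title together with the link associated with that title. I want to parse this page. My code: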

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from probe1.items import SpiderItem

class SpiderSpider(CrawlSpider):
    name = "spider"
    allowed_domains = ["WEB_PAGE"]
    start_urls = [
        "http://www.WEB_PAGE"
    ]

    rules = (
        Rule(
            SgmlLinkExtractor(allow_domains=("WEB_PAGE",)),
            callback='parse_page', follow=True
        ),
    )


    def parse_page(self, response):
        hxs = HtmlXPathSelector(response)
        print hxs
        sites = hxs.select('//title')
        items = []
        for s in sites:
            item = SpiderItem()
            # extract() must be called; without the parentheses the bound
            # method itself would be stored instead of the title markup.
            item['title'] = s.select('//title').extract()
            items.append(item)
        return items

Best answer

response.url contains what you need:

url

A string containing the URL of the response.
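
As a minimal sketch of how this answer could be applied, the helper below pairs the page title with response.url. It assumes a newer Scrapy release than the one used in the question, where the response object exposes xpath() and the returned selector list exposes get(). The function name page_to_item and the "link" key are hypothetical and do not appear in the original code.

```python
def page_to_item(response):
    """Pair the <title> text of a page with the URL it was fetched from.

    Assumption: `response` behaves like a newer Scrapy HtmlResponse, i.e. it
    has a `url` attribute and an `xpath()` method whose result supports `get()`.
    """
    # get() returns the first match as a string, or None if there is no <title>.
    title = response.xpath("//title/text()").get()
    return {
        "title": title.strip() if title is not None else None,
        # response.url is the URL of the page this title belongs to.
        "link": response.url,  # "link" is a hypothetical field name
    }
```

Inside the spider, parse_page could then do `yield page_to_item(response)`. If SpiderItem is kept instead of a plain dict, it would need a matching link field declared alongside title.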

Regarding python - scrapy: output titles and their related links, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/18510511/
