
python - List index out of range error in Scrapy

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:14:51

I am trying to fetch a user's statuses from Weibo, but I keep getting this error.

import re
import string
import sys
import os
import urllib
import urllib2
from bs4 import BeautifulSoup
import requests
from lxml import etree

reload(sys)
sys.setdefaultencoding('utf-8')
if len(sys.argv) >= 2:
    user_id = int(sys.argv[1])
else:
    user_id = int(raw_input("input user_id: "))

cookie = {"Cookie": "******my cookies"}
url = 'http://weibo.cn/u/%d?filter=1&page=1' % user_id

html = requests.get(url, cookies=cookie).content
selector = etree.HTML(html)
pageNum = (int)(selector.xpath('//input[@name="mp"]')[0].attrib['value'])

result = ""
urllist_set = set()
word_count = 1
image_count = 1

print 'spider is ready...'

for page in range(1, pageNum + 1):
    url = 'http://weibo.cn/u/%d?filter=1&page=%d' % (user_id, page)
    lxml = requests.get(url, cookies=cookie).content

    selector = etree.HTML(lxml)
    content = selector.xpath('//span[@class="ctt"]')
    for each in content:
        text = each.xpath('string(.)')
        if word_count >= 4:
            text = "%d :" % (word_count - 3) + text + "\n\n"
        else:
            text = text + "\n\n"
        result = result + text
        word_count += 1

fo = open("/Users/apple/Desktop/%s" % user_id, "wb")
fo.write(result)
word_path = os.getcwd() + '/%d' % user_id
print 'done'

Error:

File "weibo_spider.py", line 25, in <module>
pageNum = (int)(selector.xpath('//input[@name="mp"]')[0].attrib['value'])
IndexError: list index out of range

Best Answer

You are assuming that selector.xpath will always find something, but much of the time it won't. So get into the habit of defensive programming. See Defensive Programming.

Try replacing

pageNum = (int)(selector.xpath('//input[@name="mp"]')[0].attrib['value'])

with:

controls = selector.xpath('//input[@name="mp"]')
if controls:
    pageNum = int(controls[0].attrib['value'])
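Note that with this guard alone, pageNum is never assigned when the list is empty, so the later `range(1, pageNum + 1)` would raise a NameError instead. A minimal self-contained sketch of the same defensive pattern with a fallback default (the `get_page_num` helper and the sample HTML strings are hypothetical, not part of the original code; assumes Python 3 and lxml):

```python
from lxml import etree

def get_page_num(html, default=1):
    """Extract the page count defensively, falling back to a default
    when the <input name="mp"> element is missing (for example when an
    expired cookie makes Weibo return a login page instead)."""
    selector = etree.HTML(html)
    controls = selector.xpath('//input[@name="mp"]')
    if controls:  # only index into the list when it is non-empty
        return int(controls[0].attrib['value'])
    return default  # nothing matched: fall back instead of raising IndexError

# A page that contains the pagination control:
print(get_page_num('<form><input name="mp" value="42"/></form>'))  # 42
# A page without it (e.g. a login redirect): no IndexError, just the default
print(get_page_num('<html><body>please log in</body></html>'))     # 1
```

This keeps a single code path: the caller always gets a usable integer, and a missing element degrades to scraping one page rather than crashing.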

Regarding "python - List index out of range error in Scrapy", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35978022/
