
Python code error (Linux, web scraping): strange error

Reposted. Author: 太空宇宙. Updated: 2023-11-04 12:34:22

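I am trying to fetch some data from NYT (New York Times) articles. When I run the code below, it raises an error I am not familiar with. I searched Google and read earlier Stack Overflow answers, but I still don't understand what is wrong. Can anyone tell me how to fix it? Thanks in advance!

Code: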

from nytimesarticle import articleAPI
api = articleAPI('a0de895aa110431eb2344303c7105a9f')

articles = api.search( q = 'Obama',
     fq = {'headline':'Obama', 'source':['Reuters','AP', 'The New York Times']},
     begin_date = 20111231 )

def parse_articles(articles):
    '''
    This function takes in a response to the NYT api and parses
    the articles into a list of dictionaries
    '''
    news = []
    for i in articles['response']['docs']:
        dic = {}
        dic['id'] = i['_id']
        if i['abstract'] is not None:
            dic['abstract'] = i['abstract'].encode("utf8")
        dic['headline'] = i['headline']['main'].encode("utf8")
        dic['desk'] = i['news_desk']
        dic['date'] = i['pub_date'][0:10] # cutting time of day.
        dic['section'] = i['section_name']
        if i['snippet'] is not None:
            dic['snippet'] = i['snippet'].encode("utf8")
        dic['source'] = i['source']
        dic['type'] = i['type_of_material']
        dic['url'] = i['web_url']
        dic['word_count'] = i['word_count']
        # locations
        locations = []
        for x in range(0,len(i['keywords'])):
            if 'glocations' in i['keywords'][x]['name']:
                locations.append(i['keywords'][x]['value'])
        dic['locations'] = locations
        # subject
        subjects = []
        for x in range(0,len(i['keywords'])):
            if 'subject' in i['keywords'][x]['name']:
                subjects.append(i['keywords'][x]['value'])
        dic['subjects'] = subjects
        news.append(dic)
    return(news)

def get_articles(date,query):
    '''
    This function accepts a year in string format (e.g.'1980')
    and a query (e.g.'Amnesty International') and it will
    return a list of parsed articles (in dictionaries)
    for that year.
    '''
    all_articles = []
    for i in range(0,100): #NYT limits pager to first 100 pages. But rarely will you find over 100 pages of results anyway.
        articles = api.search(q = query,
               fq = {'source':['Reuters','AP', 'The New York Times']},
               begin_date = date + '0101',
               end_date = date + '1231',
               sort='oldest',
               page = str(i))
        articles = parse_articles(articles)
        all_articles = all_articles + articles
    return(all_articles)

Amnesty_all = []
for i in range(1980,2014):
    print 'Processing' + str(i) + '...'
    Amnesty_year = get_articles(str(i),'Amnesty International')
    Amnesty_all = Amnesty_all + Amnesty_year

import csv
keys = Amnesty_all[0].keys()
with open('amnesty-mentions.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(Amnesty_all)

Error produced when running it in the terminal:

rakesh-chinta@rakeshchinta-VirtualBox:~$ cd Desktop
rakesh-chinta@rakeshchinta-VirtualBox:~/Desktop$ python nyt.py
Processing1980...
Traceback (most recent call last):
  File "nyt.py", line 66, in <module>
    Amnesty_year = get_articles(str(i),'Amnesty International')
  File "nyt.py", line 59, in get_articles
    articles = parse_articles(articles)
  File "nyt.py", line 14, in parse_articles
    for i in articles['response']['docs']:
KeyError: 'response'

Best answer

The result returned by api.search is not what you expect. Its code is:

    r = requests.get(url)
    return r.json()

So your code only runs correctly when the API at "http://api.nytimes.com/svc/search/v2/articlesearch" returns a successful response and that response carries the expected JSON body.
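To make that condition explicit, here is a minimal sketch of a check that could run before any parsing. It is not part of the nytimesarticle library: the function name decode_search_response and the exception UnexpectedSearchResponse are hypothetical, and the code assumes you call requests yourself and pass in r.status_code and r.text. It is written for Python 3, whereas the script in the question is Python 2.

```python
import json


class UnexpectedSearchResponse(Exception):
    """Hypothetical error type for a body that is not the expected JSON."""


def decode_search_response(status_code, body_text):
    """Return the parsed JSON body, or raise with a readable message.

    status_code and body_text are assumed to come from a requests
    response object (r.status_code and r.text).
    """
    if status_code != 200:
        raise UnexpectedSearchResponse(
            "HTTP %s: %s" % (status_code, body_text[:200]))
    try:
        body = json.loads(body_text)
    except ValueError:  # raised when the body is not valid JSON
        raise UnexpectedSearchResponse(
            "body is not JSON: %r" % body_text[:200])
    # The parser in the question indexes body['response']['docs'],
    # so fail early, with context, if that key is missing.
    if not isinstance(body, dict) or "response" not in body:
        raise UnexpectedSearchResponse(
            "no 'response' key in: %r" % (body,))
    return body
```

With a check like this, a failed request surfaces as an error that shows what the server actually sent, instead of a bare KeyError: 'response' deep inside parse_articles.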

The exception is a KeyError, so the returned object is dict-like. You may want to check:

In [8]: print articles.keys()
Out[8]: [u'status', u'response', u'copyright']

and:

In [9]: print articles['status']
Out[9]: u'OK'

If it is not, my guess is that the NYT API does not populate response when articles['status'] != 'OK'. You may need to handle this unexpected status and retry.
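One way to handle the unexpected status and retry is sketched below. The helper search_with_retry, its parameters, and the choice of a fixed delay are assumptions made for illustration, not something the NYT API or the nytimesarticle library prescribes. The search argument is any callable that returns a dict, such as api.search from the question.

```python
import time


def search_with_retry(search, max_attempts=3, delay_seconds=1.0, **params):
    """Call search(**params) until the body has status 'OK' and a 'response'.

    Returns the body on success, or None if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        body = search(**params)
        if (isinstance(body, dict)
                and body.get("status") == "OK"
                and "response" in body):
            return body
        # Unexpected body: pause before the next attempt, but do not
        # sleep after the final one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None
```

In get_articles, the call could then become articles = search_with_retry(api.search, q=query, ...) with the same keyword arguments as before, followed by a break out of the page loop when the result is None, so that parse_articles never receives a body without 'response'.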

Regarding this Python code error (Linux, web scraping), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42382938/
