
python - News API - Getting the output into a Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 17:01:15

I have successfully called the News API and put the results into a DataFrame, but only for page 1.

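import pandas as pd
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key='YOUR_API_KEY')  # replace with your own key

def get_articles(keyword):
    # Fetch only the first page of results for the keyword.
    all_articles = newsapi.get_everything(q=keyword, sources='abc-news-au, news-com-au',
                                          domains='http://www.abc.net.au/news, http://www.news.com.au',
                                          from_param='2018-12-28',
                                          to='2019-01-28',
                                          language='en',
                                          sort_by='popularity',
                                          page=1)

    # One row per article; expand each article dict into its own columns.
    all_articles = pd.DataFrame(all_articles)
    all_articles = pd.concat([all_articles.drop(['articles'], axis=1), all_articles['articles'].apply(pd.Series)], axis=1)

    return all_articles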

(Screenshot of the resulting DataFrame; image not preserved.)

This gives me the DataFrame I want. However, I get stuck when I try to loop through the subsequent pages.

I tried the following:

empty_list = []

for i in range(1, 4):  # pages 1 to 3
    all_articles = newsapi.get_everything(q=keyword, sources='abc-news-au, news-com-au',
                                          domains='http://www.abc.net.au/news, http://www.news.com.au',
                                          from_param='2018-12-28',
                                          to='2019-01-28',
                                          language='en',
                                          sort_by='popularity',
                                          page=i)
    # Appends the whole response dict for this page, not its individual articles.
    empty_list.append(all_articles)

This returns all the articles, but as response dictionaries stored in a list (one dictionary per page):

[{'articles': [{'author': None,
'content': 'Updated \r\nJanuary 14, 2019 14:33:00\r\nANZ customers have lost access to banking services at their local post offices after the bank failed to reach an agreement with Australia Post on their Bank@Post service.\r\nThe change, which came into effect last night, wil… [+5084 chars]',
'description': 'ANZ customers can no longer utilise banking services at their local post offices after the bank failed to reach an agreement with Australia Post on their Bank@Post service.',
'publishedAt': '2019-01-14T03:14:57Z',
'source': {'id': 'abc-news-au', 'name': 'ABC News (AU)'},
'title': "ANZ customers 'furious' as access to Bank@Post cancelled",
'url': 'https://www.abc.net.au/news/2019-01-14/anz-customers-lose-banking-service-at-australia-post/10713156',
'urlToImage': 'https://www.abc.net.au/news/image/10710052-16x9-700x394.jpg'},
{'author': 'Stephen Letts',
'content': "Posted \r\nJanuary 26, 2019 06:20:15\r\nIf you think AMP's glum market update of an additional $200 million worth of costs to fix its various scandals rules a line under the sordid and sorry mess, think again.\r\nKey points:\r\nRemediation costs for Australia's scand… [+5019 chars]",
'description': "Australia's six big wealth managers currently have provisions for about $2.6 billion to fix the scandals that have emerged from the banking royal commission. That could be be woefully inadequate.",
'publishedAt': '2019-01-25T19:20:15Z',
'source': {'id': 'abc-news-au', 'name': 'ABC News (AU)'},
'title': "Wealth managers' remediation costs set to soar",
'url': 'https://www.abc.net.au/news/2019-01-26/wealth-manager-remediation-costs-set-to-soar/10749810',
'urlToImage': 'https://www.abc.net.au/news/image/1147126-16x9-700x394.jpg'}]

Previously, it was just a single dictionary (no list).

When I apply some transformations (similar to the ones above), I get the following DataFrame:

(Screenshot of the resulting DataFrame; image not preserved.)

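Questions:

  1. Does anyone know a better way to do this?
  2. If you were to work with the current DataFrame, how would you extract the dictionaries from each column and present them so that the result looks like the first DataFrame?

Any help would be greatly appreciated.

PS: If you want to reproduce this, you can copy my code. You only need to get your own API key from: https://newsapi.org/docs/client-libraries/python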

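Best answer

It looks like you want to extract the value under the 'articles' key from each page and extend the list with it, rather than append the whole response: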

articles = []

for i in range(1, 4):  # pages 1 to 3
    articles_page = newsapi.get_everything(
        q=keyword,
        sources='abc-news-au, news-com-au',
        domains='http://www.abc.net.au/news, http://www.news.com.au',
        from_param='2018-12-28',
        to='2019-01-28',
        language='en',
        sort_by='popularity',
        page=i)
    # extend adds each article dict individually, keeping the list flat
    articles.extend(articles_page['articles'])

# outside of the loop, create the DataFrame
df = pd.DataFrame(articles)
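
For the second question (starting from the list of per-page response dictionaries that was already collected), a minimal sketch could look like the one below. It makes no API call: `pages` is a hypothetical stand-in for the question's `empty_list`, and the records in it are made-up sample data that only mimic the structure shown in the question. Using `pd.json_normalize` to split the nested 'source' dictionary into separate columns is an optional extra, not part of the answer above.

```python
import pandas as pd

# Hypothetical stand-in for `empty_list`: one response dict per page.
# The values are illustrative sample data, not real API output.
pages = [
    {
        "status": "ok",
        "totalResults": 3,
        "articles": [
            {"author": None, "title": "First",
             "source": {"id": "abc-news-au", "name": "ABC News (AU)"}},
            {"author": "Stephen Letts", "title": "Second",
             "source": {"id": "abc-news-au", "name": "ABC News (AU)"}},
        ],
    },
    {
        "status": "ok",
        "totalResults": 3,
        "articles": [
            {"author": None, "title": "Third",
             "source": {"id": "news-com-au", "name": "Sample source"}},
        ],
    },
]

# Flatten the list of pages into a single list with one dict per article.
articles = [article for page in pages for article in page["articles"]]

# json_normalize builds one row per article and expands the nested
# 'source' dict into 'source.id' and 'source.name' columns.
df = pd.json_normalize(articles)
print(df)
```

If the nested 'source' column should stay as a dictionary, as in the first DataFrame of the question, `pd.DataFrame(articles)` can be used on the flattened list instead.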

Regarding "python - News API - Getting the output into a Pandas DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54394323/
