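I need to get the title, image, and summary for each article on a web page.
I can get the title and image from the main listing page. To get the summary, however, I need to crawl the internal URL given in each anchor tag.
I have successfully extracted the title, the image, and the anchor tag link, but I cannot work out how to request that link in order to fetch the summary.
Please help.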
from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.aitrends.com/category/ai-software/').text
soup = BeautifulSoup(source, 'lxml')
match = soup.find_all('div', class_='td-module-thumb')
for x in match:
    headline = x.a.get('title')
    print(headline)
    imgsrc = x.img.get('src')
    print(imgsrc)
    artlink = x.a.get('href')
    print(artlink)
I am trying to follow artlink and extract the summary from that page.
Best answer
You can run a new request for each link:
from bs4 import BeautifulSoup as soup
import requests

def get_summary(url):
    # Fetch the article page and join the paragraphs of its body.
    new_d = soup(requests.get(url).text, 'html.parser')
    return '\n'.join(i.text for i in new_d.find('div', {'class':'td-post-content'}).find_all('p'))

d = soup(requests.get('https://www.aitrends.com/category/ai-software/').text, 'html.parser')
# One extra request per article: follow each link to collect its summary.
results = [{'title':i.h3.text, 'img':i.img['src'], 'summary':get_summary(i.a['href'])} for i in d.find_all('div', {'class':'td-block-span6'})]
Output (first result only, due to SO's character limit):
{'title': 'AI Still Far Away from Mission-Critical Role, DoD’s Porter Says', 'img': 'https://www.aitrends.com/wp-content/uploads/2019/06/6-14Pentagon-2-324x160.jpg', 'summary': 'Dr. Lisa Porter, Deputy Under Secretary of Defense for Research and Engineering, had a lot of good things to say about the promise of artificial intelligence (AI) technologies at the GEOINT Symposium on June 4, with one important caveat: AI isn’t ready for prime time in Department of Defense (DoD) critical applications, and likely won’t be for some time.\nSpeaking on June 4, Porter spoke about AI and DoD apps, and made it clear that the best way to take advantage of AI is to put significant effort into finding a problem that the technology can really help solve. What is most useful, she said, “is a well-structured problem that is suitable to AI … otherwise AI is just a shiny tool.”\n“Not every problem is ideal for AI,” she said, and advised attendees to “understand the problem better” as a first step. “Take more time to understand what problem you are trying to solve,” she said. “Then see if it’s really possible to generate the right kind of AI data.”\nPorter further urged technologists to spend a lot of time at the beginning of the process with end-users evaluating whether a potential AI project features the right data, reasonable outcomes, and proper metrics to evaluate results.\nAs for mission-critical DoD applications, Porter ticked off a list of problems with the current state of AI development that she said collectively constitute a “very big problem” for using the technology in vital situations. 
Those include:\nShe also said agencies still seeking to move past legacy IT systems to AI-ready systems are facing a “very hard, heavy lift” in that process.\nTo companies looking to pitch the Federal government on AI applications, she strongly urged them to “explain why your product is effective,” and to fully discuss data sources, algorithms, and how applications produce consistently repeatable results. Companies that can’t show enough evidence on that front might win a pilot project from DoD, but “people who try to take short cuts get caught in pilot purgatory,” and aren’t likely to win more lucrative contracts, she said.\nReporting on AI efforts already underway within DoD, Porter said the agency’s Joint Artificial Intelligence Center (JAIC) “is just starting to get going” after being created a year ago. She said the effort “really has the right focus” on “the impact of AI at scale.”\n“They realize this is very hard,” Porter said, adding, “It’s all about how we do AI at enterprise level.”\n“There’s nothing very smart about today’s AI tools … That’s what we need to improve,” she said. The achievement of “common sense” in human-machine teaming would be “nirvana,” Porter added. “That team could be very powerful … All of these things require some degree of cognition.”\nOn the DARPA front, Porter said about one-third of the organization’s current projects involve AI “to some degree.”\nWhile advanced technology development remains a daunting task, “we will always be ahead if we play to our strengths,” she said. “Those who cheat and steal from us will never win if we play to our strengths,” including adhering to the rule of law, Porter said.\nThere are probably few Federal officials better positioned to judge the capabilities timeline for AI than Porter. 
In her current position, she oversees research, development and prototyping activities across the DoD enterprise, along with the activities of the Defense Advanced Research Projects Agency (DARPA), the Missile Defense Agency, the Strategic Capabilities Office, the Defense Innovation Unit, and the DoD Laboratory and Engineering Center enterprise.\nBefore her current post, she was executive vice president at In-Q-Tel, and was the first director of the Intelligence Advanced Research Projects Activity (IARPA).\nDr. Porter holds a bachelor’s degree in nuclear engineering from the Massachusetts Institute of Technology and a doctorate in applied physics from Stanford University. She received the Office of the Secretary of Defense Medal for Exceptional Public Service in 2005, the NASA Outstanding Leadership Medal in 2008, the National Intelligence Distinguished Service Medal in 2012, and the Presidential Meritorious Rank Award in 2013.\nSee the source article at MeriTalk.'}
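The request-per-link approach can also be written with parsing kept separate from fetching, which makes the selectors easier to check without hitting the site. The following is only a sketch under assumptions: the function names (parse_listing, parse_summary, scrape) and the timeout value are hypothetical, and the class names td-module-thumb and td-post-content are taken from the question and answer, so they may no longer match the live page.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

LISTING_URL = "https://www.aitrends.com/category/ai-software/"


def parse_listing(html, base_url):
    """Return one dict per article thumbnail found on a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for thumb in soup.find_all("div", class_="td-module-thumb"):
        link = thumb.find("a")
        if link is None or not link.get("href"):
            continue  # skip thumbnails without a usable link
        img = thumb.find("img")
        articles.append({
            "title": link.get("title"),
            "img": img.get("src") if img else None,
            # urljoin leaves absolute URLs alone and resolves relative ones
            "url": urljoin(base_url, link["href"]),
        })
    return articles


def parse_summary(html):
    """Join the paragraphs of the article body, or return '' if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="td-post-content")
    if content is None:
        return ""
    return "\n".join(p.get_text(strip=True) for p in content.find_all("p"))


def scrape(listing_url=LISTING_URL, timeout=10):
    """Fetch the listing, then fetch each article once and attach its summary."""
    with requests.Session() as session:
        response = session.get(listing_url, timeout=timeout)
        response.raise_for_status()
        articles = parse_listing(response.text, listing_url)
        for article in articles:
            page = session.get(article["url"], timeout=timeout)
            page.raise_for_status()
            article["summary"] = parse_summary(page.text)
    return articles
```

Calling scrape() returns a list of dicts with title, img, url, and summary keys. Reusing one requests.Session keeps the connection open across the per-article requests, and returning an empty string when the content div is missing avoids the AttributeError that find(...).find_all(...) raises on pages with a different layout.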
This question (python - how to dynamically scrape internal links using Beautiful Soup) comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/57547057/