
python - How do I make my code stop printing the keyword in my web crawler

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:18:49

New to Python, just playing around with web crawlers using the bs4 and requests modules. Currently the code keeps printing every occurrence of my keyword, and I'd like to know how to make it print only once. Should I use `break`, and where in the code would I insert it?

import requests
from bs4 import BeautifulSoup

# Test for agency offering scrape
def seo(url):
    result = requests.get(url)
    soup = BeautifulSoup(result.text)
    lowercased = result.text.lower()
    keywords = ['creative']
    for keyword in keywords:
        if keyword.lower() in lowercased:
            print(keyword)

    links = soup.find_all('a')[1:]
    for link in links:
        seo(link['href'])

seo("http://www.daileyideas.com/")
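To answer the `break` question directly: placed right after the `print`, a `break` stops the keyword loop at the first match, while a `return` exits the whole function. A minimal, network-free sketch of the difference (the `first_keyword` helper and the sample strings are illustrative, not part of the original code):

```python
def first_keyword(text, keywords):
    """Return the first keyword found in text, or None."""
    lowercased = text.lower()
    for keyword in keywords:
        if keyword.lower() in lowercased:
            # `return` exits the whole function here; a `break` would
            # only exit this loop and let the function carry on.
            return keyword
    return None

print(first_keyword("A Creative design agency", ['creative', 'design']))  # creative
print(first_keyword("nothing relevant here", ['creative']))               # None
```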

Best Answer

If you want to exit the function as soon as the keyword is found, just `return`:

def seo(url):
    result = requests.get(url)
    soup = BeautifulSoup(result.text)
    lowercased = result.text.lower()
    found = False
    keywords = ['creative']
    print(keywords[0] in lowercased)  # debug: is the keyword on this page?
    for keyword in keywords:
        if keyword.lower() in lowercased:
            found = True
    links = soup.find_all('a')[1:]
    for link in links:
        if not found:
            seo(link['href'])
        else:
            print(keyword)
            return

This function will get all the links on the first page and visit each one until the keyword is found or we run out of links:

from urllib.parse import urljoin  # Python 3; on Python 2 use `from urlparse import urljoin`

def seo(url):
    result = requests.get(url)
    soup = BeautifulSoup(result.text)
    # get all links on the page, resolved to absolute URLs
    links = [urljoin(url, tag['href']) for tag in soup.find_all('a', href=True)]
    lower_cased = result.text.lower()
    keywords = ['creative']
    while links:  # keep going until the list is empty
        for keyword in keywords:
            if keyword.lower() in lower_cased:
                print("Success we found the keyword: {}".format(keyword))
                return
        link = links.pop()  # get the next link to check
        result = requests.get(link)
        lower_cased = result.text.lower()

With a recursive search you need to set some depth limit, otherwise the search will keep going forever if the keyword is never found. Scrapy has tools for exactly this kind of job, so it is worth a look if you really want to pursue it.
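The depth limit mentioned above can be sketched like this. Note this is a hypothetical `crawl` helper, not code from the answer: the page fetcher and link extractor are passed in as functions so the idea can be demonstrated without hitting the network.

```python
def crawl(url, keyword, fetch, get_links, max_depth=2, seen=None):
    """Depth-limited recursive search; return the first URL whose text contains keyword."""
    if seen is None:
        seen = set()
    if url in seen or max_depth < 0:
        return None  # already visited, or recursion went too deep
    seen.add(url)
    text = fetch(url)
    if keyword.lower() in text.lower():
        return url
    for link in get_links(url, text):
        found = crawl(link, keyword, fetch, get_links, max_depth - 1, seen)
        if found:
            return found
    return None

# Fake two-page "site" so the sketch runs without requests/bs4:
pages = {'/': ('welcome', ['/about']), '/about': ('a creative agency', [])}
fetch = lambda u: pages[u][0]
get_links = lambda u, text: pages[u][1]
print(crawl('/', 'creative', fetch, get_links))               # /about
print(crawl('/', 'creative', fetch, get_links, max_depth=0))  # None
```

In a real crawler `fetch` would wrap `requests.get` and `get_links` would extract and `urljoin` the `<a href>` values with BeautifulSoup, but the depth-limit logic stays the same.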

Regarding "python - How do I make my code stop printing the keyword in my web crawler", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24560112/
