python-3.x - Silent error handling in Python?

Reposted. Author: 行者123. Updated: 2023-12-03 07:46:35

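I have a CSV file containing a large number of URLs. For convenience I read it into a pandas DataFrame, because I need to do some statistics on it later and pandas is handy for that. It looks roughly like this: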

import pandas as pd
csv = [{"URLs" : "www.mercedes-benz.de", "electric" : 1}, {"URLs" : "www.audi.de", "electric" : 0}, {"URLs" : "ww.audo.e", "electric" : 0}, {"URLs" : "NaN", "electric" : 0}]
df = pd.DataFrame(csv)

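My task is to check whether each website contains a certain string and to add an extra column holding 1 if it does and 0 otherwise. For example, I want to check whether www.mercedes-benz.de contains the string car. I do the following: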
import requests  # needed for requests.get below

for i, row in df.iterrows():
    page_content = requests.get(row['URLs'])
    if "car" in page_content.text:
        df.loc[i, 'car'] = '1'
    else:
        df.loc[i, 'car'] = '0'

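The problem: sometimes a URL is wrong or missing, and my little script then raises an error.

How can I handle or suppress the error when a URL is wrong or missing? And how can I use df.loc[i, 'url_wrong'] = '1' in that case to flag the URL as wrong or missing?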

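Best answer

Try defining a function that performs the "car" check, then use the .apply method of the pandas Series to get 1, 0, or "Wrong/Missing URL" for each row. The following should help: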

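import pandas as pd
import requests


data = [{"URLs" : "https://www.mercedes-benz.de", "electric" : 1},
        {"URLs" : "https://www.audi.de", "electric" : 0},
        {"URLs" : "https://ww.audo.e", "electric" : 0},
        {"URLs" : "NaN", "electric" : 0}]


def contains_car(link):
    # Return 1 or 0 for the keyword check, or a marker string if the request fails.
    try:
        return int('car' in requests.get(link).text)
    except Exception:  # unlike a bare except, this lets Ctrl+C through
        return "Wrong/Missing URL"


df = pd.DataFrame(data)

# Apply the check to every URL and store the result in a new column.
df['extra_column'] = df.URLs.apply(contains_car)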

#                            URLs  electric       extra_column
# 0  https://www.mercedes-benz.de         1                  1
# 1           https://www.audi.de         0                  1
# 2             https://ww.audo.e         0  Wrong/Missing URL
# 3                           NaN         0  Wrong/Missing URL
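
The question also asked for a separate url_wrong flag rather than a marker string in the same column. One possible way to do that is sketched below. It is only an illustration under some assumptions: check_url, fake_fetch, and the fetch parameter are hypothetical names that do not appear in the original answer, the example.com URLs are placeholders, the 10-second timeout is an arbitrary choice, and it assumes failures surface as requests.exceptions.RequestException. The demo uses an offline stand-in for the HTTP call so it runs without network access.

```python
import pandas as pd
import requests


def check_url(url, keyword="car", fetch=None):
    """Return (keyword_flag, url_wrong_flag) for a single URL.

    `fetch` is a hypothetical hook, not part of the original answer: any
    callable that takes a URL and returns the page text.
    """
    if fetch is None:
        # Default: a real HTTP request. The 10 s timeout is an assumed value.
        def fetch(u):
            return requests.get(u, timeout=10).text
    try:
        text = fetch(url)
    except requests.exceptions.RequestException:
        # Request failed: the keyword flag is unknown, the URL is flagged.
        return None, 1
    return int(keyword in text), 0


def fake_fetch(url):
    """Offline stand-in for requests.get, used only for this demo."""
    pages = {
        "https://example.com/cars": "We sell every kind of car.",
        "https://example.com/bikes": "Bicycles only.",
    }
    if url not in pages:
        raise requests.exceptions.ConnectionError(f"cannot reach {url}")
    return pages[url]


df = pd.DataFrame({"URLs": ["https://example.com/cars",
                            "https://example.com/bikes",
                            "NaN"]})

# One (car, url_wrong) pair per row, split into two columns.
results = [check_url(u, fetch=fake_fetch) for u in df["URLs"]]
df["car"] = [flag for flag, _ in results]
df["url_wrong"] = [wrong for _, wrong in results]
print(df)
```

Dropping the fetch=fake_fetch argument makes check_url issue real requests. Keeping the two flags in separate columns leaves the car column free of mixed string and integer values, which is easier to work with for the statistics mentioned in the question.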

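Edit:

You can search the text returned by the HTTP request for several keywords. Depending on the condition you want, this can be done with the built-in function any or the built-in function all. With any, finding any one of the keywords returns 1; with all, every keyword must match to return 1. In the example below I use any with the keywords 'car', 'vehicle', and 'automobile':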
import pandas as pd
import requests


data = [{"URLs" : "https://www.mercedes-benz.de", "electric" : 1},
        {"URLs" : "https://www.audi.de", "electric" : 0},
        {"URLs" : "https://ww.audo.e", "electric" : 0},
        {"URLs" : "NaN", "electric" : 0}]


def contains_keywords(link, keywords):
    # Return 1 if any keyword occurs in the page text, 0 if none does,
    # or a marker string if the request fails.
    try:
        output = requests.get(link).text
        return int(any(x in output for x in keywords))
    except Exception:  # unlike a bare except, this lets Ctrl+C through
        return "Wrong/Missing URL"


df = pd.DataFrame(data)
mykeywords = ('car', 'vehicle', 'automobile')
df['extra_column'] = df.URLs.apply(lambda l: contains_keywords(l, mykeywords))

This should produce:
#                            URLs  electric       extra_column
# 0  https://www.mercedes-benz.de         1                  1
# 1           https://www.audi.de         0                  1
# 2             https://ww.audo.e         0  Wrong/Missing URL
# 3                           NaN         0  Wrong/Missing URL
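
To make the difference between any and all concrete, here is a small offline sketch that runs the keyword check on a hard-coded string instead of a downloaded page. The helper name matches, its require_all parameter, and the sample text are made up for illustration and are not part of the code above.

```python
def matches(text, keywords, require_all=False):
    """Return 1 if the keywords are found in text, else 0.

    With require_all=False (the default) one matching keyword is enough;
    with require_all=True every keyword has to occur in the text.
    """
    check = all if require_all else any
    return int(check(k in text for k in keywords))


page = "Our new vehicle is an electric car."
keywords = ("car", "vehicle", "automobile")

any_result = matches(page, keywords)                    # "car" is present
all_result = matches(page, keywords, require_all=True)  # "automobile" is missing
print(any_result, all_result)  # prints: 1 0
```

Note that the in operator is case-sensitive, so "Car" in a page would not match the keyword "car" unless both sides are lowercased first.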

I hope this helps.

Regarding "python-3.x - Silent error handling in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44590079/
