
python - Mock/Monkeypatch a BeautifulSoup html object with Pytest


I'm working on a web scraping project in Python and trying to add automated tests with Pytest. I'm not new to web scraping, but I am new to testing, and I believe the idea here is that I should mock the HTTP request and replace it with some dummy fixture HTML, so I can test that the rest of the function works correctly without depending on a request to the actual url.

Below is my web scraping function.

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen

def get_player_stats_data():
    """
    Web Scrape function w/ BS4 that grabs aggregate season stats

    Args:
        None

    Returns:
        Pandas DataFrame of Player Aggregate Season stats
    """
    try:
        year_stats = 2022
        url = f"https://www.basketball-reference.com/leagues/NBA_{year_stats}_per_game.html"
        html = urlopen(url)
        soup = BeautifulSoup(html, "html.parser")

        headers = [th.getText() for th in soup.findAll("tr", limit=2)[0].findAll("th")]
        headers = headers[1:]

        rows = soup.findAll("tr")[1:]
        player_stats = [
            [td.getText() for td in rows[i].findAll("td")] for i in range(len(rows))
        ]

        stats = pd.DataFrame(player_stats, columns=headers)

        print(
            f"General Stats Extraction Function Successful, retrieving {len(stats)} updated rows"
        )
        return stats
    except BaseException as error:
        print(f"General Stats Extraction Function Failed, {error}")
        df = []
        return df

Here is how I grab the raw html of the page and pickle it, so I can save it and import it for testing.

import pickle
from bs4 import BeautifulSoup
from urllib.request import urlopen

year_stats = 2022
url = "https://www.basketball-reference.com/leagues/NBA_2022_per_game.html"
html = urlopen(url)

# how you save it
with open('new_test/tests/fixture_csvs/stats_html.html', 'wb') as fp:
    while True:
        chunk = html.read(1024)
        if not chunk:
            break
        fp.write(chunk)

# how you open it
with open('new_test/tests/fixture_csvs/stats_html.html', "rb") as fp:
    stats_html = fp.read()

My question is: how do I mock/patch/monkeypatch the urlopen(url) call and use the pickled html in its place to create a fixture? The Pytest docs example creates a class and monkeypatches requests.get(), where get is an attribute of requests, which seems a bit different from what I'm doing, and I couldn't get mine to work; I think I should be using something other than monkeypatch.setattr? Below is my attempt.

@pytest.fixture(scope="session")
def player_stats_data_raw(monkeypatch):
    """
    Fixture to load web scrape html from an html file for testing.
    """
    fname = os.path.join(
        os.path.dirname(__file__), "fixture_csvs/stats_html.html"
    )

    with open(fname, "rb") as fp:
        html = fp.read()

    def mock_urlopen():
        return html

    monkeypatch.setattr(urlopen, "url", mock_urlopen)
    df = get_player_stats_data()
    return df

### The actual tests in a separate file
def test_raw_stats_rows(player_stats_data_raw):
    assert len(player_stats_data_raw) == 30

def test_raw_stats_schema(player_stats_data_raw):
    assert list(player_stats_data_raw.columns) == raw_stats_cols

The goal is to replace html = urlopen(url) in the web scraping function with this pickled html that I saved earlier.

The other option is to turn that url into an input parameter of the function: in production I would just pass in the actual url, as you can see here (www.basketballreference.com/etc), and in tests I would just read in that pickled value. That's an option, but I'd love to learn how to apply this patching technique to a real example, so if anyone has any ideas I'd appreciate it!
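For reference, the parameterized alternative described above might look roughly like the sketch below (the html_source parameter name is my own, not from the original code): in production the function fetches the live page, and in a test you simply pass in the saved HTML bytes.

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen

def get_player_stats_data(html_source=None):
    """Parse per-game stats; fetch the live page only when no HTML is supplied."""
    if html_source is None:
        url = "https://www.basketball-reference.com/leagues/NBA_2022_per_game.html"
        html_source = urlopen(url)  # network call happens only in production

    soup = BeautifulSoup(html_source, "html.parser")
    headers = [th.getText() for th in soup.findAll("tr", limit=2)[0].findAll("th")][1:]
    rows = soup.findAll("tr")[1:]
    player_stats = [[td.getText() for td in row.findAll("td")] for row in rows]
    return pd.DataFrame(player_stats, columns=headers)

# In a test, read the saved fixture file and skip the network entirely:
# with open("new_test/tests/fixture_csvs/stats_html.html", "rb") as fp:
#     df = get_player_stats_data(html_source=fp.read())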

Best Answer

In your test file, you can try something like this:

import pytest  # the `mocker` fixture below comes from the pytest-mock plugin

from module.script import get_player_stats_data


@pytest.fixture()  # `mocker` is function-scoped, so the fixture cannot be session-scoped
def urlopen(mocker):
    # fname is the path to the saved HTML fixture file (see the question above)
    with open(fname, "rb") as fp:
        html = fp.read()
    urlopen = mocker.patch("module.script.urlopen")
    urlopen.return_value = html
    return urlopen


def test_raw_stats_rows(urlopen):
    df = get_player_stats_data()
    assert len(df) == 30


def test_raw_stats_schema(urlopen):
    df = get_player_stats_data()
    assert list(df.columns) == raw_stats_cols
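If you want to stick with the built-in monkeypatch fixture instead of pytest-mock, the key point is the same: patch the urlopen name inside the module that calls it (assumed here to be module.script, as in the answer above), not an attribute of urlopen itself. A minimal sketch under that assumption:

import os

import pytest

from module.script import get_player_stats_data


@pytest.fixture()  # monkeypatch is function-scoped, so this fixture must be too
def patched_urlopen(monkeypatch):
    fname = os.path.join(os.path.dirname(__file__), "fixture_csvs/stats_html.html")
    with open(fname, "rb") as fp:
        html = fp.read()

    # Stub that ignores the url and returns the saved HTML bytes
    # instead of making a network call.
    def fake_urlopen(url):
        return html

    monkeypatch.setattr("module.script.urlopen", fake_urlopen)


def test_raw_stats_rows(patched_urlopen):
    df = get_player_stats_data()
    assert len(df) == 30

This works because BeautifulSoup accepts raw bytes just as happily as the file-like object that urlopen normally returns, so the stub can simply hand back the saved fixture bytes.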

About python - Mock/Monkeypatch a BeautifulSoup html object with Pytest, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70761518/
