python - How to do TDD with pandas and pytest?

Reposted. Author: 行者123. Updated: 2023-12-04 16:45:49

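I have a Python script that builds a consolidated report using pandas throughout, in a chain of DataFrame operations (drop, groupby, sum, etc.). Suppose I start with a simple function that removes all columns that have no values; it takes a DataFrame as input and returns a DataFrame: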

# cei.py
import pandas as pd


def clean_table_cols(source_df: pd.DataFrame) -> pd.DataFrame:
    # IMPLEMENTATION
    # e.g. drop every column whose values are all missing:
    return source_df.dropna(axis="columns", how="all")

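In my test I want to verify that this function actually drops every column whose values are all missing. So I arrange a test input and an expected output, and compare them with the assert_frame_equal function from pandas.testing: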

# test_cei.py
import pandas as pd

import cei


def test_clean_table_cols() -> None:
    # Arrange: two all-missing columns that should be dropped
    df = pd.DataFrame(
        {
            "full_valued": [1, 2, 3],
            "all_missing1": [None, None, None],
            "some_missing": [None, 2, 3],
            "all_missing2": [None, None, None],
        }
    )
    expected = pd.DataFrame({"full_valued": [1, 2, 3], "some_missing": [None, 2, 3]})
    # Act
    result = cei.clean_table_cols(df)
    # Assert: same columns, values, and dtypes
    pd.testing.assert_frame_equal(result, expected)
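
Because the test relies on assert_frame_equal, it may help to see how strict that comparison is by default. The sketch below is not part of the original question; the helper name frames_match and the sample frames are hypothetical. It shows that two frames with equal values but different dtypes are rejected unless check_dtype=False is passed:

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left: pd.DataFrame, right: pd.DataFrame, **kwargs) -> bool:
    """Return True if assert_frame_equal accepts the pair, False otherwise.

    Hypothetical helper, used here only to make the outcome inspectable.
    """
    try:
        assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


ints = pd.DataFrame({"a": [1, 2, 3]})          # int64 column
floats = pd.DataFrame({"a": [1.0, 2.0, 3.0]})  # float64 column

# Same values, different dtypes: the default comparison rejects this pair.
strict = frames_match(ints, floats)

# Relaxing the dtype check compares the values only.
relaxed = frames_match(ints, floats, check_dtype=False)

print(strict, relaxed)
```

In the test above the dtypes already line up: a column built from [None, 2, 3] is stored as float64 in both the input and the expected frame, so no relaxation is needed.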

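My question is whether this is conceptually a unit test or an e2e/integration test, since I am not mocking the pandas implementation. But if I mock the DataFrame, I won't be testing what the code actually does. What is the recommended way to test this while following TDD best practices?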

Note: using pandas in this project is a design decision, so we have no intention of abstracting the pandas interface in order to swap in another library later.
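
For illustration only, one possible way to keep such tests small without mocking pandas is to run the function on tiny in-memory frames, one behavior per case. This is a sketch and not the asker's code: it assumes the one-line dropna implementation hinted at in cei.py, and the test names and cases are made up for the example.

```python
import pandas as pd
import pytest


def clean_table_cols(source_df: pd.DataFrame) -> pd.DataFrame:
    # Assumed implementation, taken from the "eg." comment in cei.py.
    return source_df.dropna(axis="columns", how="all")


@pytest.mark.parametrize(
    ("data", "expected_columns"),
    [
        ({"a": [1, 2], "b": [None, None]}, ["a"]),    # all-missing column is dropped
        ({"a": [1, 2], "b": [None, 5]}, ["a", "b"]),  # partly missing column is kept
        ({"a": [None, None]}, []),                    # no columns survive
    ],
)
def test_clean_table_cols_columns(data, expected_columns):
    result = clean_table_cols(pd.DataFrame(data))
    assert list(result.columns) == expected_columns


def test_clean_table_cols_does_not_mutate_input():
    df = pd.DataFrame({"a": [1, 2], "b": [None, None]})
    before = df.copy()
    clean_table_cols(df)
    # The input frame should be unchanged after the call.
    pd.testing.assert_frame_equal(df, before)
```

Each case builds a real DataFrame, so the tests exercise the actual pandas behavior the function depends on while staying fast and isolated from files or databases.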

Best answer

You may find tdda (Test-Driven Data Analysis) useful. Quoting from the documentation:

"The tdda package provides Python support for test-driven data analysis (see 1-page summary with references, or the blog). The tdda.referencetest library is used to support the creation of reference tests, based on either unittest or pytest. The tdda.constraints library is used to discover constraints from a (Pandas) DataFrame, write them out as JSON, and to verify that datasets meet the constraints in the constraints file. It also supports tables in a variety of relation databases. There is also a command-line utility for discovering and verifying constraints, and detecting failing records. The tdda.rexpy library is a tool for automatically inferring regular expressions from a column in a Pandas DataFrame or from a (Python) list of examples. There is also a command-line utility for Rexpy. Although the library is provided as a Python package, and can be called through its Python API, it also provides command-line tools."
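
As a rough sketch of the discover-then-verify workflow the quote describes for tdda.constraints: the wrapper function discover_then_verify and the file path are hypothetical, and the example assumes the tdda package is installed and exposes discover_df and verify_df as in its documentation.

```python
import pandas as pd


def discover_then_verify(reference_df: pd.DataFrame,
                         candidate_df: pd.DataFrame,
                         constraints_path: str) -> int:
    """Discover constraints from reference_df, write them out as JSON, and
    return how many of those constraints candidate_df fails.

    Hypothetical wrapper; requires the third-party tdda package.
    """
    # Imported here so this module still imports where tdda is not installed.
    from tdda.constraints import discover_df, verify_df

    # Infer constraints (types, min/max, nulls, ...) from the reference data.
    constraints = discover_df(reference_df)
    with open(constraints_path, "w") as f:
        f.write(constraints.to_json())

    # Check the candidate data against the saved constraints file.
    verification = verify_df(candidate_df, constraints_path)
    return verification.failures
```

A call such as discover_then_verify(known_good_df, new_df, "report.tdda") would then be expected to return 0 when new_df stays within the constraints inferred from known_good_df.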

See also Nick Radcliffe's PyData talk on Test-Driven Data Analysis.

Regarding "python - How to do TDD with pandas and pytest?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61291416/
