python - 如何使用 pandas 和 pytest 进行 TDD？-6ren

python - 如何使用 pandas 和 pytest 进行 TDD？

转载作者：行者123 更新时间：2023-12-04 16:45:49

36

4

我有一个 Python 脚本，它通过在一系列 DataFrame 操作(drop、groupby、sum 等)中始终使用 Pandas 来整合报告。假设我从一个简单的函数开始，该函数清除所有没有值的列，它有一个 DataFrame 作为输入和输出:

# cei.py
def clean_table_cols(source_df: pd.DataFrame) -> pd.DataFrame:
   # IMPLEMENTATION
   # eg. return source_df.dropna(axis="columns", how="all")

我想在我的测试中验证这个函数实际上删除了所有值都为空的列。所以我安排了一个测试输入和输出，并使用 pandas.testing 中的 assert_frame_equal 函数进行测试:

# test_cei.py
import pandas as pd
def test_clean_table_cols() -> None:
    df = pd.DataFrame(
        {
            "full_valued": [1, 2, 3],
            "all_missing1": [None, None, None],
            "some_missing": [None, 2, 3],
            "all_missing2": [None, None, None],
        }
    )
    expected = pd.DataFrame({"full_valued": [1, 2, 3], "some_missing": [None, 2, 3]})
    result = cei.clean_table_cols(df)
    pd.testing.assert_frame_equal(result, expected)

我的问题是它在概念上是单元测试还是 e2e/集成测试，因为我不是在 mock pandas 实现。但是如果我模拟 DataFrame，我就不会测试代码的功能。按照 TDD 最佳实践进行测试的推荐方法是什么？

注意:在此项目中使用 Pandas 是一项设计决策，因此我们无意抽象 Pandas 接口(interface)以便将来用其他库替换它。

最佳答案

您可能会找到 tdda (测试驱动数据分析)很有用，引用自文档:

The tdda package provides Python support for test-driven data analysis (see 1-page summary with references, or the blog). The tdda.referencetest library is used to support the creation of reference tests, based on either unittest or pytest. The tdda.constraints library is used to discover constraints from a (Pandas) DataFrame, write them out as JSON, and to verify that datasets meet the constraints in the constraints file. It also supports tables in a variety of relation databases. There is also a command-line utility for discovering and verifying constraints, and detecting failing records. The tdda.rexpy library is a tool for automatically inferring regular expressions from a column in a Pandas DataFrame or from a (Python) list of examples. There is also a command-line utility for Rexpy. Although the library is provided as a Python package, and can be called through its Python API, it also provides command-line tools."

另见 Nick Radcliffe's PyData talk on Test-Driven Data Analysis

关于python - 如何使用 pandas 和 pytest 进行 TDD？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/61291416/

36

4

0

文章推荐： python - 可以重新加载导入的字典吗？

文章推荐： Pytest测试是方法抽象

文章推荐： python - conftest.py 可以_read_ pytest.ini 中的现有值吗？

pytest - pytest 中所有测试用例的超时时间
我需要在整体超时的情况下停止测试用例，而不是在测试用例级别。所以如果让我说我有 300 个测试用例，我想超时，总时间为 300 秒。有没有办法做到这一点？用于运行 pytest 的示例命令 py
pytest - 为 tox 设置 pytest arg 但不是直接 pytest
我会默认使用一些参数( -n 2 )运行 pytest 但如果我只输入 pytest ... ，我不希望默认使用该参数直接运行pytest。这可能吗？如果我包括这个: [pytest] addopt
pytest - 生成 pytest.File 或 pytest.Item 实例时，我如何指示这些需要固定装置？
给定以下模型: import pytest class DummyFile(pytest.File): def collect(self): yield DummyItem(s
pytest - 使用 pytest 运行相关测试
对于 pytest，我正在使用库 pytest-dependency 设置依赖项.我还为这些测试添加了标记。这是一个 ECM: # test_test.py import pytest @pytest
pytest - 使用 pytest 动态控制测试顺序
我想使用逻辑来控制我的测试的顺序，这将在它们已经运行时动态重新排序它们。我的用例是这样的:我正在使用 xdist 并行化我的测试，并且每个测试都使用来自公共(public)和有限池的外部资源。一些测
pytest - 如何有条件地跳过参数化的 pytest 场景？
我需要标记要跳过的某些测试。但是，有些测试是参数化的，我只需要能够跳过某些场景。我使用 py.test -m "hermes_only" 调用测试或 py.test -m "not hermes_o
pytest - 有没有办法跳过 pytest fixture ？
问题是我给定的 fixture 函数具有外部依赖性，这会导致“错误”(例如无法访问的网络/资源不足等)。我想跳过 fixture ，然后跳过任何依赖于该 fixture 的测试。做这样的事情是行不
pytest - 仅抑制其他人代码的 pytest 警告
我正在试用 pytest首次。我如何抑制发出的关于我的代码所依赖的其他人的代码的警告而不抑制关于我自己的代码的警告？现在我的 pytest.ini 中有这个所以我不必看到 pytest 警告我关于
pytest - 在测试执行之前无法获取 pytest 命令行参数？
我试图跳过依赖于命令行参数值的特定测试。我尝试使用 pytest.config.getoption("--some-custom-argument") 获取参数值就像这里描述的一样 related q
pytest - 当 pytest 测试失败时是否可以抑制所有参数化参数的显示
我目前使用的是 python 3.5.1 和 3.6 以及最新版本的 pytest。当使用参数化测试运行 pytest 时，我希望任何失败的测试仅显示失败的测试，而不是参数化测试的所有设置。解释一下
pytest - 为不同的测试赋予 Pytest 设备不同的范围
在我的测试套件中，我有一些数据生成装置，用于许多参数化测试。其中一些测试希望这些装置在每个 session 中只运行一次，而另一些则需要它们运行每个功能。例如，我可能有一个类似于: @pytest.f
pytest - 如何在 pytest 运行时获取测试名称和测试结果
我想在运行时获取测试名称和测试结果。我有 setup和 tearDown我的脚本中的方法。在 setup ，我需要获取测试名称，并在 tearDown我需要得到测试结果和测试执行时间。有没有办法我
pytest - 标记 Pytest 固定装置而不是使用固定装置的所有测试
有没有办法在 PyTest fixture 中定义标记？当我指定 -m "not slow" 时，我试图禁用慢速测试在 pytest 中。我已经能够禁用单个测试，但不能禁用用于多个测试的 fixt
pytest - 在测试完成时运行的基本 pytest 拆解
我最低限度地使用 pytest 作为针对工作中各种 API 产品的大型自动化集成测试的通用测试运行器，并且我一直在尝试寻找一个同样通用的拆卸函数示例，该函数在任何测试完成时运行，无论成功或失败。我的
pytest - 如何强制 pytest 编写颜色输出？
即使在写入管道时，如何强制 pytest 以颜色显示结果？似乎没有任何命令行选项可以这样做。最佳答案从 2.5.0 开始，py.test 有选项 --color=yes 从 2.7.0 开始，还应
pytest - 在多个对象上运行一套 pytest 测试
作为一组更大的测试的一小部分，我有一套测试函数，我想在每个对象列表上运行。基本上，我有一组插件和一组“插件测试”。天真地，我可以只列出一个带有插件参数的测试函数列表和一个插件列表，然后进行测试，我在
pytest - 有没有办法找出正在运行的 pytest-xdist 网关？
我想为 pytest-xdist 产生的每个子进程/网关创建一个单独的日志文件。是否有一种优雅的方法可以找出 pytest 当前所在的子进程/网关？我正在使用位于 conftest.py 的 sess
pytest - 如果测试文件具有设置和拆卸部分，如何使用 pytest 并行执行测试
我的测试脚本如下 @pytest.fixture(scope="Module", Autouse="True") def setup_test(): ....................
python - pytest:使用固定装置对基于类的测试进行参数化(pytest-django)
我正在尝试像这样参数化我的类测试: @pytest.mark.parametrize('current_user', ["test_profile_premium", "test_profile_fr
pytest - 如何运行 pytest-bdd 测试？
我不明白如何正确运行一个简单的测试(功能文件和 python 文件) 与图书馆 pytest-bdd . 来自官方documentation ，我无法理解要发出什么命令来运行测试。我尝试使用 pyt

首页

博学

6Ren·AI

商城

python - 如何使用 pandas 和 pytest 进行 TDD？