
python - How can I output all differences when `pandas.testing.assert_frame_equal` fails?

Reposted. Author: 行者123. Last updated: 2023-12-05 05:45:19

I am unit testing DataFrame output. I have two DataFrames whose values differ in more than one column:

df1 = pd.DataFrame({"col1": [1, 1], "col2":[1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2":[1, 2]})

When I run pandas.testing.assert_frame_equal, I get the following error, which reports only one column:

DataFrame.iloc[:, 0] (column name="col1") values are different (50.0 %)
[index]: [0, 1]
[left]: [1, 1]
[right]: [1, 2]

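However, this gives me no information about the second column. Is there a way to show all mismatches, not just the first one in the leftmost differing column?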
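The behaviour described above can be reproduced with a short script. This is only a minimal sketch: the variable name `message` is an illustrative choice, and the exact wording of the error may vary between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 1], "col2": [1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2": [1, 2]})

# Capture the assertion message instead of letting the test abort.
try:
    pd.testing.assert_frame_equal(df1, df2)
except AssertionError as exc:
    message = str(exc)
else:
    message = ""

# The report stops at the first differing column, so col2 is never mentioned.
print(message)
```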

Best answer

Another (hacky, but slightly more performant) way to do this:

import pandas as pd


def assert_frame_equal_extended_diff(df1, df2):
    try:
        pd.testing.assert_frame_equal(df1, df2)

    except AssertionError as e:
        # if this was a shape or index/col error, then re-raise
        try:
            pd.testing.assert_index_equal(df1.index, df2.index)
            pd.testing.assert_index_equal(df1.columns, df2.columns)
        except AssertionError:
            raise e

        # if not, we have a value error
        diff = df1 != df2
        diffcols = diff.any(axis=0)  # columns containing at least one mismatch
        diffrows = diff.any(axis=1)  # rows containing at least one mismatch
        # place the mismatching cells of both frames side by side
        cmp = pd.concat(
            {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
            names=['dataframe'],
            axis=1,
        )

        # append the side-by-side view to the original message
        raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None

This uses the repr of pandas.DataFrame to display the differences:

In [5]: df1 = pd.DataFrame({
   ...:     'samecol': np.arange(1500),
   ...:     'diffcol': np.arange(1500),
   ...:     'anothercol': np.ones(shape=1500),
   ...: })

In [6]: df2 = df1.copy()
   ...: df2.iloc[1000:1014, 1] = range(14)

In [7]: assert_frame_equal_extended_diff(df1, df2)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 assert_frame_equal_extended_diff(df1, df2)

Input In [6], in assert_frame_equal_extended_diff(df1, df2)
     11 diffrows = diff.any(axis=1)
     12 cmp = pd.concat(
     13     {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
     14     names=['dataframe'],
     15     axis=1,
     16 )
---> 18 raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None

AssertionError: DataFrame.iloc[:, 1] (column name="diffcol") are different

DataFrame.iloc[:, 1] (column name="diffcol") values are different (0.93333 %)
[index]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
[left]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
[right]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]

Differences:
dataframe    left   right
          diffcol diffcol
1000         1000       0
1001         1001       1
1002         1002       2
1003         1003       3
1004         1004       4
1005         1005       5
1006         1006       6
1007         1007       7
1008         1008       8
1009         1009       9
1010         1010      10
1011         1011      11
1012         1012      12
1013         1013      13
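For the two-column frames from the question, the same helper should report both columns at once. The sketch below repeats the helper so that it runs on its own. The `report` variable is an illustrative name, and the exact layout of the printed frame depends on the pandas version.

```python
import pandas as pd


def assert_frame_equal_extended_diff(df1, df2):
    # Same helper as in the answer, repeated so this snippet is self-contained.
    try:
        pd.testing.assert_frame_equal(df1, df2)
    except AssertionError as e:
        # Shape or label mismatches are re-raised unchanged.
        try:
            pd.testing.assert_index_equal(df1.index, df2.index)
            pd.testing.assert_index_equal(df1.columns, df2.columns)
        except AssertionError:
            raise e

        # Otherwise show every mismatching row and column side by side.
        diff = df1 != df2
        diffcols = diff.any(axis=0)
        diffrows = diff.any(axis=1)
        cmp = pd.concat(
            {'left': df1.loc[diffrows, diffcols], 'right': df2.loc[diffrows, diffcols]},
            names=['dataframe'],
            axis=1,
        )
        raise AssertionError(e.args[0] + f'\n\nDifferences:\n{cmp}') from None


df1 = pd.DataFrame({"col1": [1, 1], "col2": [1, 1]})
df2 = pd.DataFrame({"col1": [1, 2], "col2": [1, 2]})

try:
    assert_frame_equal_extended_diff(df1, df2)
except AssertionError as exc:
    report = str(exc)
else:
    report = ""

# The "Differences" section now lists col1 and col2 for the mismatching row.
print(report)
```

Inside a test suite the helper would normally be called directly, so that the extended message appears in the failure output. It is caught here only to inspect the text.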

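Note - this answer is meant as a debugging aid, not as a comprehensive or edge-case-free approach. Edits are welcome, but use it at your own risk.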
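One such edge case is missing data. An elementwise `!=` comparison marks NaN against NaN as different, whereas assert_frame_equal treats NaN values in the same position as equal. A possible adjustment is sketched below. `differing_cells` is a hypothetical helper name, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def differing_cells(left, right):
    """Mask of cells that differ, treating NaN in the same position as equal.

    Hypothetical helper for illustration; assumes both frames share the same
    index and columns.
    """
    both_missing = left.isna() & right.isna()
    return (left != right) & ~both_missing


left = pd.DataFrame({"a": [1.0, np.nan], "b": [1.0, 2.0]})
right = pd.DataFrame({"a": [1.0, np.nan], "b": [1.0, 3.0]})

naive_mask = left != right                      # flags the NaN/NaN cell in "a"
nan_aware_mask = differing_cells(left, right)   # flags only the real change in "b"

print(naive_mask)
print(nan_aware_mask)
```

Using such a mask in place of `df1 != df2` would keep NaN-only rows out of the "Differences" section. Other cases, such as dtype-only mismatches, would still need separate handling.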

Regarding "python - How can I output all differences when `pandas.testing.assert_frame_equal` fails?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/71412691/
