
python - How do I compare two CSV files and find the differences?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:32:58

I have two CSV files:

a1.csv

city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf
Aguila,Arizona,http://www.co.apache.az.us/planning-and-zoning-division/zoning-ordinances/

a2.csv

city,state,link
Aguila,Arizona,http://www.co.apache.az.us

I want to get the differences between them.

Here is my attempt:

import pandas as pd

a = pd.read_csv('a1.csv')
b = pd.read_csv('a2.csv')

mask = a.isin(b.to_dict(orient='list'))
# Reverse the mask and remove null rows.
# Upside is that index of original rows that
# are now gone are preserved (see result).
c = a[~mask].dropna()
print(c)

Expected output:

city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf

But instead of that I get an empty DataFrame:

Empty DataFrame
Columns: [city, state, link]
Index: []

I want to check based on the first two rows and, if they are the same, remove them.

Best answer

You can read both files with pandas, concatenate them, and drop every row that is duplicated:

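import pandas as pd

a = pd.read_csv('a1.csv')
b = pd.read_csv('a2.csv')
# Stack the two frames, then drop every row that appears more than once.
ab = pd.concat([a, b], axis=0)
# drop_duplicates returns a new DataFrame, so keep the result.
diff = ab.drop_duplicates(keep=False)
print(diff)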
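The concatenate-and-drop approach can be tried without any files on disk. The sketch below is only an illustration: the two CSV strings and the example.com links are made-up stand-ins for a1.csv and a2.csv, chosen so that one row matches exactly. Rows count as duplicates only when every column is identical, so with the sample data from the question (where the link in a2.csv is shorter than any link in a1.csv) nothing would be dropped. Note also that keep=False removes rows that are repeated within a single file.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for a1.csv and a2.csv (links shortened).
A1 = """city,state,link
Aguila,Arizona,https://example.com/a
AkChin,Arizona,https://example.com/b
Aguila,Arizona,https://example.com/c
"""
A2 = """city,state,link
Aguila,Arizona,https://example.com/c
"""

a = pd.read_csv(io.StringIO(A1))
b = pd.read_csv(io.StringIO(A2))

# Stack both frames, then drop every row that occurs more than once.
# What remains is the set of rows present in exactly one of the inputs.
diff = pd.concat([a, b], axis=0).drop_duplicates(keep=False)
print(diff.to_csv(index=False))
```

Because the comparison is symmetric, a row that exists only in the second file would also show up in the result.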

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
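If only the rows that are in the first file but not in the second are wanted, a left merge with indicator=True is one possible alternative to the answer's method. This is a sketch under assumptions: the helper name rows_only_in_left and the inline sample data are hypothetical, and matching is done on all shared columns, so a row counts as present only when city, state and link all agree.

```python
import io

import pandas as pd


def rows_only_in_left(left, right):
    """Return the rows of `left` that have no exact match in `right`."""
    # De-duplicate `right` first so a repeated row cannot multiply matches.
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    # The indicator column `_merge` marks unmatched rows as 'left_only'.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


# Hypothetical stand-ins for a1.csv and a2.csv (links shortened).
a = pd.read_csv(io.StringIO(
    "city,state,link\n"
    "Aguila,Arizona,https://example.com/a\n"
    "AkChin,Arizona,https://example.com/b\n"
    "Aguila,Arizona,https://example.com/c\n"
))
b = pd.read_csv(io.StringIO(
    "city,state,link\n"
    "Aguila,Arizona,https://example.com/c\n"
))

only_in_a = rows_only_in_left(a, b)
print(only_in_a.to_csv(index=False))
```

Unlike concatenating and dropping duplicates, this comparison is one-sided: rows that appear only in the second file are ignored.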

Regarding "python - How do I compare two CSV files and find the differences?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48693547/
