
python - Comparing two Excel spreadsheets with different row values and coordinates using Pandas

Reposted. Author: 行者123. Updated: 2023-12-04 19:50:47

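I am building an Excel comparison program with pandas. I wrote a simple comparison tool that works, but it compares row by row, so it reports changes that actually belong to other parts of the column. This happens because the row coordinates in the two worksheets do not match. To clarify, here is my code: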

import pandas as pd
import numpy as np
import openpyxl

wb = openpyxl.load_workbook('CK_CBF_Draft_01.2018_original.xlsx')
ws = wb['CBF']

list1 = []

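for row in ws['H1':'H365']:
    for cell in row:
        # Store the cell value (not the row tuple) so the membership test in report_diff can match
        list1.append(cell.value)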

# Define the diff function to show the changes in each field
def report_diff(x):
    return x[1] if x[1] in list1 else '{} ---> {}'.format(x[0], x[1])

# We want to be able to easily tell which rows have changes
def has_change(row):
    if "--->" in row.to_string():
        return "Y"
    else:
        return "N"

# Read in both excel files
df1 = pd.read_excel('Invoice1.xlsx', 'Sheet1', na_values=['NA'])
df2 = pd.read_excel('Invoice2.xlsx', 'Sheet1', na_values=['NA'])

# Make sure we order by host name so the comparisons work.
# sort_values returns a new DataFrame, so the result has to be assigned back.
df1 = df1.sort_values(by="Host Name").reset_index(drop=True)
df2 = df2.sort_values(by="Host Name").reset_index(drop=True)

# Create a panel of the two dataframes
# (pd.Panel was removed in pandas 0.25, so this line only runs on older versions)
diff_panel = pd.Panel(dict(df1=df1,df2=df2))

#Apply the diff function
diff_output = diff_panel.apply(report_diff, axis=0)

# Flag all the changes
diff_output['has_change'] = diff_output.apply(has_change, axis=1)

#Save the changes to excel but only include the columns we care about
diff_output[(diff_output.has_change == 'Y')].to_excel('my-diff-1.xlsx',index=False,columns=["Host Name","CPU#","Memory","Invoice Total","Quantity"])

print('Worked')

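As I said, the problem is that this returns a row-by-row diff, and the differences are wrong because the matching rows sit at different positions in the column. Does anyone know a way to accurately compare two files whose rows differ?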

Thanks for your help, and apologies if the question is a bit vague.

Best answer

import pandas as pd
import numpy as np

# Read both Excel files
file1 = pd.read_excel("file1.xlsx", na_values=['NA'])
file2 = pd.read_excel("file2.xlsx", na_values=['NA'])


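# Note the swap: df2 is the first file, df1 is the second file
df2 = file1
df1 = file2

# 'samecolname' is a placeholder for the key column that both sheets share.
# Rows of the second file whose key does not appear in the first file
res = df1[~df1['samecolname'].isin(df2['samecolname'].unique())]

# Rows of the first file whose key does not appear in the second file
res2 = df2[~df2['samecolname'].isin(df1['samecolname'].unique())]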

res.to_excel('diff1-insecond-but-not-in-first.xlsx',index=False)
res2.to_excel('diff2-in-first-not-in-second.xlsx',index=False)
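
The answer above only separates rows by whether their key exists in the other file. To also get the "old ---> new" style diff the question asks for, one possible approach is to align the two sheets on a key column instead of on row position. The following is a minimal sketch, not the answerer's method: the helper name diff_by_key and the sample data are hypothetical, "Host Name" is borrowed from the question as the key, and the key is assumed to be unique within each sheet. In practice the two frames would come from pd.read_excel rather than being built in memory.

```python
import pandas as pd


def diff_by_key(old, new, key):
    """Compare two DataFrames by a shared key column, not by row position.

    Hypothetical helper; assumes `key` values are unique within each frame.
    Returns (added, removed, changed).
    """
    # Rows whose key exists on one side only
    removed = old[~old[key].isin(new[key])]
    added = new[~new[key].isin(old[key])]

    # Rows whose key exists on both sides, indexed by the key for lookup
    old_common = old[old[key].isin(new[key])].set_index(key)
    new_common = new[new[key].isin(old[key])].set_index(key)
    shared_cols = [c for c in old_common.columns if c in new_common.columns]

    changes = []
    for k in old_common.index:
        for col in shared_cols:
            before = old_common.at[k, col]
            after = new_common.at[k, col]
            # Treat two missing values as equal (NaN != NaN otherwise)
            if pd.isna(before) and pd.isna(after):
                continue
            if before != after:
                changes.append({key: k, "column": col, "old": before, "new": after})

    changed = pd.DataFrame(changes, columns=[key, "column", "old", "new"])
    return added, removed, changed


# Made-up sample data: same hosts in a different row order, one edit,
# one host removed and one host added.
old = pd.DataFrame({"Host Name": ["a", "b", "c"], "Memory": [8, 16, 32]})
new = pd.DataFrame({"Host Name": ["c", "a", "d"], "Memory": [32, 12, 64]})

added, removed, changed = diff_by_key(old, new, "Host Name")
print(changed)
```

Because rows are looked up by key, the different row order in the two sheets no longer produces false differences. If the key column can contain duplicates, this sketch would need a composite key or a deduplication step first.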

Regarding python - Comparing two Excel spreadsheets with different row values and coordinates using Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52177579/
