gpt4 book ai didi

python - 如何在比较同一数据帧的两个版本后获取修改的行

转载 作者:行者123 更新时间:2023-12-01 07:32:26 25 4
gpt4 key购买 nike

我正在创建一个脚本,用于今天读取 CSV 文件并将其与昨天的相同数据文件进行比较。

此 CSV 每天上传到服务器一次,我想比较今天和昨天的文件。

我想通过比较这两个文件来了解修改、插入或删除的行。

我已经完成了插入和删除,但我在修改方面遇到了困难。

下面是获取 INSERT 和 DELETE Dataframes 的代码:

def getInsDel(df_old,df_new,key):
#concatinating old and new data to generate comparisons
df = pd.concat([df_new,df_old])
df= df.reset_index(drop = True)


#doing a group by for getting the frequency of each key
print('Grouping data for frequency of key...')
df_gpby = df.groupby(list(df.columns))
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
df_delta = df.reindex(idx)
df_delta_freq = df_delta.groupby(key).size().reset_index(name='Freq')

#Filtering data for frequency = 1, since these will be the target records for DELETE and INSERT
print('Creating data frame to get records with Frequency = 1 ...')
filter = df_delta_freq['Freq']==1
df_delta_freq_ins_del = df_delta_freq.where(filter)


#Dropping row with NULL
df_delta_freq_ins_del = df_delta_freq_ins_del.dropna()


print('Creating data frames of Insert and Deletes ...')
#Creating INSERT dataFrame
df_ins = pd.merge(df_new,
df_delta_freq_ins_del[key],
on = key,
how = 'inner'
)

#Creating DELETE dataFrame
df_del = pd.merge(df_old,
df_delta_freq_ins_del[key],
on = key,
how = 'inner'
)

print('size of INSERT file: ' + str(df_ins.shape))
print('size of DELETE file: ' + str(df_del.shape))


return df_ins,df_del

例如,旧数据是:

ID  Name  X  Y
1 ABC 1 2
2 DEF 2 3
3 HIJ 3 4

新数据集是:

ID  Name   X   Y
2 DEF 2 3
3 HIJ 55 42
4 KLM 4 5

其中 ID 是 key 。

Modified_DataFrame 应该是:

ID   Name   X   Y
3 HIJ 55 42

注意:请参阅此处,ID - 1,2 将位于 DELETE 数据帧中,ID - 4 将位于 INSERT DataFrame 中。这部分我是通过根据键进行分组,然后根据这些键的频率进行过滤来完成的。如果频率为 1,那么我知道它是 DELETE 还是 INSERT。

但是,如何才能获取 Modified_dataFrame?

编辑

引用评论中的链接后,经过一些修改,我添加了如下所示的MODIFY_DataFrame:

    df_all = pd.concat([df_new,df_old],ignore_index=True)
cols_list = list(df_all)
modifcations = df_all.drop_duplicates(subset=cols_list, keep='last')
mod_keys = modifcations[modifcations[key].duplicated() == True][key]

df_mod = pd.merge(df_new,
mod_keys[key],
on = key,
how = 'inner'
)

print('size of MODIFY file: ' + str(df_mod.shape))

谢谢大家!

最佳答案

引用评论中的链接后,经过一些修改,我添加了如下所示的MODIFY_DataFrame:

    df_all = pd.concat([df_new,df_old],ignore_index=True)
cols_list = list(df_all)
modifcations = df_all.drop_duplicates(subset=cols_list, keep='last')
mod_keys = modifcations[modifcations[key].duplicated() == True][key]

df_mod = pd.merge(df_new,
mod_keys[key],
on = key,
how = 'inner'
)

print('size of MODIFY file: ' + str(df_mod.shape))

谢谢大家!

关于python - 如何在比较同一数据帧的两个版本后获取修改的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57153549/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com