
Python Pandas: merge if the numbers are equal

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:34:55

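I am trying to merge two CSVs based on a condition. Wherever a TC_NUM listed in CSV2 matches the TC_NUM in CSV1, the corresponding "KEYS" value from CSV2 should be appended to CSV1 as a third column. The CSVs are very large, so this has to be done in code.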

df1 - CSV1:

ID                                       TC_NUM
dialog_testcase_0101.0001_greeting.xml 101.0001
dialog_testcase_0101.0002_greeting.xml 101.0002
dialog_testcase_0101.0003_greeting.xml 101.0003
dialog_testcase_0101.0004_greeting.xml 101.0004
dialog_testcase_0101.0005_greeting.xml 101.0005
dialog_testcase_0101.0006_greeting.xml 101.0006
dialog_testcase_0901.0008_greeting.xml 901.0007
dialog_testcase_0101.0008_greeting.xml 101.0008
dialog_testcase_0501.001_greeting.xml 501.001
dialog_testcase_0801.0011_greeting.xml 801.0011

df2 - CSV2:

KEYS             TC_NUM
FIT-3982 TC 101.0011, 101.0004
FIT-3980 TC 801.0011, 901.007
FIT-3979 TC 101.0006, 501.001, 1907.0019, 1907.0020, 1907.0021

What I want:

Final CSV:

ID                                       TC_NUM        Keys
dialog_testcase_0101.0001_greeting.xml 101.0011 FIT-3982
dialog_testcase_0101.0002_greeting.xml 101.0002
dialog_testcase_0101.0003_greeting.xml 101.0006 FIT_3979
dialog_testcase_0101.0004_greeting.xml 101.0004 FIT-3982
dialog_testcase_0101.0005_greeting.xml 101.0005
dialog_testcase_0101.0006_greeting.xml 101.0011 FIT_3982
dialog_testcase_0901.0008_greeting.xml 901.0007 FIT_3979
dialog_testcase_0101.0008_greeting.xml 101.0008
dialog_testcase_0501.001_greeting.xml 501.001 FIT-3979
dialog_testcase_0801.0011_greeting.xml 801.0011 FIT-3980

My code:

mergedOpen = pd.merge(df1, df2, on=['TC_NUM'])
mergedOpen.set_index('TC_NUM', inplace=True)

mergedOpen.to_csv('MergedCSVOPEN.csv')

Best answer

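You can create a new DataFrame with set_index, removing the first 3 characters from column TC_NUM, split by ', ', unstack, and reset_index, and then merge. Both TC_NUM columns must have the same dtype, either string or numeric. I chose numeric, so I convert column df2.TC_NUM with to_numeric: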

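import pandas as pd

# Use KEYS as the index so it survives the reshaping below.
df2.set_index('KEYS', inplace=True)

# Drop the leading "TC ", split the list into columns, then stack them into one long column.
df2 = (df2.TC_NUM.str[3:]
          .str.split(', ', expand=True)
          .unstack()
          .reset_index(drop=True, level=0)
          .reset_index(name='TC_NUM'))

# Both TC_NUM columns must share a dtype; here both become numeric.
df2['TC_NUM'] = pd.to_numeric(df2['TC_NUM'])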
print(df2)
        KEYS     TC_NUM
0   FIT-3982   101.0011
1   FIT-3980   801.0011
2   FIT-3979   101.0006
3   FIT-3982   101.0004
4   FIT-3980   901.0070
5   FIT-3979   501.0010
6   FIT-3982        NaN
7   FIT-3980        NaN
8   FIT-3979  1907.0019
9   FIT-3982        NaN
10  FIT-3980        NaN
11  FIT-3979  1907.0020
12  FIT-3982        NaN
13  FIT-3980        NaN
14  FIT-3979  1907.0021
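The NaN rows in the reshaped frame are padding: split with expand=True fills every row out to the longest list. As a hedged alternative (not the answer's own method), the sketch below builds the same long table with DataFrame.explode, which needs pandas 0.25 or later and produces no padding rows. The frames `cases` and `keys` are hypothetical in-memory stand-ins for a few rows of the two CSVs.

```python
import pandas as pd

# Hypothetical stand-ins for a few rows of CSV1 and CSV2.
cases = pd.DataFrame({
    "ID": [
        "dialog_testcase_0101.0004_greeting.xml",
        "dialog_testcase_0101.0005_greeting.xml",
        "dialog_testcase_0501.001_greeting.xml",
    ],
    "TC_NUM": [101.0004, 101.0005, 501.001],
})
keys = pd.DataFrame({
    "KEYS": ["FIT-3982", "FIT-3979"],
    "TC_NUM": ["TC 101.0011, 101.0004", "TC 101.0006, 501.001, 1907.0019"],
})

# Strip the leading "TC ", split each cell into a list, then emit one row per list item.
long_keys = (
    keys.assign(TC_NUM=keys["TC_NUM"].str[3:].str.split(", "))
        .explode("TC_NUM")
        .reset_index(drop=True)
)

# Match the numeric dtype of cases["TC_NUM"] before merging.
long_keys["TC_NUM"] = pd.to_numeric(long_keys["TC_NUM"])

# A left merge keeps every test case; unmatched ones get NaN in KEYS.
merged = cases.merge(long_keys, on="TC_NUM", how="left")
print(merged)
```

Because explode emits exactly one row per list element, `long_keys` has five rows here and none of them is NaN.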
mergedOpen = pd.merge(df1, df2, on='TC_NUM', how='left')
print(mergedOpen)
                                       ID    TC_NUM      KEYS
0  dialog_testcase_0101.0001_greeting.xml  101.0001       NaN
1  dialog_testcase_0101.0002_greeting.xml  101.0002       NaN
2  dialog_testcase_0101.0003_greeting.xml  101.0003       NaN
3  dialog_testcase_0101.0004_greeting.xml  101.0004  FIT-3982
4  dialog_testcase_0101.0005_greeting.xml  101.0005       NaN
5  dialog_testcase_0101.0006_greeting.xml  101.0006  FIT-3979
6  dialog_testcase_0901.0008_greeting.xml  901.0007       NaN
7  dialog_testcase_0101.0008_greeting.xml  101.0008       NaN
8   dialog_testcase_0501.001_greeting.xml  501.0010  FIT-3979
9  dialog_testcase_0801.0011_greeting.xml  801.0011  FIT-3980

mergedOpen.set_index('TC_NUM', inplace=True)
mergedOpen.to_csv('MergedCSVOPEN.csv')
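
The answer notes that the two TC_NUM columns could also both be strings instead of numbers. The following is a minimal sketch of that variant, assuming CSV1's TC_NUM was loaded as text (for example with read_csv and dtype={'TC_NUM': str}). The frames `cases` and `keys` are hypothetical stand-ins, and explode is used here in place of the answer's unstack chain.

```python
import pandas as pd

# Hypothetical stand-ins; TC_NUM in `cases` is text, as if read with dtype={'TC_NUM': str}.
cases = pd.DataFrame({
    "ID": [
        "dialog_testcase_0101.0004_greeting.xml",
        "dialog_testcase_0101.0005_greeting.xml",
        "dialog_testcase_0501.001_greeting.xml",
    ],
    "TC_NUM": ["101.0004", "101.0005", "501.001"],
})
keys = pd.DataFrame({
    "KEYS": ["FIT-3982", "FIT-3979"],
    "TC_NUM": ["TC 101.0011, 101.0004", "TC 101.0006, 501.001, 1907.0019"],
})

# One row per listed test-case number, kept as text.
long_keys = (
    keys.assign(TC_NUM=keys["TC_NUM"].str[3:].str.split(","))
        .explode("TC_NUM")
        .reset_index(drop=True)
)
# Normalise to plain strings and trim the space left after each comma.
long_keys["TC_NUM"] = long_keys["TC_NUM"].astype(str).str.strip()

# String keys must match character for character.
merged = cases.merge(long_keys, on="TC_NUM", how="left")
print(merged)
```

String keys compare character by character, so "501.001" and "501.0010" would not match even though they are equal as numbers. Which dtype is safer depends on how consistently the numbers are written in the two files.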

This question about merging in Python Pandas when the numbers are equal comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/37310141/
