
python - Pandas dataframe unique values

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:38:44

I need some help getting unique values from a pandas dataframe.

I have:

>>> df1
     source    target  metric
0  acc1.yyy  acx1.xxx   10000
1  acx1.xxx  acc1.yyy   10000

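The goal is to drop duplicate rows, where a row counts as a duplicate if its source+target pair matches another row's pair in either order (source+target or target+source). But I can't get this with drop_duplicates.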

>>> df2 = df1.drop_duplicates(subset=['source','target'])
>>> df2
     source    target  metric
0  acc1.yyy  acx1.xxx   10000
1  acx1.xxx  acc1.yyy   10000

[Update]

Maybe "duplicate" isn't the right word here, so let me explain further.

id       source       target
 0  bng1.xxx.00  bdr2.xxx.00
 1  bng1.xxx.00  bdr1.xxx.00
 2  bdr3.yyy.00  bdr3.xxx.00
 3  bdr3.xxx.00  bdr3.yyy.00
 4  bdr2.xxx.00  bng1.xxx.00
 5  bdr1.xxx.00  bng1.xxx.00

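In the above, I want to remove entries that are the mirror image of another entry, that is, where one row's source equals another row's target and its target equals that row's source.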

0 and 4 = same pair
1 and 5 = same pair
2 and 3 = same pair

The end goal is to keep either rows 0, 1, 2 or rows 4, 5, 3.

Best answer

You first need to sort the two values within each row, so that both orderings of a pair look the same:

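import numpy as np

# Sort each row's (source, target) pair with NumPy so that (a, b) and (b, a) become identical
df1[['source','target']] = np.sort(df1[['source','target']].to_numpy(), axis=1)
print(df1)
     source    target  metric
0  acc1.yyy  acx1.xxx   10000
1  acc1.yyy  acx1.xxx   10000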

df2 = df1.drop_duplicates(subset=['source','target'])
print(df2)
     source    target  metric
0  acc1.yyy  acx1.xxx   10000
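
The same idea can also be sketched without overwriting the original columns: build a sorted copy of each pair and use it only to compute a boolean mask. This is a minimal sketch, not part of the original answer; the sample frame is the one from the question, and the names `pairs` and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "source": ["acc1.yyy", "acx1.xxx"],
    "target": ["acx1.xxx", "acc1.yyy"],
    "metric": [10000, 10000],
})

# Sort each (source, target) pair within its row so that
# (a, b) and (b, a) compare equal. df1 itself is not modified.
pairs = pd.DataFrame(
    np.sort(df1[["source", "target"]].to_numpy(), axis=1),
    index=df1.index,
)

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the first row of each unordered pair.
mask = ~pairs.duplicated()
df2 = df1[mask]
print(df2)
```

Because the sorted pairs live in a separate frame, the rows that survive keep their original source and target values.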

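Edit:

It seems the source column needs to be changed first by removing its last 3 characters (in this data the source values carry an extra suffix such as -00):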

import numpy as np

# Temporary key column: source without its last 3 characters
df1['source1'] = df1.source.str[:-3]
# Sort the (source1, target) pair within each row
df1[['source1','target']] = np.sort(df1[['source1','target']].to_numpy(), axis=1)
print(df1)
   id          source       target      source1
0   0  bng1.xxx.00-00  bng1.xxx.00  bdr2.xxx.00
1   1  bng1.xxx.00-00  bng1.xxx.00  bdr1.xxx.00
2   2  bdr3.yyy.00-00  bdr3.yyy.00  bdr3.xxx.00
3   3  bdr3.xxx.00-00  bdr3.yyy.00  bdr3.xxx.00
4   4  bdr2.xxx.00-00  bng1.xxx.00  bdr2.xxx.00
5   5  bdr1.xxx.00-00  bng1.xxx.00  bdr1.xxx.00

df2 = df1.drop_duplicates(subset=['source1','target'])
df2 = df2.drop('source1', axis=1)
print(df2)
   id          source       target
0   0  bng1.xxx.00-00  bng1.xxx.00
1   1  bng1.xxx.00-00  bng1.xxx.00
2   2  bdr3.yyy.00-00  bdr3.yyy.00
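
Note that in the printed result the target column holds the sorted values rather than the original targets. If the original values should be preserved, one possible variant computes the sorted key separately and only filters with it. This is a sketch under assumptions: the source values are taken from the output above, the target values from the table in the question, and the suffix is assumed to always be exactly 3 characters long. The names `stripped`, `pairs`, and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

# Source values as printed in the answer; target values as listed in the question
df1 = pd.DataFrame({
    "id": range(6),
    "source": ["bng1.xxx.00-00", "bng1.xxx.00-00", "bdr3.yyy.00-00",
               "bdr3.xxx.00-00", "bdr2.xxx.00-00", "bdr1.xxx.00-00"],
    "target": ["bdr2.xxx.00", "bdr1.xxx.00", "bdr3.xxx.00",
               "bdr3.yyy.00", "bng1.xxx.00", "bng1.xxx.00"],
})

# Assumption: every source ends in a 3-character suffix that must be ignored
stripped = df1["source"].str[:-3]

# Sort each (stripped source, target) pair within its row, in a separate array
pairs = np.sort(pd.concat([stripped, df1["target"]], axis=1).to_numpy(), axis=1)

# Keep the first row of each unordered pair; df1's columns stay untouched
mask = ~pd.DataFrame(pairs, index=df1.index).duplicated()
df2 = df1[mask]
print(df2)
```

This keeps rows 0, 1 and 2, one of the two outcomes the question asks for, with source and target unchanged.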

Regarding "python - Pandas dataframe unique values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42834853/
