
python - Pandas: unique on a condition, with column values appended

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:35:45

Consider a DataFrame like this:

coordinates                     metric year
[55.2274742137, 25.1560686018] met_1 2014
[55.1554330879, 25.0986809174] met_2 2015
[55.1554330879, 25.0986809174] met_2 2016
[55.14353879, 25.44] met_221212 2020
[55.11239959, 25.3232] met_2132 2022

Desired result:

coordinates                     metric year
[55.2274742137, 25.1560686018] met_1 2014
[55.1554330879, 25.0986809174] met_2 [2015,2016]
[55.14353879, 25.44] met_221212 2020
[55.11239959, 25.3232] met_2132 2022

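I want to find the records that are duplicated on the coordinates and metric columns. When duplicates exist, their year values should be collected into a list and stored as the new year column. After that, I want to drop the duplicate rows.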

Best answer

You need groupby followed by apply.

However, if the grouping column holds lists, groupby fails with:

TypeError: unhashable type: 'list'

The solution is to convert the lists to hashable tuples before grouping.
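Why the tuple conversion helps can be checked in plain Python, without pandas. The following is a minimal sketch; the helper name is_hashable is made up for illustration and is not part of the original answer.

```python
# One coordinate pair taken from the sample data.
coords = [55.1554330879, 25.0986809174]


def is_hashable(value):
    """Return True if hash() accepts the value (hypothetical helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# A list is mutable, so Python refuses to hash it.
list_ok = is_hashable(coords)

# A tuple of floats is immutable and therefore hashable.
tuple_ok = is_hashable(tuple(coords))

# Equal tuples produce equal hashes, so duplicate rows share one group key.
same_key = hash(tuple(coords)) == hash(tuple(list(coords)))

print(list_ok, tuple_ok, same_key)
```

Grouping relies on hashing the key values, which is why the list column has to be converted first and can be converted back afterwards.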

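A second issue is that a list is wanted only when a group contains more than one value, so the function passed to apply needs a slightly more involved conditional expression: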

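# Lists are unhashable, so turn each coordinate pair into a tuple for grouping
df.coordinates = df.coordinates.apply(tuple)
# Parentheses let the method chain span more than one line
df = (df.groupby(['coordinates','metric'], sort=False)['year']
        .apply(lambda x: list(x) if len(x) > 1 else x.item()))
df = df.reset_index()
# Restore the original list representation
df.coordinates = df.coordinates.apply(list)
print(df)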
coordinates metric year
0 [55.2274742137, 25.1560686018] met_1 2014
1 [55.1554330879, 25.0986809174] met_2 [2015, 2016]
2 [55.14353879, 25.44] met_221212 2020
3 [55.11239959, 25.3232] met_2132 2022
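For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample data from the question and applies the same steps. The variable names sample and result are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
sample = pd.DataFrame({
    "coordinates": [
        [55.2274742137, 25.1560686018],
        [55.1554330879, 25.0986809174],
        [55.1554330879, 25.0986809174],
        [55.14353879, 25.44],
        [55.11239959, 25.3232],
    ],
    "metric": ["met_1", "met_2", "met_2", "met_221212", "met_2132"],
    "year": [2014, 2015, 2016, 2020, 2022],
})

result = sample.copy()
# Tuples are hashable, so they can serve as group keys.
result["coordinates"] = result["coordinates"].apply(tuple)
result = (
    result.groupby(["coordinates", "metric"], sort=False)["year"]
    # Keep a scalar for single-row groups, a list otherwise.
    .apply(lambda x: list(x) if len(x) > 1 else x.item())
    .reset_index()
)
# Convert the keys back to lists to match the input format.
result["coordinates"] = result["coordinates"].apply(list)

print(result)
```

Passing sort=False keeps the groups in order of first appearance, which is why the rows come out in the same order as in the question.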

If it is acceptable for every value in the output column to be a list:

df.coordinates = df.coordinates.apply(tuple)
df = df.groupby(['coordinates','metric'], sort=False)['year'].apply(list)
df = df.reset_index()
df.coordinates = df.coordinates.apply(list)
print (df)
coordinates metric year
0 [55.2274742137, 25.1560686018] met_1 [2014]
1 [55.1554330879, 25.0986809174] met_2 [2015, 2016]
2 [55.14353879, 25.44] met_221212 [2020]
3 [55.11239959, 25.3232] met_2132 [2022]

If the output is needed as strings:

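# Lists are unhashable, so turn each coordinate pair into a tuple for grouping
df.coordinates = df.coordinates.apply(tuple)
# Parentheses let the method chain span more than one line
df = (df.groupby(['coordinates','metric'], sort=False)['year']
        .apply(lambda x: ','.join(x.astype(str))))
df = df.reset_index()
# Restore the original list representation
df.coordinates = df.coordinates.apply(list)
print(df)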
coordinates metric year
0 [55.2274742137, 25.1560686018] met_1 2014
1 [55.1554330879, 25.0986809174] met_2 2015,2016
2 [55.14353879, 25.44] met_221212 2020
3 [55.11239959, 25.3232] met_2132 2022
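The three variants differ only in the function passed to apply, so they can be compared side by side. The sketch below assumes a hypothetical helper collapse_years and a how parameter; neither appears in the original answer.

```python
import pandas as pd

# Hypothetical lookup table: one aggregation function per output style.
AGGREGATORS = {
    "mixed": lambda s: list(s) if len(s) > 1 else s.item(),
    "list": list,
    "string": lambda s: ",".join(s.astype(str)),
}


def collapse_years(df, how="mixed"):
    """Merge rows duplicated on coordinates and metric (illustrative helper)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out["coordinates"] = out["coordinates"].apply(tuple)
    out = (
        out.groupby(["coordinates", "metric"], sort=False)["year"]
        .apply(AGGREGATORS[how])
        .reset_index()
    )
    out["coordinates"] = out["coordinates"].apply(list)
    return out


sample = pd.DataFrame({
    "coordinates": [
        [55.2274742137, 25.1560686018],
        [55.1554330879, 25.0986809174],
        [55.1554330879, 25.0986809174],
        [55.14353879, 25.44],
        [55.11239959, 25.3232],
    ],
    "metric": ["met_1", "met_2", "met_2", "met_221212", "met_2132"],
    "year": [2014, 2015, 2016, 2020, 2022],
})

for how in AGGREGATORS:
    print(how, collapse_years(sample, how)["year"].tolist())
```

The list variant is usually the easiest to process later because every cell has the same type, while the mixed variant matches the output requested in the question.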

Regarding python - Pandas: unique on a condition, with column values appended, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42533089/
