
python - Drop DataFrame rows that don't have the longest list

Reposted. Author: 行者123. Updated: 2023-11-30 22:18:14

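My search skills must be failing me, because this has to be a common problem. I have a DataFrame with nested lists and am trying to drop all rows that don't have the longest list: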

import pandas as pd

df = pd.DataFrame(data = [["a", "b", "c", ["d", "e"]],
                          ["a", "b", "c", ["e"]],
                          ["l", "m", "n", ["o"]]],
                  columns = ["c1", "c2", "c3", "c4"])

# max doesn't evaluate length ~ this is wrong
df.groupby(by=["c1", "c2", "c3"])["c4"].apply(max)
c1 c2 c3
a b c [e]
l m n [o]
Name: c4, dtype: object

# but length does ~ but using an int to equate to another row isn't guaranteed
df.groupby(by=["c1", "c2", "c3"])["c4"].apply(len)
c1 c2 c3
a b c 2
l m n 1
Name: c4, dtype: int64

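They have to be grouped first, because those three columns together form a unique primary key, and I need the longest list within each key. The groups also hold lists of different lengths: for most groups the length is 1, for others it can be as high as 5. The end goal is a new DataFrame like this: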

c1  c2  c3  c4
a   b   c   ["d", "e"]
l   m   n   ["o"]

Best answer

How about this:

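import pandas as pd

df = pd.DataFrame(data = [["a", "b", "c", ["d", "e"]],
                          ["a", "b", "c", ["e"]],
                          ["l", "m", "n", ["o"]]],
                  columns = ["c1", "c2", "c3", "c4"])

# Length of the list stored in each row of c4
df['len'] = df['c4'].apply(len)

# Keep the rows whose list length equals the maximum length within their (c1, c2, c3) group
max_groups = df[df.groupby(['c1', 'c2', 'c3'])['len'].transform('max') == df['len']]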

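We add an extra column holding the length of the list in c4, then filter the DataFrame down to the rows whose c4 length equals the maximum c4 length within their group. This returns max_groups as: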

  c1 c2 c3      c4  len
0  a  b  c  [d, e]    2
2  l  m  n     [o]    1
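
The same filter can be sketched without leaving a helper column on the DataFrame. This is an illustrative variant, not part of the original answer: the names `keys`, `lengths`, `longest_with_ties` and `longest_one_per_group` are made up for this example. It also shows one possible way to keep exactly one row per group when several rows tie for the longest list, a case the question does not specify.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["a", "b", "c", ["d", "e"]],
          ["a", "b", "c", ["e"]],
          ["l", "m", "n", ["o"]]],
    columns=["c1", "c2", "c3", "c4"],
)

keys = ["c1", "c2", "c3"]  # hypothetical name for the grouping columns

# Per-row list length, held in a separate Series so df gains no extra column.
lengths = df["c4"].map(len)

# Group the lengths by the key columns of the same rows.
grouped = lengths.groupby([df[k] for k in keys])

# Variant 1: keep every row whose list is as long as the longest in its group
# (rows that tie for the maximum are all kept).
longest_with_ties = df[lengths == grouped.transform("max")]

# Variant 2: keep exactly one row per group, the first one that reaches the maximum.
longest_one_per_group = df.loc[grouped.idxmax()]

print(longest_with_ties)
print(longest_one_per_group)
```

With the sample data both variants select the rows at index 0 and 2. As a side note, `df.groupby(keys)["c4"].apply(len)` from the question calls `len` on each group's Series, so it counts the rows in each group rather than measuring the lists, which is why the length is computed per row here.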

Regarding python - Drop DataFrame rows that don't have the longest list, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49432582/
