
pandas - Get the list of rows whose value starts with the current row's value in a Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-05 01:49:55

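I have a DataFrame that I want to extend with a new column holding, for each row, the list of all ids whose string_value fully contains (that is, starts with) that row's string_value.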

id  string_value
1   The quick brown fox
2   The quick brown fox jumps
3   The quick brown fox jumps over
4   The quick brown fox jumps over the lazy dog
5   The slow
6   The slow brown fox

Desired output

id  string_value                                 new_columns
1   The quick brown fox                          [2, 3, 4]
2   The quick brown fox jumps                    [3, 4]
3   The quick brown fox jumps over               [4]
4   The quick brown fox jumps over the lazy dog  []
5   The slow                                     [6]
6   The slow brown fox                           []

Thanks

Best answer

You can't easily vectorize this, but you can use a custom function:

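import pandas as pd

def accumulate(s):
    # s: string values indexed by id, sorted so that a prefix precedes its extensions
    ref = None          # previous value in sorted order
    prev = s.index[0]   # id of the previous value
    out = {}            # id -> ids of the earlier rows in the current prefix chain
    for i, val in s.items():
        if ref and val.startswith(ref):
            tmp.append(prev)
        else:
            tmp = []    # chain broken: start a new one
        ref = val
        prev = i
        out[i] = tmp.copy()

    # invert dictionary: prefix id -> ids of the rows that start with it
    out2 = {}
    for v, l in out.items():
        for k in l:
            out2.setdefault(k, []).append(v)

    return pd.Series(out2)

df['new_columns'] = df['id'].map(accumulate(df.set_index('id')['string_value'].sort_values()))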

Output:

   id  string_value                                 new_columns
0   1  The quick brown fox                          [2, 3, 4]
1   2  The quick brown fox jumps                    [3, 4]
2   3  The quick brown fox jumps over               [4]
3   4  The quick brown fox jumps over the lazy dog  NaN
4   5  The slow                                     [6]
5   6  The slow brown fox                           NaN
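
As a cross-check on small frames, the same relationship can be computed by comparing every pair of rows directly. This is only a sketch and not part of the original answer: the function name `prefix_matches_bruteforce` is hypothetical, the sample frame is rebuilt from the question, and the cost is quadratic in the number of rows. It does not rely on sorting, so it also covers inputs where several longer strings share a prefix without extending one another, a case the sample data does not exercise. Rows without matches get an empty list rather than NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "string_value": [
        "The quick brown fox",
        "The quick brown fox jumps",
        "The quick brown fox jumps over",
        "The quick brown fox jumps over the lazy dog",
        "The slow",
        "The slow brown fox",
    ],
})


def prefix_matches_bruteforce(frame):
    """For each row, list the ids of the other rows whose string starts with this row's string."""
    # Convert to plain Python lists so the result holds ordinary ints.
    pairs = list(zip(frame["id"].tolist(), frame["string_value"].tolist()))
    result = []
    for cur_id, cur_val in pairs:
        result.append([
            other_id
            for other_id, other_val in pairs
            if other_id != cur_id and other_val.startswith(cur_val)
        ])
    # Align with the frame's own index so the Series can be assigned as a column.
    return pd.Series(result, index=frame.index)


check = prefix_matches_bruteforce(df)
print(check.tolist())
```

Comparing `check` with the `new_columns` produced by `accumulate` is a quick way to confirm both agree on a given dataset.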

Empty lists

To get empty lists instead of NaN in the output, change the "invert dictionary" code to:

    # invert dictionary
    out2 = {i: [] for i in s.index}
    for v, l in out.items():
        for k in l:
            out2[k].append(v)
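
Putting the pieces together, a self-contained run on the sample data might look like the sketch below. The function body follows the answer with the empty-list change applied; the construction of `df` is an assumption based on the table in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "string_value": [
        "The quick brown fox",
        "The quick brown fox jumps",
        "The quick brown fox jumps over",
        "The quick brown fox jumps over the lazy dog",
        "The slow",
        "The slow brown fox",
    ],
})


def accumulate(s):
    ref = None          # previous value in sorted order
    prev = s.index[0]   # id of the previous value
    out = {}            # id -> ids of the earlier rows in the current prefix chain
    for i, val in s.items():
        if ref and val.startswith(ref):
            tmp.append(prev)
        else:
            tmp = []    # chain broken: start a new one
        ref = val
        prev = i
        out[i] = tmp.copy()

    # Invert the dictionary, starting every id with an empty list
    # so that ids without matches map to [] instead of NaN.
    out2 = {i: [] for i in s.index}
    for v, l in out.items():
        for k in l:
            out2[k].append(v)

    return pd.Series(out2)


sorted_values = df.set_index("id")["string_value"].sort_values()
df["new_columns"] = df["id"].map(accumulate(sorted_values))
print(df)
```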

Regarding "pandas - Get the list of rows whose value starts with the current row's value in a Pandas DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73662294/
