python - Pandas: group by value ranges when the ranges are unknown

Reposted. Author: 行者123. Updated: 2023-12-01 06:22:12

I am trying to extract interval ranges in pandas when the intervals are not known in advance. Suppose I have a df like this:

import pandas as pd

df = pd.DataFrame({
    'range': ['range1','range1','range1','range1','range1','range1','range1','range1','range1','range1','range1','range1','range1','range1',
              'range2','range2','range2','range2','range2','range2','range2','range2',
              'range3','range3','range3','range3','range3','range3','range3','range3','range3','range3','range3','range3','range3','range3','range3','range3'],
    'pos1': [1,2,3,4,100,101,102,104,107,108,207,208,209,210,
             10,11,12,50,51,52,54,55,
             50,51,52,53,107,108,109,110,111,112,113,800,802,803,804,805]})

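As you can see, within each range the numbers always increase, and sometimes there is a large jump between consecutive numbers. In the end I only write the output to a file, so I do not need it as a DataFrame. I would like the final output to look like this: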

range1 1 4
range1 100 108
range1 207 210
range2 10 12
range2 50 55
range3 50 53
range3 107 113
range3 800 805

I tried the following (ugly) approach, but my output is missing all of range2 and the last interval of both range1 and range3.

ranges = []
tmp = []
for r1, r2, p1, p2 in zip(df['range'], df['range'][1:], df['pos1'], df['pos1'][1:]):
    if r1 == r2 and (p1+10 > p2):
        tmp.append(p1)
    elif r1 == r2 and (p1+10 < p2):
        tmp.append(p1)
        ranges.append((r1, tmp))
        tmp = []

f = open('ranges.txt', 'w')
for x in ranges:
    f.write(x[0]+'\t'+str(min(x[1]))+'\t'+str(max(x[1]))+'\n')

Output:

range1 1 4
range1 100 108
range3 50 53
range3 107 113

Best answer

Would something like this work? (You should change the print call so that it writes to a file instead.)

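thresh = 10  # a gap larger than this starts a new interval

# Within each range, flag rows whose gap to the previous row exceeds thresh;
# the running sum of those flags labels each interval.
s = df.groupby('range')['pos1'].diff().gt(thresh).cumsum()

for (r, g), d in df.groupby(['range', s])['pos1']:
    print(r, list(d))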

Output:

range1 [1, 2, 3, 4]
range1 [100, 101, 102, 104, 107, 108]
range1 [207, 208, 209, 210]
range2 [10, 11, 12]
range2 [50, 51, 52, 54, 55]
range3 [50, 51, 52, 53]
range3 [107, 108, 109, 110, 111, 112, 113]
range3 [800, 802, 803, 804, 805]
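
To get from these groups to the "range, start, end" lines the question asks for, one option is to aggregate each group to its minimum and maximum and write the result as tab-separated text. The following is only a sketch under some assumptions: it uses a smaller, made-up sample in the same shape as the question's df, and the names interval_id, interval and bounds are chosen here for illustration and do not come from the original answer.

```python
import pandas as pd

# Hypothetical, shortened sample with the same columns as the question's df.
df = pd.DataFrame({
    'range': ['range1'] * 7 + ['range2'] * 5,
    'pos1': [1, 2, 3, 4, 100, 101, 108, 10, 11, 12, 50, 55],
})

thresh = 10  # a gap larger than this starts a new interval

# Same labelling idea as in the answer: flag large gaps inside each range,
# then take the running sum of the flags to get an interval id per row.
interval_id = df.groupby('range')['pos1'].diff().gt(thresh).cumsum()

# Reduce every (range, interval) group to its smallest and largest position.
bounds = (
    df.groupby(['range', interval_id.rename('interval')])['pos1']
      .agg(['min', 'max'])
      .reset_index()
)

# Write one tab-separated line per interval: range, start, end.
bounds[['range', 'min', 'max']].to_csv(
    'ranges.txt', sep='\t', header=False, index=False
)
```

Because pos1 always increases within a range, min and max are simply the first and last value of each interval. The threshold of 10 is taken from the answer above and would need adjusting for data with different spacing.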

Regarding "python - Pandas: group by value ranges when the ranges are unknown", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60309613/
