gpt4 book ai didi

python - pandas 行中一系列零之前和之后的事件百分比

转载 作者:行者123 更新时间:2023-12-02 05:17:51 24 4
gpt4 key购买 nike

我有一个如下所示的数据框:

        ID      0   1   2   3   4   5   6   7   8   ... 81  82  83  84  85  86  87  88  89  90  total
-----------------------------------------------------------------------------------------------------
0 A 2 21 0 18 3 0 0 0 2 ... 0 0 0 0 0 0 0 0 0 0 156
1 B 0 20 12 2 0 8 14 23 0 ... 0 0 0 0 0 0 0 0 0 0 231
2 C 0 38 19 3 1 3 3 7 1 ... 0 0 0 0 0 0 0 0 0 0 78
3 D 3 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 5

我想知道每行出现第一个长度为 n 的零序列之前和之后的事件百分比(单元格中的数字)。这个问题始于此处发现的另一个问题:Length of first sequence of zeros of given size after certain column in pandas dataframe ,我正在尝试修改代码来执行我需要的操作,但我不断收到错误,并且似乎找不到正确的方法。这是我尝试过的:

def func(row, n):
"""Returns the number of events before the
first sequence of 0s of length n is found
"""

idx = np.arange(0, 91)

a = row[idx]
b = (a != 0).cumsum()
c = b[a == 0]
d = c.groupby(c).count()

#in case there is no sequence of 0s with length n
try:
e = c[c >= d.index[d >= n][0]]
f = str(e.index[0])
except IndexError:
e = [90]
f = str(e[0])

idx_sliced = np.arange(0, int(f)+1)
a = row[idx_sliced]

if (int(f) + n > 90):
perc_before = 100
else:
perc_before = a.cumsum().tail(1).values[0]/row['total']

return perc_before

事实上,我得到的错误是:

---> perc_before = a.cumsum().tail(1).values[0]/row['total']
TypeError: ('must be str, not int', 'occurred at index 0')

最后,我将此函数应用于数据帧并返回一个新列,其中包含每行中第一个 n 0 序列之前的事件百分比,如下所示:

        ID      0   1   2   3   4   5   6   7   8   ... 81  82  83  84  85  86  87  88  89  90  total  %_before
---------------------------------------------------------------------------------------------------------------
0 A 2 21 0 18 3 0 0 0 2 ... 0 0 0 0 0 0 0 0 0 0 156 43
1 B 0 20 12 2 0 8 14 23 0 ... 0 0 0 0 0 0 0 0 0 0 231 21
2 C 0 38 19 3 1 3 3 7 1 ... 0 0 0 0 0 0 0 0 0 0 78 90
3 D 3 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 5 100

如果尝试解决此问题,您可以使用以下示例输入进行测试:

a = pd.Series([1,1,13,0,0,0,4,0,0,0,0,0,12,1,1])
b = pd.Series([1,1,13,0,0,0,4,12,1,12,3,0,0,5,1])
c = pd.Series([1,1,13,0,0,0,4,12,2,0,5,0,5,1,1])
d = pd.Series([1,1,13,0,0,0,4,12,1,12,4,50,0,0,1])
e = pd.Series([1,1,13,0,0,0,4,12,0,0,0,54,0,1,1])

df = pd.DataFrame({'0':a, '1':b, '2':c, '3':d, '4':e})
df = df.transpose()

最佳答案

尝试一下:

def percent_before(row, n, ncols):
"""Return the percentage of activities happen before
the first sequence of at least `n` consecutive 0s
"""
start_index, i, size = 0, 0, 0
for i in range(ncols):
if row[i] == 0:
# increase the size of the island
size += 1
elif size >= n:
# found the island we want
break
else:
# start a new island
# row[start_index] is always non-zero
start_index = i
size = 0

if size < n:
# didn't find the island we want
return 1
else:
# get the sum of activities that happen
# before the island
idx = np.arange(0, start_index + 1).astype(str)
return row.loc[idx].sum() / row['total']

df['percent_before'] = df.apply(percent_before, n=3, ncols=15, axis=1)

结果:

   0  1   2  3  4  5  6   7  8   9  10  11  12  13  14  total  percent_before
0 1 1 13 0 0 0 4 0 0 0 0 0 12 1 1 33 0.454545
1 1 1 13 0 0 0 4 12 1 12 3 0 0 5 1 53 0.283019
2 1 1 13 0 0 0 4 12 2 0 5 0 5 1 1 45 0.333333
3 1 1 13 0 0 0 4 12 1 12 4 50 0 0 1 99 0.151515
4 1 1 13 0 0 0 4 12 0 0 0 54 0 1 1 87 0.172414

对于全帧,请使用 ncols=91 调用 apply

关于python - pandas 行中一系列零之前和之后的事件百分比,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59618058/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com