
python - pandas groupby column values (split at zero values)

Reposted. Author: 行者123. Updated: 2023-12-01 02:10:18

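I have timestamped data, and I am trying to break the dataset into "blocks" based on whether the value is greater than 0. I think the best way to illustrate this is with an example. Imagine the data looks like this (I typed in the grouping information by hand):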

Timestamp, Value
2018-02-08 04:28:44, 0.0
2018-02-08 04:28:48, 0.0
2018-02-08 04:28:52, 0.5, group 1
2018-02-08 04:28:56, 0.5, group 1
2018-02-08 04:29:00, 5.3, group 1
2018-02-08 04:29:04, 5.3, group 1
2018-02-08 04:29:08, 5.3, group 1
2018-02-08 04:29:43, 4.7, group 1
2018-02-08 04:29:48, 4.7, group 1
2018-02-08 04:29:52, 3.7, group 1
2018-02-08 04:29:56, 3.7, group 1
2018-02-08 04:30:00, 2.3, group 1
2018-02-08 04:30:04, 2.3, group 1
2018-02-08 04:30:08, 2.3, group 1
2018-02-08 04:30:12, 0.0
2018-02-08 04:30:16, 0.0
2018-02-08 04:32:07, 0.0
2018-02-08 04:32:16, 0.0
2018-02-08 04:32:20, 2.1, group 2
2018-02-08 04:32:24, 2.1, group 2
2018-02-08 04:32:28, 2.1, group 2
2018-02-08 04:32:32, 4.7, group 2
2018-02-08 04:32:36, 4.7, group 2
2018-02-08 04:32:40, 9.0, group 2
2018-02-08 04:32:44, 9.0, group 2
2018-02-08 04:32:48, 9.0, group 2

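...I think I could do this with the groupby function, as long as the grouping information I typed in by hand above actually existed. So the question is: how do I split a time series like this into such groups? (I should point out that there may be hundreds or thousands of these groups.)

Ideally there would be some kind of iterator that spits out these groups (maybe there is one?), but I just don't know what it's called, or even where to start looking! (Or, for that matter, whether the title of my question should be changed.)

Thanks in advance.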
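The answer below assumes the sample data is already loaded into a DataFrame named df. Here is a minimal sketch of one way to do that. It assumes the hand-typed "group N" annotations are dropped and reads the table with pandas.read_csv:

```python
import io

import pandas as pd

# Sample data from the question, without the hand-typed "group N" annotations.
CSV = """Timestamp,Value
2018-02-08 04:28:44,0.0
2018-02-08 04:28:48,0.0
2018-02-08 04:28:52,0.5
2018-02-08 04:28:56,0.5
2018-02-08 04:29:00,5.3
2018-02-08 04:29:04,5.3
2018-02-08 04:29:08,5.3
2018-02-08 04:29:43,4.7
2018-02-08 04:29:48,4.7
2018-02-08 04:29:52,3.7
2018-02-08 04:29:56,3.7
2018-02-08 04:30:00,2.3
2018-02-08 04:30:04,2.3
2018-02-08 04:30:08,2.3
2018-02-08 04:30:12,0.0
2018-02-08 04:30:16,0.0
2018-02-08 04:32:07,0.0
2018-02-08 04:32:16,0.0
2018-02-08 04:32:20,2.1
2018-02-08 04:32:24,2.1
2018-02-08 04:32:28,2.1
2018-02-08 04:32:32,4.7
2018-02-08 04:32:36,4.7
2018-02-08 04:32:40,9.0
2018-02-08 04:32:44,9.0
2018-02-08 04:32:48,9.0
"""

# parse_dates turns the Timestamp column into real datetimes instead of strings
df = pd.read_csv(io.StringIO(CSV), parse_dates=['Timestamp'])
print(df.head())
```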

Best answer

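I think you need to build a boolean mask from a condition, create the group numbers with cumsum, and then use numpy.where to replace the zero rows with NaN: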

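import numpy as np
import pandas as pd
# m is True where Value == 0; a zero row followed by a non-zero row marks a block start,
# cumsum turns those markers into block numbers, and np.where blanks out the zero rows
m = df['Value'].eq(0)
df['g'] = np.where(m, np.nan, (df['Value'].shift(-1).ne(0) & m).cumsum())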

Or:

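# m is True where Value > 0; a positive row preceded by a row <= 0 (or by the start
# of the data, via fill_value=0) starts a new block, and cumsum numbers the blocks
m = df['Value'].gt(0)
df['g'] = np.where(m, (df['Value'].shift(fill_value=0).le(0) & m).cumsum(), np.nan)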

Timestamp Value g
0 2018-02-08 04:28:44 0.0 NaN
1 2018-02-08 04:28:48 0.0 NaN
2 2018-02-08 04:28:52 0.5 1.0
3 2018-02-08 04:28:56 0.5 1.0
4 2018-02-08 04:29:00 5.3 1.0
5 2018-02-08 04:29:04 5.3 1.0
6 2018-02-08 04:29:08 5.3 1.0
7 2018-02-08 04:29:43 4.7 1.0
8 2018-02-08 04:29:48 4.7 1.0
9 2018-02-08 04:29:52 3.7 1.0
10 2018-02-08 04:29:56 3.7 1.0
11 2018-02-08 04:30:00 2.3 1.0
12 2018-02-08 04:30:04 2.3 1.0
13 2018-02-08 04:30:08 2.3 1.0
14 2018-02-08 04:30:12 0.0 NaN
15 2018-02-08 04:30:16 0.0 NaN
16 2018-02-08 04:32:07 0.0 NaN
17 2018-02-08 04:32:16 0.0 NaN
18 2018-02-08 04:32:20 2.1 2.0
19 2018-02-08 04:32:24 2.1 2.0
20 2018-02-08 04:32:28 2.1 2.0
21 2018-02-08 04:32:32 4.7 2.0
22 2018-02-08 04:32:36 4.7 2.0
23 2018-02-08 04:32:40 9.0 2.0
24 2018-02-08 04:32:44 9.0 2.0
25 2018-02-08 04:32:48 9.0 2.0
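The question asks for an "iterator that spits out the groups". Once the g column exists, you can get one by passing it to groupby. The sketch below uses only the Value column from the sample, and the names values and blocks are illustrative. Because groupby drops rows whose key is NaN by default, iterating over it yields only the non-zero blocks:

```python
import numpy as np
import pandas as pd

# Values from the question (timestamps omitted; only 'Value' drives the grouping).
values = [0.0, 0.0, 0.5, 0.5, 5.3, 5.3, 5.3, 4.7, 4.7, 3.7, 3.7, 2.3, 2.3, 2.3,
          0.0, 0.0, 0.0, 0.0, 2.1, 2.1, 2.1, 4.7, 4.7, 9.0, 9.0, 9.0]
df = pd.DataFrame({'Value': values})

# Same labelling as the first solution above.
m = df['Value'].eq(0)
df['g'] = np.where(m, np.nan, (df['Value'].shift(-1).ne(0) & m).cumsum())

# Rows labelled NaN (the zero rows) are skipped by groupby, so each
# iteration yields one non-zero block as its own DataFrame.
blocks = {}
for label, block in df.groupby('g'):
    print(label, block.index.min(), block.index.max(), len(block))
    blocks[label] = block
```

This scales to thousands of blocks without any explicit Python loop over rows. The labelling is fully vectorised, and the loop runs once per block.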

Also, if the actual numbers in column g don't matter and you only need the groups:

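# the cumsum of the zero mask stays constant along each non-zero run,
# so it works as a group label even though the labels are not consecutive
m = df['Value'].eq(0)
df['g'] = np.where(m, np.nan, m.cumsum())
print(df)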
Timestamp Value g
0 2018-02-08 04:28:44 0.0 NaN
1 2018-02-08 04:28:48 0.0 NaN
2 2018-02-08 04:28:52 0.5 2.0
3 2018-02-08 04:28:56 0.5 2.0
4 2018-02-08 04:29:00 5.3 2.0
5 2018-02-08 04:29:04 5.3 2.0
6 2018-02-08 04:29:08 5.3 2.0
7 2018-02-08 04:29:43 4.7 2.0
8 2018-02-08 04:29:48 4.7 2.0
9 2018-02-08 04:29:52 3.7 2.0
10 2018-02-08 04:29:56 3.7 2.0
11 2018-02-08 04:30:00 2.3 2.0
12 2018-02-08 04:30:04 2.3 2.0
13 2018-02-08 04:30:08 2.3 2.0
14 2018-02-08 04:30:12 0.0 NaN
15 2018-02-08 04:30:16 0.0 NaN
16 2018-02-08 04:32:07 0.0 NaN
17 2018-02-08 04:32:16 0.0 NaN
18 2018-02-08 04:32:20 2.1 6.0
19 2018-02-08 04:32:24 2.1 6.0
20 2018-02-08 04:32:28 2.1 6.0
21 2018-02-08 04:32:32 4.7 6.0
22 2018-02-08 04:32:36 4.7 6.0
23 2018-02-08 04:32:40 9.0 6.0
24 2018-02-08 04:32:44 9.0 6.0
25 2018-02-08 04:32:48 9.0 6.0
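If you still want consecutive numbering (1, 2, ...) after this shortcut, one option is Series.rank with method='dense'. This is a sketch and not part of the original answer. The dense rank maps the distinct labels to 1, 2, ... in sorted order and leaves NaN as NaN. Because the cumsum labels only ever increase down the frame, sorted order is also order of appearance:

```python
import numpy as np
import pandas as pd

values = [0.0, 0.0, 0.5, 0.5, 5.3, 5.3, 5.3, 4.7, 4.7, 3.7, 3.7, 2.3, 2.3, 2.3,
          0.0, 0.0, 0.0, 0.0, 2.1, 2.1, 2.1, 4.7, 4.7, 9.0, 9.0, 9.0]
df = pd.DataFrame({'Value': values})

# Shortcut labelling: gives 2.0 and 6.0 for the two blocks in this sample.
m = df['Value'].eq(0)
df['g'] = np.where(m, np.nan, m.cumsum())

# Dense rank renumbers the labels 1, 2, ...; NaN rows keep NaN (na_option='keep').
df['g'] = df['g'].rank(method='dense')
print(df['g'].dropna().unique())
```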

Explanation:

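# step-by-step breakdown of the first solution
m = df['Value'].eq(0)             # current row is 0
a = df['Value'].shift(-1).ne(0)   # next row is not 0 (NaN after the last row also counts as "not 0")
b = a & m                         # a zero row directly before a non-zero block
c = b.cumsum()                    # running count of block starts = block number
d = np.where(m, np.nan, c)        # keep the number only on non-zero rows
df1 = pd.concat([df[['Timestamp', 'Value']], m, a, b, c, pd.Series(d, index=df.index)], axis=1)
df1.columns = ['Timestamp', 'Value', '==0', 'shifted != 0', 'chained by &', 'cumsum', 'out']
print(df1)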
Timestamp Value ==0 shifted != 0 chained by & cumsum out
0 2018-02-08 04:28:44 0.0 True False False 0 NaN
1 2018-02-08 04:28:48 0.0 True True True 1 NaN
2 2018-02-08 04:28:52 0.5 False True False 1 1.0
3 2018-02-08 04:28:56 0.5 False True False 1 1.0
4 2018-02-08 04:29:00 5.3 False True False 1 1.0
5 2018-02-08 04:29:04 5.3 False True False 1 1.0
6 2018-02-08 04:29:08 5.3 False True False 1 1.0
7 2018-02-08 04:29:43 4.7 False True False 1 1.0
8 2018-02-08 04:29:48 4.7 False True False 1 1.0
9 2018-02-08 04:29:52 3.7 False True False 1 1.0
10 2018-02-08 04:29:56 3.7 False True False 1 1.0
11 2018-02-08 04:30:00 2.3 False True False 1 1.0
12 2018-02-08 04:30:04 2.3 False True False 1 1.0
13 2018-02-08 04:30:08 2.3 False False False 1 1.0
14 2018-02-08 04:30:12 0.0 True False False 1 NaN
15 2018-02-08 04:30:16 0.0 True False False 1 NaN
16 2018-02-08 04:32:07 0.0 True False False 1 NaN
17 2018-02-08 04:32:16 0.0 True True True 2 NaN
18 2018-02-08 04:32:20 2.1 False True False 2 2.0
19 2018-02-08 04:32:24 2.1 False True False 2 2.0
20 2018-02-08 04:32:28 2.1 False True False 2 2.0
21 2018-02-08 04:32:32 4.7 False True False 2 2.0
22 2018-02-08 04:32:36 4.7 False True False 2 2.0
23 2018-02-08 04:32:40 9.0 False True False 2 2.0
24 2018-02-08 04:32:44 9.0 False True False 2 2.0
25 2018-02-08 04:32:48 9.0 False True False 2 2.0
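With hundreds or thousands of blocks, a per-block summary is often more useful than labelled rows. The sketch below uses a different, commonly seen "run id" pattern rather than the answer's exact expression. A new run starts wherever the non-zero flag differs from the previous row, and cumsum numbers the runs. Named aggregation (pandas 0.25 and later) then collects each block's start time, end time, row count and peak value. The output column names start, end, rows and peak are illustrative choices:

```python
import io

import pandas as pd

CSV = """Timestamp,Value
2018-02-08 04:28:44,0.0
2018-02-08 04:28:48,0.0
2018-02-08 04:28:52,0.5
2018-02-08 04:28:56,0.5
2018-02-08 04:29:00,5.3
2018-02-08 04:29:04,5.3
2018-02-08 04:29:08,5.3
2018-02-08 04:29:43,4.7
2018-02-08 04:29:48,4.7
2018-02-08 04:29:52,3.7
2018-02-08 04:29:56,3.7
2018-02-08 04:30:00,2.3
2018-02-08 04:30:04,2.3
2018-02-08 04:30:08,2.3
2018-02-08 04:30:12,0.0
2018-02-08 04:30:16,0.0
2018-02-08 04:32:07,0.0
2018-02-08 04:32:16,0.0
2018-02-08 04:32:20,2.1
2018-02-08 04:32:24,2.1
2018-02-08 04:32:28,2.1
2018-02-08 04:32:32,4.7
2018-02-08 04:32:36,4.7
2018-02-08 04:32:40,9.0
2018-02-08 04:32:44,9.0
2018-02-08 04:32:48,9.0
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=['Timestamp'])

nonzero = df['Value'].gt(0)
# A new run (zero or non-zero) starts wherever the flag changes from the previous row.
run_id = nonzero.ne(nonzero.shift(fill_value=False)).cumsum()

# Keep only the non-zero rows and summarise each run.
summary = (
    df[nonzero]
    .groupby(run_id[nonzero])
    .agg(start=('Timestamp', 'first'), end=('Timestamp', 'last'),
         rows=('Value', 'count'), peak=('Value', 'max'))
    .reset_index(drop=True)
)
summary.index = summary.index + 1  # number the blocks 1, 2, ...
print(summary)
```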

A similar question about python - pandas groupby column values (split at zero values) can be found on Stack Overflow: https://stackoverflow.com/questions/48741911/
