gpt4 book ai didi

python - Pandas 向量化中的复杂条件

转载 作者:行者123 更新时间:2023-12-04 07:53:52 26 4
gpt4 key购买 nike

我对 Python 和 Pandas 还很陌生,所以我还在学习。我有一个包含一堆 OHLCV 数据的 Dataframe,它被加载到一个 Dataframe 中

                                          Timestamp       Open       High     Low   Close      Volume  Trades
Timestamp
2015-08-07 14:03:00+00:00 2015-08-07 14:03:00+00:00 3.00000 3.00000 3.00 3.00 81.857278 2
2015-08-07 17:19:00+00:00 2015-08-07 17:19:00+00:00 3.00001 3.00001 3.00 3.00 42.073291 2
2015-08-08 06:43:00+00:00 2015-08-08 06:43:00+00:00 3.00000 3.00000 3.00 3.00 0.400000 1
2015-08-08 09:31:00+00:00 2015-08-08 09:31:00+00:00 2.00000 2.00000 2.00 2.00 125.000000 2
2015-08-08 16:30:00+00:00 2015-08-08 16:30:00+00:00 1.20000 1.20000 1.20 1.20 54.759700 1
... ... ... ... ... ... ... ...
2020-12-31 23:55:00+00:00 2020-12-31 23:55:00+00:00 738.49000 738.49000 738.49 738.49 0.748789 3
2020-12-31 23:56:00+00:00 2020-12-31 23:56:00+00:00 738.07000 738.07000 737.72 737.72 2.491733 8
2020-12-31 23:57:00+00:00 2020-12-31 23:57:00+00:00 738.15000 738.15000 737.94 738.05 56.043875 9
2020-12-31 23:58:00+00:00 2020-12-31 23:58:00+00:00 738.14000 738.15000 737.55 737.75 80.826279 16
2020-12-31 23:59:00+00:00 2020-12-31 23:59:00+00:00 737.01000 737.60000 737.01 737.45 3.129885 8
如您所见,由于该分钟内没有实际交易量,因此某些条目之间缺少几分钟。我想要做的是创建一个具有前一分钟交易量的新列,如果前一分钟没有行,则该值应为 0。
我试图使用以下语句:
df.loc[
(df['Timestamp'].shift().notna() == True) &
(
((df['Timestamp'].shift().dt.minute.astype(int) == (df['Timestamp'].dt.minute.astype(int) - 1)) & (df['Timestamp'].dt.minute.astype(int) >= 2)) |
((df['Timestamp'].shift().dt.minute.astype(int) == 59) & (df['Timestamp'].dt.minute.astype(int) >= 2))
), 'previousVolume'] = df['Volume'].shift().astype(float)
用于添加 0 ■ 我会检查上述是否定的。
我得到的错误如下:
Traceback (most recent call last):
File "engulfing_test.py", line 145, in <module>
), 'previousVolume'] = df['Volume'].shift().astype(float)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5877, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 631, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 673, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1068, in astype_nansafe
raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
我知道像这样工作的复杂语句,导致做这样的事情:
df.loc[(df['Timestamp'].dt.minute.astype(int) > 10) & (df['Timestamp'].dt.minute.astype(int) < 57), 'gt10lt57'] = 'Yes'
非常感谢任何帮助
编辑
添加 print(df.head().to_dict()) 的输出
{'Timestamp': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): Timestamp('2015-08-07 14:03:00+0000', tz='UTC'), Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): Timestamp('2015-08-07 17:19:00+0000', tz='UTC'), Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): Timestamp('2015-08-08 06:43:00+0000', tz='UTC'), Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): Timestamp('2015-08-08 09:31:00+0000', tz='UTC'), Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): Timestamp('2015-08-08 16:30:00+0000', tz='UTC')}, 'Open': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'High': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Low': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Close': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Volume': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 81.85727776, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 42.07329055, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 0.4, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 125.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 54.7597}, 'Trades': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 2, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 1, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1}}

最佳答案

asfreq首先,去掉重复的Timestamp与索引相同的列。
将索引留在原处。 asfreq将改变索引的频率。

df.drop('Timestamp', axis=1).asfreq('min', fill_value=0)

Open High Low Close Volume Trades
Timestamp
2015-08-07 14:03:00+00:00 3.00 3.00 3.00 3.00 81.857278 2
2015-08-07 14:04:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:05:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:06:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:07:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
... ... ... ... ... ... ...
2020-12-31 23:55:00+00:00 738.49 738.49 738.49 738.49 0.748789 3
2020-12-31 23:56:00+00:00 738.07 738.07 737.72 737.72 2.491733 8
2020-12-31 23:57:00+00:00 738.15 738.15 737.94 738.05 56.043875 9
2020-12-31 23:58:00+00:00 738.14 738.15 737.55 737.75 80.826279 16
2020-12-31 23:59:00+00:00 737.01 737.60 737.01 737.45 3.129885 8

[2841717 rows x 6 columns]

关于python - Pandas 向量化中的复杂条件,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/66805048/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com