
python - Selecting a DataFrame slice by date string

Reposted. Author: 行者123. Updated: 2023-11-28 21:55:31

I have a DataFrame that is loaded like this

import datetime
import pandas as pd

minData = pd.read_csv(
    currentSymbol["fullpath"],
    header=None,
    names=['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends'],
    # Combine the Date and Time columns into a single Date_Time column
    parse_dates=[["Date", "Time"]],
    date_parser=lambda x: datetime.datetime.strptime(x, '%Y%m%d %H%M'),
    index_col="Date_Time",
    sep=' ')
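
As far as I know, newer pandas releases deprecate both `date_parser` and the nested-list form of `parse_dates`, so the call above may warn or fail there. A minimal sketch of an equivalent load, assuming the same space-separated layout; the two sample rows and the names `sample`, `raw`, `stamps`, and `min_data` are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same space-separated layout as the real file.
sample = io.StringIO(
    "19980102 0930 8.70630 8.70630 8.70630 8.70630 420.73 4 0 0\n"
    "19980102 0935 8.82514 8.82514 8.82514 8.82514 420.73 4 0 0\n"
)
names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume',
         'Split Factor', 'Earnings', 'Dividends']

# Read Date and Time as strings so a leading zero in the time (e.g. "0930") survives.
raw = pd.read_csv(sample, header=None, names=names, sep=' ',
                  dtype={'Date': str, 'Time': str})

# Build the timestamps explicitly instead of relying on date_parser.
stamps = pd.to_datetime(raw['Date'] + ' ' + raw['Time'], format='%Y%m%d %H%M')

min_data = raw.drop(columns=['Date', 'Time'])
min_data.index = pd.DatetimeIndex(stamps, name='Date_Time')
```

Parsing after the read keeps the format string in one place and leaves `read_csv` free of date-handling options.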

The data looks like this

>>> minData.index
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-01-02 09:30:00, ..., 2013-12-09 16:00:00]
Length: 1373036, Freq: None, Timezone: None
>>>

>>> minData.head(5)
                        Open     High      Low    Close   Volume  \
Date_Time
1998-01-02 09:30:00  8.70630  8.70630  8.70630  8.70630   420.73
1998-01-02 09:35:00  8.82514  8.82514  8.82514  8.82514   420.73
1998-01-02 09:42:00  8.79424  8.79424  8.79424  8.79424   420.73
1998-01-02 09:43:00  8.76572  8.76572  8.76572  8.76572  1262.19
1998-01-02 09:44:00  8.76572  8.76572  8.76572  8.76572   420.73

                     Split Factor  Earnings  Dividends  Active
Date_Time
1998-01-02 09:30:00             4         0          0     NaN
1998-01-02 09:35:00             4         0          0     NaN
1998-01-02 09:42:00             4         0          0     NaN
1998-01-02 09:43:00             4         0          0     NaN
1998-01-02 09:44:00             4         0          0     NaN

[5 rows x 9 columns]

I can select rows from my DataFrame like this

>>> minData["2004-12-20"]
                        Open     High      Low    Close     Volume  \
Date_Time
2004-12-20 09:30:00  35.8574  35.9373  35.8025  35.9273  154112.00
2004-12-20 09:31:00  35.8924  35.9174  35.8824  35.8874   17021.50
2004-12-20 09:32:00  35.8874  35.8924  35.8824  35.8824   17079.50
2004-12-20 09:33:00  35.8874  35.9423  35.8724  35.9373   32491.50
2004-12-20 09:34:00  35.9373  36.0023  35.9174  36.0023   40096.40
2004-12-20 09:35:00  35.9923  36.2071  35.9923  36.1471   67088.90
...

I have a date like this (read from a different file)

>>> ts
Timestamp('2004-12-20 00:00:00', tz=None)
>>>

I want to set the "Active" column to True for every minute of this day.

I can do that with this

minData.loc['2004-12-20',"Active"] = True
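
The statement above relies on partial-string indexing: a date-only string passed to `.loc` on a DatetimeIndex matches every row that falls on that day. A small self-contained sketch of that behaviour, using a made-up five-row frame (`demo`, `idx`, and the `Close` values are hypothetical):

```python
import pandas as pd

# Hypothetical minute bars straddling midnight: 23:58, 23:59, 00:00, 00:01, 00:02.
idx = pd.date_range('2004-12-19 23:58', periods=5, freq='min')
demo = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
demo['Active'] = False

# A date-only string selects every row whose timestamp falls on that day.
demo.loc['2004-12-20', 'Active'] = True

# The same string can be used to read the day's rows back.
day_rows = demo.loc['2004-12-20']
```

The sketch uses `.loc` for the row selection as well; to my knowledge, recent pandas versions no longer accept the bare `minData["2004-12-20"]` form for selecting rows.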

I can do the same thing with my Timestamp date using this crazy code

minData.loc[str(ts.year) + "-" + str(ts.month) + "-" + str(ts.day),"Active"] = True

Yes, that builds a string from a Timestamp object!

I know there must be a better way to do this.

Best answer

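I would actually do it this way (the session below appears to assume that np, DataFrame, Timestamp and date_range have already been imported from numpy and pandas)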

In [20]: df = DataFrame(np.random.randn(10,1),index=date_range('20130101 23:55:00',periods=10,freq='T'))

In [21]: df['Active'] = False

In [22]: df
Out[22]:
                            0 Active
2013-01-01 23:55:00  0.273194  False
2013-01-01 23:56:00  2.869795  False
2013-01-01 23:57:00  0.980566  False
2013-01-01 23:58:00  0.176711  False
2013-01-01 23:59:00 -0.354976  False
2013-01-02 00:00:00  0.258194  False
2013-01-02 00:01:00 -1.765781  False
2013-01-02 00:02:00  0.106163  False
2013-01-02 00:03:00 -1.169214  False
2013-01-02 00:04:00  0.224484  False

[10 rows x 2 columns]


In [28]: df['Active'] = False

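As @Andy Hayden pointed out, normalize sets the time of every index entry to 00:00:00, so you can compare the result directly with a Timestamp whose time is also midnight.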

In [34]: df.loc[df.index.normalize() == Timestamp('20130102'),'Active'] = True

In [35]: df
Out[35]:
                            0 Active
2013-01-01 23:55:00  0.273194  False
2013-01-01 23:56:00  2.869795  False
2013-01-01 23:57:00  0.980566  False
2013-01-01 23:58:00  0.176711  False
2013-01-01 23:59:00 -0.354976  False
2013-01-02 00:00:00  0.258194   True
2013-01-02 00:01:00 -1.765781   True
2013-01-02 00:02:00  0.106163   True
2013-01-02 00:03:00 -1.169214   True
2013-01-02 00:04:00  0.224484   True

[10 rows x 2 columns]
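
Applied to the question, the same comparison works with the `ts` value directly, with no string building. A minimal sketch on a made-up five-row frame; `demo`, `idx`, `mask`, and `via_string` are hypothetical names, and `ts` stands in for the date read from the other file:

```python
import pandas as pd

# Hypothetical minute bars straddling midnight: 23:58, 23:59, 00:00, 00:01, 00:02.
idx = pd.date_range('2004-12-19 23:58', periods=5, freq='min')
demo = pd.DataFrame({'Active': False}, index=idx)

ts = pd.Timestamp('2004-12-20')  # stands in for the date read from another file

# normalize() floors each index entry to midnight, so equality matches the whole day.
mask = demo.index.normalize() == ts
demo.loc[mask, 'Active'] = True

# Alternative: format the Timestamp once and reuse partial-string indexing.
via_string = demo.loc[ts.strftime('%Y-%m-%d')]
```

If `ts` might carry a time of day other than midnight, comparing against `ts.normalize()` instead keeps the mask correct.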

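For really fine-grained control, do this (if you only want to index on specific times of day, you can use indexer_at_time). Note that indexer_between_time matches on the time of day only. You can always combine conditions with and-style clauses for more complex indexing.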

In [29]: df.loc[df.index.indexer_between_time('20130101 23:59:00','20130102 00:03:00'),'Active'] = True

In [30]: df
Out[30]:
                            0 Active
2013-01-01 23:55:00  0.273194  False
2013-01-01 23:56:00  2.869795  False
2013-01-01 23:57:00  0.980566  False
2013-01-01 23:58:00  0.176711  False
2013-01-01 23:59:00 -0.354976   True
2013-01-02 00:00:00  0.258194   True
2013-01-02 00:01:00 -1.765781   True
2013-01-02 00:02:00  0.106163   True
2013-01-02 00:03:00 -1.169214   True
2013-01-02 00:04:00  0.224484  False

[10 rows x 2 columns]
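
`indexer_between_time` returns integer positions, and the In [29] call passes them to `.loc` along with full date strings; I would not expect either to be accepted by every recent pandas version. A hedged sketch of two ways to get the same rows, assuming a ten-row frame shaped like the one above (`demo`, `demo2`, and `positions` are illustrative names):

```python
import pandas as pd

idx = pd.date_range('2013-01-01 23:55', periods=10, freq='min')
demo = pd.DataFrame({'Active': False}, index=idx)

# Time-of-day filter: a start later than the end wraps around midnight.
# The result is an array of integer positions, so it is paired with iloc.
positions = demo.index.indexer_between_time('23:59', '00:03')
demo.iloc[positions, demo.columns.get_loc('Active')] = True

# Date-aware alternative: a label slice on the DatetimeIndex, inclusive at both ends.
demo2 = pd.DataFrame({'Active': False}, index=idx)
demo2.loc['2013-01-01 23:59':'2013-01-02 00:03', 'Active'] = True
```

On multi-day data the two differ: the time-of-day filter marks the 23:59 to 00:03 window on every day, while the label slice marks only that one interval.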

Regarding python - selecting a DataFrame slice by date string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22699184/
