python - Changing the time zone of a datetime column in Pandas and adding it as a hierarchical index

Reposted. Author: IT老高. Updated: 2023-10-28 22:08:31

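I have data with UTC timestamps. I'd like to convert the time zone of these timestamps to 'US/Pacific' and add the result as a hierarchical index on a pandas DataFrame. I've been able to convert the timestamps as an Index, but the time zone formatting is lost when I try to add it back to the DataFrame, either as a column or as an index.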

>>> import pandas as pd
>>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})
>>> dat.dtypes
#datetime object
#label object
#value int64
#dtype: object

Now, if I try to convert the Series directly, I get an error.

>>> times = pd.to_datetime(dat['datetime'])
>>> times.tz_localize('UTC')
#Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize
# raise Exception('Cannot tz-localize non-time series')
#Exception: Cannot tz-localize non-time series
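
The error arises because `Series.tz_localize` operates on the Series' index rather than on its values. As a minimal sketch, assuming a pandas version recent enough to provide the `.dt` accessor (newer than the one in the traceback above), the values can be localized and converted without building an Index first. The name `times_pacific` is only illustrative.

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                 '2011-07-19 09:00:00'] * 2,
    'value': range(6),
})

# Parse the strings into naive timestamps.
times = pd.to_datetime(dat['datetime'])

# .dt applies the time zone methods to the values, not to the index.
times_pacific = times.dt.tz_localize('UTC').dt.tz_convert('US/Pacific')
print(times_pacific)
```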

If I convert it to an Index, I can manipulate it as a time series. Note that the index now has the Pacific time zone.

>>> times_index = pd.Index(times)
>>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')
>>> times_index_pacific
#<class 'pandas.tseries.index.DatetimeIndex'>
#[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]
#Length: 6, Freq: None, Timezone: US/Pacific

However, I now run into a problem adding the index back to the DataFrame, because it loses its time zone formatting:

>>> dat_index = dat.set_index([dat['label'], times_index_pacific])
>>> dat_index
# datetime label value
#label
#a 2011-07-19 07:00:00 2011-07-19 07:00:00 a 0
# 2011-07-19 08:00:00 2011-07-19 08:00:00 a 1
# 2011-07-19 09:00:00 2011-07-19 09:00:00 a 2
#b 2011-07-19 07:00:00 2011-07-19 07:00:00 b 3
# 2011-07-19 08:00:00 2011-07-19 08:00:00 b 4
# 2011-07-19 09:00:00 2011-07-19 09:00:00 b 5

You'll notice that the index is back on UTC times rather than the converted Pacific time zone.

How can I change the time zone and add it as an index to a DataFrame?

Best answer

If you set it as the index, it is automatically converted to an Index:

In [11]: dat.index = pd.to_datetime(dat.pop('datetime'), utc=True)

In [12]: dat
Out[12]:
label value
datetime
2011-07-19 07:00:00 a 0
2011-07-19 08:00:00 a 1
2011-07-19 09:00:00 a 2
2011-07-19 07:00:00 b 3
2011-07-19 08:00:00 b 4
2011-07-19 09:00:00 b 5

Then do the tz_localize:

In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')

In [13]: dat
Out[13]:
label value
datetime
2011-07-19 00:00:00-07:00 a 0
2011-07-19 01:00:00-07:00 a 1
2011-07-19 02:00:00-07:00 a 2
2011-07-19 00:00:00-07:00 b 3
2011-07-19 01:00:00-07:00 b 4
2011-07-19 02:00:00-07:00 b 5
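
On recent pandas versions, `pd.to_datetime(..., utc=True)` already returns time zone aware (UTC) values, and calling `tz_localize('UTC')` on them raises a `TypeError`. A hedged sketch of the same two steps under that assumption, using `tz_convert` alone (the variable name `stamps` is illustrative):

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                 '2011-07-19 09:00:00'] * 2,
    'value': range(6),
})

# utc=True yields tz-aware UTC timestamps, so no tz_localize step is needed.
stamps = pd.to_datetime(dat.pop('datetime'), utc=True)

# Wrap in a DatetimeIndex and convert UTC to Pacific time.
dat.index = pd.DatetimeIndex(stamps).tz_convert('US/Pacific')
print(dat)
```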

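You can then append the label column to the index. Hmm, this is definitely a bug: the datetime level comes back without its time zone: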

In [14]: dat.set_index('label', append=True).swaplevel(0, 1)
Out[14]:
value
label datetime
a 2011-07-19 07:00:00 0
2011-07-19 08:00:00 1
2011-07-19 09:00:00 2
b 2011-07-19 07:00:00 3
2011-07-19 08:00:00 4
2011-07-19 09:00:00 5
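
For comparison, here is a sketch of the same sequence on a recent pandas release, where, as far as I know, the time zone survives inside the MultiIndex. The names `stamps`, `dat_multi` and `datetime_level` are illustrative and not part of the original answer.

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': ['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                 '2011-07-19 09:00:00'] * 2,
    'value': range(6),
})

# Parse as UTC, convert to Pacific time, and use the result as the index.
stamps = pd.to_datetime(dat.pop('datetime'), utc=True)
dat.index = pd.DatetimeIndex(stamps).tz_convert('US/Pacific')

# Append the label column and move it to the first level, as in the answer.
dat_multi = dat.set_index('label', append=True).swaplevel(0, 1)

# Inspect the datetime level (position 1 after the swap).
datetime_level = dat_multi.index.get_level_values(1)
print(datetime_level.tz)
```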

A hacky workaround is to convert the (datetime) level directly, once it is already a MultiIndex:

In [15]: dat.index.levels[1] = dat.index.get_level_values(1).tz_localize('UTC').tz_convert('US/Pacific')

In [16]: dat1
Out[16]:
value
label datetime
a 2011-07-19 00:00:00-07:00 0
2011-07-19 01:00:00-07:00 1
2011-07-19 02:00:00-07:00 2
b 2011-07-19 00:00:00-07:00 3
2011-07-19 01:00:00-07:00 4
2011-07-19 02:00:00-07:00 5
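
In recent pandas versions `MultiIndex.levels` is immutable, so assigning to `levels[1]` as above is expected to fail. A minimal sketch of the same idea, assuming the frame already has a naive datetime level, uses `MultiIndex.set_levels`, which returns a new index instead of mutating the old one. The name `pacific_level` is illustrative.

```python
import pandas as pd

dat = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b', 'b'],
    'datetime': pd.to_datetime(['2011-07-19 07:00:00', '2011-07-19 08:00:00',
                                '2011-07-19 09:00:00'] * 2),
    'value': range(6),
})

# Start from a MultiIndex whose datetime level is still naive.
dat1 = dat.set_index(['label', 'datetime'])

# Localize and convert only the unique values stored in level 1.
pacific_level = dat1.index.levels[1].tz_localize('UTC').tz_convert('US/Pacific')

# set_levels builds a new MultiIndex with the replaced level.
dat1.index = dat1.index.set_levels(pacific_level, level=1)
print(dat1)
```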

Regarding python - Changing the time zone of a datetime column in Pandas and adding it as a hierarchical index, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17159207/
