
python - How to convert rda/RData xts files to Python pandas native time-series files?

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 03:03:25

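I have a folder containing rda time-series files for more than 1,000 stocks. Below is sample code showing how I save my time-series (xts) objects to rda. I use rda/RData instead of CSV because the files save and load quickly, and the data is compressed much better than in CSV.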

library(quantmod)
AAPL <- getSymbols("AAPL",auto.assign=FALSE)
save(AAPL,file="/home/user/folder/AAPL.rda")

AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2007-01-03 86.29 86.58 81.90 83.80 309579900 10.96015
2007-01-04 84.05 85.95 83.82 85.66 211815100 11.20341
2007-01-05 85.77 86.20 84.40 85.05 208685400 11.12363
2007-01-08 85.96 86.53 85.28 85.47 199276700 11.17857
2007-01-09 86.45 92.98 85.15 92.57 837324600 12.10717
2007-01-10 94.75 97.80 93.45 97.00 738220000 12.68657

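I use these files in many of my data-analysis experiments in R. But I am now gradually moving to Python (with pandas), since it is a general-purpose language. Is there a way to convert my existing rda xts files into pandas-native files (HDF5 or pickle, whichever is the better format) instead of downloading all the stock data again? How can I do this?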
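Once the data is in a DataFrame with a DatetimeIndex, writing a pandas-native file is a single call. A minimal sketch, assuming the R-to-pandas conversion has already happened: the frame below is built by hand from the sample rows above, and the file name `AAPL.pkl.gz` is hypothetical. Pickle round-trips the index and dtypes exactly; `DataFrame.to_hdf(path, key=...)` is the HDF5 alternative, but it needs the optional PyTables package.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a converted .rda file: three rows copied from the
# AAPL sample above, indexed by date.
rows = {
    "AAPL.Open": [86.29, 84.05, 85.77],
    "AAPL.High": [86.58, 85.95, 86.20],
    "AAPL.Low": [81.90, 83.82, 84.40],
    "AAPL.Close": [83.80, 85.66, 85.05],
    "AAPL.Volume": [309579900, 211815100, 208685400],
    "AAPL.Adjusted": [10.96015, 11.20341, 11.12363],
}
index = pd.to_datetime(["2007-01-03", "2007-01-04", "2007-01-05"])
aapl = pd.DataFrame(rows, index=index)

# Write a gzip-compressed pickle (compression is inferred from the ".gz"
# suffix) into a temporary directory, then read it back.
path = os.path.join(tempfile.mkdtemp(), "AAPL.pkl.gz")
aapl.to_pickle(path)
restored = pd.read_pickle(path)

print(restored.equals(aapl))  # True
```

For a folder of 1,000+ stocks, one file per symbol (or one HDF5 file with one key per symbol) keeps each series loadable on its own.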

Edit

Here is what I did in Python:

import rpy2.robjects as robjects
import pandas.rpy.common as com
import pandas as pd

robj=robjects.r['load']("AAPL.rda")


for sets in robj:
    myRData = com.load_data(sets)
    # convert to DataFrame
    if not isinstance(myRData, pd.DataFrame):
        myRData = pd.DataFrame(myRData)

print(myRData)

The output is:

     AAPL.Open  AAPL.High   AAPL.Low  AAPL.Close  AAPL.Volume  AAPL.Adjusted
1.0 86.289999 86.579999 81.899999 83.800002 309579900.0 10.960147
2.0 84.050001 85.949998 83.820003 85.659998 211815100.0 11.203415
3.0 85.770000 86.199997 84.400002 85.049997 208685400.0 11.123633
4.0 85.959998 86.529998 85.280003 85.470000 199276700.0 11.178565
5.0 86.450003 92.979999 85.150000 92.570003 837324600.0 12.107169

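In Python this becomes an ordinary (non-time-series) DataFrame: the dates are replaced by row numbers. How should I convert it to a time series?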
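What makes a pandas frame a time series is a `DatetimeIndex`; the values survived the conversion, and only the labels 1.0 to 5.0 need replacing. A sketch, assuming the dates are already known (they are typed in by hand here; in the real data they come from the xts index attribute):

```python
import pandas as pd

# Hypothetical stand-in for the frame printed above: values intact, but the
# dates replaced by positional labels 1.0 ... 5.0.
close = pd.DataFrame(
    {"AAPL.Close": [83.80, 85.66, 85.05, 85.47, 92.57]},
    index=[1.0, 2.0, 3.0, 4.0, 5.0],
)

# Assumption: the matching dates are available (typed in by hand here).
dates = pd.to_datetime(
    ["2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09"]
)
close.index = dates
close.index.name = "Date"

# With a DatetimeIndex, date-based slicing and resampling work.
print(close.loc["2007-01-05":"2007-01-08"])
print(close["AAPL.Close"].resample("W").last())
```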

Edit 2:

After a lot of searching and tweaking, I got this far. I tried to convert the UTC values stored in my rda file's index to local time:

import rpy2.robjects as robjects
import pandas.rpy.common as com
import pandas as pd
import numpy as np

robj=robjects.r['load']("AAPL.rda")

myRData=None
for sets in robj:
    myRData = com.load_data(sets)
    # convert to DataFrame
    if not isinstance(myRData, pd.DataFrame):
        myRData = pd.DataFrame(myRData)

myRData.head(10)
ts=np.array(robjects.r('attr(AAPL,"index")')).astype(int)

#changing index
myRData.index=pd.to_datetime(ts, utc=True, format='%Y-%m-%d')

myRData.tail(10)

The problem now is that the converted index is wrong: the tail should show the most recent dates, but every timestamp is stuck in 1970.
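The 1970 timestamps are a units problem: the xts index holds seconds since 1970-01-01 UTC, while `pd.to_datetime` reads bare integers as nanoseconds, so 1476144000 becomes 1.476144 seconds after the epoch. The `format` argument only describes string input, so it does not change this. A small sketch, using the two epoch values implied by the first two rows of the tail above:

```python
import pandas as pd

# Epoch seconds behind the first two rows of the tail shown above
# (00:00:01.476144 s == 1476144000 ns, 00:00:01.4762304 s == 1476230400 ns).
ts = [1476144000, 1476230400]

# Without a unit, integers are interpreted as nanoseconds.
wrong = pd.to_datetime(ts, utc=True)
print(wrong[0])  # 1970-01-01 00:00:01.476144+00:00

# Declaring the unit as seconds recovers the trading dates.
right = pd.to_datetime(ts, unit="s", utc=True)
print(right[0].date())  # 2016-10-11
```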

                                     AAPL.Close  AAPL.Volume  AAPL.Adjusted  
1970-01-01 00:00:01.476144+00:00 116.300003 64041000.0 116.300003
1970-01-01 00:00:01.476230400+00:00 117.339996 37586800.0 117.339996
1970-01-01 00:00:01.476316800+00:00 116.980003 35192400.0 116.980003
1970-01-01 00:00:01.476403200+00:00 117.629997 35652200.0 117.629997
1970-01-01 00:00:01.476662400+00:00 117.550003 23624900.0 117.550003
1970-01-01 00:00:01.476748800+00:00 117.470001 24553500.0 117.470001
1970-01-01 00:00:01.476835200+00:00 117.120003 20034600.0 117.120003
1970-01-01 00:00:01.476921600+00:00 117.059998 24125800.0 117.059998
1970-01-01 00:00:01.477008+00:00 116.599998 23192700.0 116.599998
1970-01-01 00:00:01.477267200+00:00 117.650002 23311700.0 117.650002

Best answer

After hours of editing and searching, I solved my problem with the code below. Any suggestions are welcome.

import rpy2.robjects as robjects
import pandas.rpy.common as com  # pandas.rpy was removed in pandas 0.20, so this needs an older pandas
import pandas as pd
import numpy as np
from datetime import datetime

#loading external rda file; load() returns the names of the objects it restored
robj=robjects.r['load']("AAPL.rda")

myRData=None
for sets in robj:
    myRData = com.load_data(sets)
    # convert to DataFrame
    if not isinstance(myRData, pd.DataFrame):
        myRData = pd.DataFrame(myRData)

myRData.tail(10)

#fetching the xts index: seconds since the Unix epoch (UTC)
ts=np.array(robjects.r('attr(AAPL,"index")')).astype(int)

#converting epoch seconds to naive UTC datetimes (no local-time conversion happens here)
d= np.array([])
for t in ts:
    s=datetime.utcfromtimestamp(t)
    d=np.append(s,d)

#np.append(s, d) prepends, so the array ends up reversed; sort it back into date order
d=np.sort(d, axis=0)

#changing index
myRData.index=pd.to_datetime(d)

myRData.tail(10)
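The loop, `np.append` and `np.sort` can be replaced by one vectorized call, which also keeps the index in its original order. A sketch that checks both approaches agree, using three hypothetical epoch-second values in place of `attr(AAPL,"index")` (they correspond to the first three dates in the sample at the top):

```python
from datetime import datetime, timezone

import numpy as np
import pandas as pd

# Hypothetical stand-in for attr(AAPL, "index"): seconds since the epoch.
ts = np.array([1167782400, 1167868800, 1167955200], dtype=np.int64)

# Loop version, equivalent to the answer's: epoch seconds -> naive UTC datetimes.
# datetime.utcfromtimestamp is deprecated since Python 3.12, so convert to an
# aware UTC datetime and drop the tzinfo instead.
looped = [
    datetime.fromtimestamp(int(t), tz=timezone.utc).replace(tzinfo=None)
    for t in ts
]

# Vectorized version: one call, already in index order, no sort needed.
index = pd.to_datetime(ts, unit="s")

print(index)
print(list(index) == looped)  # True
```

On the real data this would be `myRData.index = pd.to_datetime(ts, unit="s")`. For daily bars, leaving the index naive is probably what you want: converting midnight UTC to a US time zone would move each bar to the previous evening.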

Result

             AAPL.Open   AAPL.High    AAPL.Low  AAPL.Close  AAPL.Volume  \
2016-10-11 117.699997 118.690002 116.199997 116.300003 64041000.0
2016-10-12 117.349998 117.980003 116.750000 117.339996 37586800.0
2016-10-13 116.790001 117.440002 115.720001 116.980003 35192400.0
2016-10-14 117.879997 118.169998 117.129997 117.629997 35652200.0
2016-10-17 117.330002 117.839996 116.779999 117.550003 23624900.0
2016-10-18 118.180000 118.209999 117.449997 117.470001 24553500.0
2016-10-19 117.250000 117.760002 113.800003 117.120003 20034600.0
2016-10-20 116.860001 117.379997 116.330002 117.059998 24125800.0
2016-10-21 116.809998 116.910004 116.279999 116.599998 23192700.0
2016-10-24 117.099998 117.739998 117.000000 117.650002 23311700.0

A similar question, "python - How to convert rda/RData xts files to Python pandas native time-series files?", can be found on Stack Overflow: https://stackoverflow.com/questions/40222978/
