gpt4 book ai didi

pandas - 对多个数据帧进行数据分析?面板还是多索引?

转载 作者:行者123 更新时间:2023-12-02 09:55:35 24 4
gpt4 key购买 nike

当我使用 web.DataReader 提取多个股票的数据时,我得到一个面板作为输出。

import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import datetime as dt
import re



startDate = '2010-01-01'
endDate = '2016-09-07'
stocks_query = ['AAPL','OPK']


stocks = web.DataReader(stocks_query, data_source='yahoo',
start=startDate, end=endDate)
stocks = stocks.swapaxes('items','minor_axis')

导致输出:

Dimensions: 2 (items) x 1682 (major_axis) x 6 (minor_axis)
Items axis: AAPL to OPK
Major_axis axis: 2010-01-04 00:00:00 to 2016-09-07 00:00:00
Minor_axis axis: Open to Adj Close

面板的单个数据框如下所示

股票['OPK']

            Open  High   Low  Close      Volume  Adj Close  log_return  \
Date
2010-01-04 1.80 1.97 1.76 1.95 234500.0 1.95 NaN
2010-01-05 1.64 1.95 1.64 1.93 135800.0 1.93 -0.010309
2010-01-06 1.90 1.92 1.77 1.79 546600.0 1.79 -0.075304
2010-01-07 1.79 1.94 1.76 1.92 138700.0 1.92 0.070110
2010-01-08 1.92 1.94 1.86 1.89 62500.0 1.89 -0.015748

我计划在所有数据帧中进行大量数据操作,添加新列。比较两个数据帧等。建议我研究 multi_indexing,因为面板已被弃用。

这是我第一次使用面板。如果我想向两个数据帧(AAPL、OPK)添加新列,我必须这样做:

for i in stocks:
stocks[i]['log_return'] = np.log(stocks[i]['Close']/(stocks[i]['Close'].shift(1)))

如果确实建议使用 multi_indexing 来处理多个数据帧,那么我该如何将数据帧转换为我可以轻松使用的形式?我是否会有一个主要指数,下一级是股票,并且各列将包含在每只股票中?

我浏览了文档,其中给出了许多使用我没有得到的元组的示例或使用单个数据帧的示例。 http://pandas.pydata.org/pandas-docs/stable/advanced.html

那么我到底如何将面板转换为 multi_index 数据框?

最佳答案

我想延长@piRSquared's answer举一些例子:

In [40]: stocks.to_frame()
Out[40]:
AAPL OPK
Date minor
2010-01-04 Open 2.134300e+02 1.80
High 2.145000e+02 1.97
Low 2.123800e+02 1.76
Close 2.140100e+02 1.95
Volume 1.234324e+08 234500.00
Adj Close 2.772704e+01 1.95
2010-01-05 Open 2.146000e+02 1.64
High 2.155900e+02 1.95
Low 2.132500e+02 1.64
Close 2.143800e+02 1.93
... ... ...
2016-09-06 Low 1.075100e+02 9.19
Close 1.077000e+02 9.36
Volume 2.688040e+07 3026900.00
Adj Close 1.066873e+02 9.36
2016-09-07 Open 1.078300e+02 9.39
High 1.087600e+02 9.60
Low 1.070700e+02 9.38
Close 1.083600e+02 9.59
Volume 4.236430e+07 2632400.00
Adj Close 1.073411e+02 9.59

[10092 rows x 2 columns]

但如果你想将其转换为 MultiIndex DF,最好保留原来的 pandas_datareader 面板:

In [38]: p = web.DataReader(stocks_query, data_source='yahoo', start=startDate, end=endDate)

In [39]: p.to_frame()
Out[39]:
Open High Low Close Volume Adj Close
Date minor
2010-01-04 AAPL 213.429998 214.499996 212.380001 214.009998 123432400.0 27.727039
OPK 1.800000 1.970000 1.760000 1.950000 234500.0 1.950000
2010-01-05 AAPL 214.599998 215.589994 213.249994 214.379993 150476200.0 27.774976
OPK 1.640000 1.950000 1.640000 1.930000 135800.0 1.930000
2010-01-06 AAPL 214.379993 215.230000 210.750004 210.969995 138040000.0 27.333178
OPK 1.900000 1.920000 1.770000 1.790000 546600.0 1.790000
2010-01-07 AAPL 211.750000 212.000006 209.050005 210.580000 119282800.0 27.282650
OPK 1.790000 1.940000 1.760000 1.920000 138700.0 1.920000
2010-01-08 AAPL 210.299994 212.000006 209.060005 211.980005 111902700.0 27.464034
OPK 1.920000 1.940000 1.860000 1.890000 62500.0 1.890000
... ... ... ... ... ... ...
2016-08-31 AAPL 105.660004 106.570000 105.639999 106.099998 29662400.0 105.102360
OPK 9.260000 9.260000 9.070000 9.100000 2793300.0 9.100000
2016-09-01 AAPL 106.139999 106.800003 105.620003 106.730003 26701500.0 105.726441
OPK 9.310000 9.540000 9.190000 9.290000 3515300.0 9.290000
2016-09-02 AAPL 107.699997 108.000000 106.820000 107.730003 26802500.0 106.717038
OPK 9.340000 9.390000 9.160000 9.330000 2061200.0 9.330000
2016-09-06 AAPL 107.900002 108.300003 107.510002 107.699997 26880400.0 106.687314
OPK 9.320000 9.480000 9.190000 9.360000 3026900.0 9.360000
2016-09-07 AAPL 107.830002 108.760002 107.070000 108.360001 42364300.0 107.341112
OPK 9.390000 9.600000 9.380000 9.590000 2632400.0 9.590000

[3364 rows x 6 columns]

如何使用多索引 DF:

In [46]: df = p.to_frame()

In [47]: df.loc[pd.IndexSlice[:, ['AAPL']], :]
Out[47]:
Open High Low Close Volume Adj Close
Date minor
2010-01-04 AAPL 213.429998 214.499996 212.380001 214.009998 123432400.0 27.727039
2010-01-05 AAPL 214.599998 215.589994 213.249994 214.379993 150476200.0 27.774976
2010-01-06 AAPL 214.379993 215.230000 210.750004 210.969995 138040000.0 27.333178
2010-01-07 AAPL 211.750000 212.000006 209.050005 210.580000 119282800.0 27.282650
2010-01-08 AAPL 210.299994 212.000006 209.060005 211.980005 111902700.0 27.464034
2010-01-11 AAPL 212.799997 213.000002 208.450005 210.110003 115557400.0 27.221758
2010-01-12 AAPL 209.189995 209.769995 206.419998 207.720001 148614900.0 26.912110
2010-01-13 AAPL 207.870005 210.929995 204.099998 210.650002 151473000.0 27.291720
2010-01-14 AAPL 210.110003 210.459997 209.020004 209.430000 108223500.0 27.133657
2010-01-15 AAPL 210.929995 211.599997 205.869999 205.930000 148516900.0 26.680198
... ... ... ... ... ... ...
2016-08-24 AAPL 108.570000 108.750000 107.680000 108.029999 23675100.0 107.014213
2016-08-25 AAPL 107.389999 107.879997 106.680000 107.570000 25086200.0 106.558539
2016-08-26 AAPL 107.410004 107.949997 106.309998 106.940002 27766300.0 105.934466
2016-08-29 AAPL 106.620003 107.440002 106.290001 106.820000 24970300.0 105.815591
2016-08-30 AAPL 105.800003 106.500000 105.500000 106.000000 24863900.0 105.003302
2016-08-31 AAPL 105.660004 106.570000 105.639999 106.099998 29662400.0 105.102360
2016-09-01 AAPL 106.139999 106.800003 105.620003 106.730003 26701500.0 105.726441
2016-09-02 AAPL 107.699997 108.000000 106.820000 107.730003 26802500.0 106.717038
2016-09-06 AAPL 107.900002 108.300003 107.510002 107.699997 26880400.0 106.687314
2016-09-07 AAPL 107.830002 108.760002 107.070000 108.360001 42364300.0 107.341112

[1682 rows x 6 columns]

关于pandas - 对多个数据帧进行数据分析?面板还是多索引?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43054570/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com