
python - Combining DataFrames with a DatetimeIndex based on column labels

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:30:32

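I have weather data stored in many separate files, where each column holds a particular measuring instrument and each row is the average reading for a particular date. Suppose one file looks like this: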

import numpy as np
import pandas as pd

first = pd.DataFrame(np.random.random((10, 3)),
                     index=pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])

first
Out[21]:
A B C
1950-01-01 0.939932 0.504543 0.091025
1950-01-02 0.121418 0.725333 0.444813
1950-01-03 0.338385 0.783398 0.116468
1950-01-04 0.847905 0.846147 0.226074
1950-01-05 0.156315 0.704804 0.524886
1950-01-06 0.412284 0.425379 0.427246
1950-01-07 0.165859 0.406347 0.114586
1950-01-08 0.392670 0.789526 0.174001
1950-01-09 0.246180 0.776304 0.019368
1950-01-10 0.142213 0.731748 0.954076

The second one looks like this:

second = pd.DataFrame(np.random.random((10, 3)),
                      index=pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])



second
Out[30]:
A B D
1950-01-11 0.190767 0.905640 0.325411
1950-01-12 0.109964 0.754694 0.414402
1950-01-13 0.058164 0.305405 0.768333
1950-01-14 0.267644 0.919876 0.631083
1950-01-15 0.981333 0.454678 0.533075
1950-01-16 0.831600 0.823845 0.980366
1950-01-17 0.303585 0.091634 0.338517
1950-01-18 0.723445 0.088020 0.570779
1950-01-19 0.639665 0.954577 0.763810
1950-01-20 0.370629 0.716066 0.628383

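I want to combine the two so that all instruments (A, B, C, D, ...) appear in a single file covering every measurement period. The expected result looks like this: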

                   A         B         C         D
1950-01-01  0.939932  0.504543  0.091025       NaN
1950-01-02  0.121418  0.725333  0.444813       NaN
1950-01-03  0.338385  0.783398  0.116468       NaN
1950-01-04  0.847905  0.846147  0.226074       NaN
1950-01-05  0.156315  0.704804  0.524886       NaN
1950-01-06  0.412284  0.425379  0.427246       NaN
1950-01-07  0.165859  0.406347  0.114586       NaN
1950-01-08  0.392670  0.789526  0.174001       NaN
1950-01-09  0.246180  0.776304  0.019368       NaN
1950-01-10  0.142213  0.731748  0.954076       NaN
1950-01-11  0.190767  0.905640       NaN  0.325411
1950-01-12  0.109964  0.754694       NaN  0.414402
1950-01-13  0.058164  0.305405       NaN  0.768333
1950-01-14  0.267644  0.919876       NaN  0.631083
1950-01-15  0.981333  0.454678       NaN  0.533075
1950-01-16  0.831600  0.823845       NaN  0.980366
1950-01-17  0.303585  0.091634       NaN  0.338517
1950-01-18  0.723445  0.088020       NaN  0.570779
1950-01-19  0.639665  0.954577       NaN  0.763810
1950-01-20  0.370629  0.716066       NaN  0.628383

To get this, I tried:

first.merge(second, how='outer', left_index=True, right_index=True)
Out[34]:
A_x B_x C A_y B_y D
1950-01-01 0.939932 0.504543 0.091025 NaN NaN NaN
1950-01-02 0.121418 0.725333 0.444813 NaN NaN NaN
1950-01-03 0.338385 0.783398 0.116468 NaN NaN NaN
1950-01-04 0.847905 0.846147 0.226074 NaN NaN NaN
1950-01-05 0.156315 0.704804 0.524886 NaN NaN NaN
1950-01-06 0.412284 0.425379 0.427246 NaN NaN NaN
1950-01-07 0.165859 0.406347 0.114586 NaN NaN NaN
1950-01-08 0.392670 0.789526 0.174001 NaN NaN NaN
1950-01-09 0.246180 0.776304 0.019368 NaN NaN NaN
1950-01-10 0.142213 0.731748 0.954076 NaN NaN NaN
1950-01-11 NaN NaN NaN 0.190767 0.905640 0.325411
1950-01-12 NaN NaN NaN 0.109964 0.754694 0.414402
1950-01-13 NaN NaN NaN 0.058164 0.305405 0.768333
1950-01-14 NaN NaN NaN 0.267644 0.919876 0.631083
1950-01-15 NaN NaN NaN 0.981333 0.454678 0.533075
1950-01-16 NaN NaN NaN 0.831600 0.823845 0.980366
1950-01-17 NaN NaN NaN 0.303585 0.091634 0.338517
1950-01-18 NaN NaN NaN 0.723445 0.088020 0.570779
1950-01-19 NaN NaN NaN 0.639665 0.954577 0.763810
1950-01-20 NaN NaN NaN 0.370629 0.716066 0.628383

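But as you can see, the columns that should have been merged are split apart, because the two frames have no row index labels in common. I feel this would be a very useful feature for pandas. Is it possible?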

Best answer

Another approach is to use the .combine function, which reshapes the result to the union of both frames along both axes.

combiner = lambda x, y: np.where(pd.isnull(x), y, x)
first.combine(second, combiner)

A B C D
1950-01-01 0.7917 0.5289 0.5680 NaN
1950-01-02 0.9256 0.0710 0.0871 NaN
1950-01-03 0.0202 0.8326 0.7782 NaN
1950-01-04 0.8700 0.9786 0.7992 NaN
1950-01-05 0.4615 0.7805 0.1183 NaN
1950-01-06 0.6399 0.1434 0.9447 NaN
1950-01-07 0.5218 0.4147 0.2646 NaN
1950-01-08 0.7742 0.4562 0.5684 NaN
1950-01-09 0.0188 0.6176 0.6121 NaN
1950-01-10 0.6169 0.9437 0.6818 NaN
1950-01-11 0.3595 0.4370 NaN 0.6976
1950-01-12 0.0602 0.6668 NaN 0.6706
1950-01-13 0.2104 0.1289 NaN 0.3154
1950-01-14 0.3637 0.5702 NaN 0.4386
1950-01-15 0.9884 0.1020 NaN 0.2089
1950-01-16 0.1613 0.6531 NaN 0.2533
1950-01-17 0.4663 0.2444 NaN 0.1590
1950-01-18 0.1104 0.6563 NaN 0.1382
1950-01-19 0.1966 0.3687 NaN 0.8210
1950-01-20 0.0971 0.8379 NaN 0.0961
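
For comparison, here is a minimal sketch of two other ways to get the same table, assuming the frames are built as in the question. It is not part of the original answer: the seeded generator `rng` and the names `combined`, `stacked`, `frames`, and `all_combined` are made up for illustration. `DataFrame.combine_first` fills gaps in one frame from the other over the union of rows and columns, and `pd.concat` stacks rows while aligning columns by label. The last step, folding a list of frames with `functools.reduce`, is one possible way to extend this to many files.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Reproducible stand-ins for the two frames in the question (seed is arbitrary).
rng = np.random.default_rng(0)
first = pd.DataFrame(rng.random((10, 3)),
                     index=pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])
second = pd.DataFrame(rng.random((10, 3)),
                      index=pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])

# combine_first keeps values from `first` and fills its missing cells from
# `second`; the result spans the union of both indexes and both column sets.
combined = first.combine_first(second)

# The date ranges do not overlap here, so stacking rows gives the same table.
# concat aligns columns by label and leaves NaN where a frame lacks a column.
stacked = pd.concat([first, second]).sort_index()

# Hypothetical extension to many files: fold a list of frames pairwise.
frames = [first, second]
all_combined = reduce(lambda left, right: left.combine_first(right), frames)

print(combined)
```

If the date ranges of two files did overlap, the two approaches would differ: `combine_first` would keep the value from the left frame for a shared cell, while `pd.concat` would produce duplicate index rows.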

Regarding python - Combining DataFrames with a DatetimeIndex based on column labels, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31909771/
