
python - Pandas DataFrame: Expand rows with lists to multiple rows with desired indexing for all columns

Reposted. Author: 太空狗. Updated: 2023-10-30 02:58:30

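I have time series data in a pandas DataFrame. The index is the time at which each measurement starts, and each column holds a list of values recorded at a fixed sampling rate (the sampling interval is the difference between consecutive index values divided by the number of elements in the list).

This is what it looks like: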

Time  A                 B                 ...  Z
0     [1, 2, 3, 4]      [1, 2, 3, 4]
2     [5, 6, 7, 8]      [5, 6, 7, 8]
4     [9, 10, 11, 12]   [9, 10, 11, 12]
6     [13, 14, 15, 16]  [13, 14, 15, 16]
...

I want to expand each row into multiple rows, across all columns, like this:

Time  A  B  ...  Z
0     1  1
0.5   2  2
1     3  3
1.5   4  4
2     5  5
2.5   6  6
...

So far I have been thinking along these lines (the code does not work):

def expand_row(dstruc):
    for i in range (len(dstruc)):
        for j in range (1,len(dstruc[i])):
            dstruc.loc[i+j/len(dstruc[i])] = dstruc[i][j]

        dstruc.loc[i] = dstruc[i][0]
    return dstruc

expanded = testdf.apply(expand_row)

I also tried combining split(',') with stack(), but I could not get the index right.

Best answer

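import numpy as np
import pandas as pd

# Sample frame: each cell holds a tuple of 4 consecutive samples.
# list(...) materializes the zip so the construction also works on Python 3.
df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')},
                  index=range(0,8,2))

# Note: DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0,
# so this call needs an older pandas release.
result = pd.DataFrame.from_items(
    [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)],
    orient='index', columns=df.columns)
result.index.name = 'Time'

# Fractional position of each sample within its original row.
grouped = result.groupby(level=0)
increment = (grouped.cumcount()/grouped.size())
result.index = result.index + increment
print(result)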

yields

In [183]: result
Out[183]:
       A   B   C
Time
0.00   1   1   1
0.25   2   2   2
0.50   3   3   3
0.75   4   4   4
2.00   5   5   5
2.25   6   6   6
2.50   7   7   7
2.75   8   8   8
4.00   9   9   9
4.25  10  10  10
4.50  11  11  11
4.75  12  12  12
6.00  13  13  13
6.25  14  14  14
6.50  15  15  15
6.75  16  16  16
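
For readers on a current pandas release, the same result can probably be reached without from_items. The following is a minimal sketch, not the answer author's code: it assumes pandas 1.3 or newer (where DataFrame.explode accepts a list of columns) and that every list in a row has the same length. The list-valued sample frame is built here only for illustration.

```python
import pandas as pd

# Illustrative sample frame: every cell holds a list of 4 samples.
df = pd.DataFrame(
    {key: [list(range(start, start + 4)) for start in range(1, 17, 4)] for key in "ABC"},
    index=pd.Index(range(0, 8, 2), name="Time"),
)

# Multi-column explode (assumes pandas >= 1.3) repeats each index label
# once per list element and leaves object dtype, hence the astype(int).
result = df.explode(list(df.columns)).astype(int)

# Same increment as in the answer: position within the row / row length.
grouped = result.groupby(level=0)
increment = grouped.cumcount() / grouped["A"].transform("size")
result.index = pd.Index(result.index.to_numpy() + increment.to_numpy(), name="Time")

print(result.head())
```

Using transform("size") keeps the group size aligned row by row with cumcount, so the division does not depend on index alignment between a duplicated and a unique index.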

Explanation:

One way to iterate over the contents of the lists is a list comprehension:

In [172]: df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')}, index=range(0,8,2))

In [173]: [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
Out[173]:
[(0, (1, 1, 1)),
 (0, (2, 2, 2)),
 ...
 (6, (15, 15, 15)),
 (6, (16, 16, 16))]
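
The key step in that comprehension is zip(*row), which turns one row of per-column lists into one tuple per sample. A small pure-Python sketch of just that step, using a hypothetical row with distinct values per column so the regrouping is visible:

```python
# Hypothetical single row: each column holds the 4 samples of one interval.
row = {"A": (1, 2, 3, 4), "B": (10, 20, 30, 40), "C": (100, 200, 300, 400)}

# zip(*...) takes the first element of every column, then the second, and so on,
# so each resulting tuple is one output row across columns A, B, C.
per_sample = list(zip(*row.values()))

print(per_sample)
# [(1, 10, 100), (2, 20, 200), (3, 30, 300), (4, 40, 400)]
```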

Once you have the values in the above form, you can build the desired DataFrame with pd.DataFrame.from_items:

result = pd.DataFrame.from_items([(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)], orient='index', columns=df.columns)
result.index.name = 'Time'

yields

In [175]: result
Out[175]:
       A   B   C
Time
0      1   1   1
0      2   2   2
...
6     15  15  15
6     16  16  16
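
Since from_items is not available in pandas 1.0 and later, the same intermediate frame can likely be built with the plain DataFrame constructor. This is a sketch under that assumption, not part of the original answer; the names pairs, labels and rows are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {key: list(zip(*[iter(range(1, 17))] * 4)) for key in "ABC"},
    index=range(0, 8, 2),
)

# Same (index, values) pairs as in the answer, split into labels and rows.
pairs = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
labels, rows = zip(*pairs)

# The plain constructor accepts a list of row tuples plus an explicit index.
result = pd.DataFrame(list(rows), index=pd.Index(labels, name="Time"), columns=df.columns)

print(result.head())
```

At this point the index still contains each original label four times; the increment step that follows makes it unique.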

To compute the increments to add to the index, you can group by the index and take the ratio of cumcount to size for each group:

In [176]: grouped = result.groupby(level=0)
In [177]: increment = (grouped.cumcount()/grouped.size())
In [179]: result.index = result.index + increment
In [199]: result.index
Out[199]:
Int64Index([ 0.0, 0.25,  0.5, 0.75,  2.0, 2.25,  2.5, 2.75,  4.0, 4.25,  4.5,
            4.75,  6.0, 6.25,  6.5, 6.75],
           dtype='float64', name=u'Time')
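
Note that the ratio cumcount/size is a fraction of 1, so samples land at 0.25 steps, while the question's desired output spaces them at 0.5 (index spacing 2 divided by 4 samples). A possible adjustment, assuming the original index is evenly spaced, is to scale the ratio by that spacing. The names samples and spacing below are hypothetical and only illustrate the arithmetic:

```python
import numpy as np
import pandas as pd

# Repeated labels as they look after the expansion step (4 samples per row).
labels = pd.Index(np.repeat([0, 2, 4, 6], 4), name="Time")
samples = pd.Series(range(1, 17), index=labels)

grouped = samples.groupby(level=0)
position = grouped.cumcount().to_numpy()      # 0, 1, 2, 3 within each label
size = grouped.transform("size").to_numpy()   # 4 for every row here

# Assumption: consecutive original labels are always `spacing` apart.
spacing = 2
samples.index = pd.Index(labels.to_numpy() + spacing * position / size, name="Time")

print(samples.head(6))
```

If the spacing is not constant, it would have to be derived per row (for example from the difference between consecutive original labels), which this sketch does not attempt.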

Regarding python - Pandas DataFrame: Expand rows with lists to multiple rows with desired indexing for all columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33793622/
