
Python: Creating nested lists from a long-format pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:21:39

I have a DataFrame in which only "peak_time" is a column (stimulus and position form the index):

stimulus position peak_time 
1 1 1.0
2 1.5
2 1 2.0
2 2.0
3 1 2.5

Now I am trying to collapse the position level and collect the values into lists, so that it looks like this:

stimulus peak_time  
1 [1.0, 1.5]
2 [2.0, 2.0]
3 [2.5]

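This is probably very simple, but I could not find a solution using Google. If someone has already asked about this, I would also appreciate a link to that thread. Thanks for your help!

Code to create the DataFrame: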

import random
import numpy as np
import pandas as pd
trial = [1,1,2,1,1,2,2,1,2]
stimulus = [1,1,1,2,2,2,2,3,3]
position = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# np.nanmedian replaces scipy.nanmedian, which older SciPy releases re-exported from NumPy
median_ = df.groupby(['stimulus', 'position']).apply(np.nanmedian)
median_ = pd.DataFrame(median_)
median_.columns = ['peak_time']
median_

Edit

Since I can only post one question every 90 minutes, I would like to ask a follow-up question below this post. I now have two pandas Series like these:

median_:
stimulus
1 [1.0, 1.5]
2 [2.0, 2.0]
3 [2.0]

quartile_:
stimulus
1 [[1.0, 70.0], [1.0, 183.25]]
2 [[1.0, 65.75], [2.0, 98.75]]
3 [[1.0, 51.25]]

I want to subtract median_ from quartile_ to get

distance_: 
stimulus
1 [1-1, 70-1], [1.5-1, 183.25-1.5]
2 [2-1, 65.75-1], [2-2, 98.75-2]
3 [2-1, 51.25-2]

Is there a simple way to do this? abs(median_ - quartile_) does not work.

Code to create the Series:

import random
import numpy as np
import pandas as pd
trial = [1,1,2,1,1,2,2,1,2]
stimulus = [1,1,1,2,2,2,2,3,3]
position = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# np.nanmedian / np.nanpercentile replace the scipy.* aliases that older SciPy releases re-exported from NumPy
median_ = df.groupby(['stimulus', 'position']).apply(np.nanmedian).groupby(level=0).apply(list)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: np.nanpercentile(x, [25, 75])).groupby(level=0).apply(list)

Solution

Apply groupby(level=0).apply(list) at the end instead, so

import numpy as np
median_   = df.groupby(['stimulus', 'position']).apply(np.nanmedian)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: np.nanpercentile(x, [25, 75]))

Then I can subtract them easily

distance_ = abs(median_ - quartile_)
# fixed: the original referenced an undefined name `distance`
distance_ = distance_.groupby(level=0).apply(list)
distance_

stimulus
1 [1-1, 70-1], [1.5-1, 183.25-1.5]
2 [2-1, 65.75-1], [2-2, 98.75-2]
3 [2-1, 51.25-2]
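
The subtract-first, collapse-last idea above can also be sketched with numeric columns instead of a Series of arrays. This is only an illustrative alternative, not the poster's code: the peak_time values are hand-picked (hypothetical) so that the result is reproducible, and the built-in groupby median() and quantile() methods stand in for the nanmedian/nanpercentile calls.

```python
import pandas as pd

# Hypothetical, hand-picked peak times (the question uses random values).
df = pd.DataFrame({
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [0.5, 1.5, 1.5, 1.0, 1.5, 3.0, 2.5, 2.0, 3.0],
})

grouped = df.groupby(["stimulus", "position"])["peak_time"]

# One median per (stimulus, position); NaN values are skipped by default.
median_ = grouped.median()

# Quartiles as two numeric columns (0.25 and 0.75) on the same MultiIndex.
quartile_ = grouped.quantile([0.25, 0.75]).unstack()

# Subtract while everything is still numeric and aligned on the index.
distance_ = quartile_.sub(median_, axis=0).abs()

# Only now collapse the position level into nested lists per stimulus.
nested = {
    int(stim): grp.to_numpy().tolist()
    for stim, grp in distance_.groupby(level="stimulus")
}
print(nested)
```

Keeping the quartiles in ordinary float columns means the subtraction is plain index-aligned arithmetic, and the conversion to nested lists happens once, at the very end.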

Best answer

It is a Series with a MultiIndex, so you need Series.groupby with apply(list):

import numpy as np
# added column peak_time; np.nanmedian replaces the scipy.nanmedian alias from older SciPy releases
median_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian)
df = median_.groupby(level=0).apply(list).reset_index()
print(df)
stimulus peak_time
0 1 [1.0, 1.5]
1 2 [2.0, 2.0]
2 3 [2.5]
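
For readers who want to reproduce the table above exactly, here is a minimal self-contained sketch. The peak_time values are hypothetical, chosen by hand so that the per-group medians match the numbers in the question, and the built-in median() (which skips NaN by default) is used in place of nanmedian.

```python
import pandas as pd

# Hypothetical, hand-picked peak times chosen to reproduce the question's medians.
df = pd.DataFrame({
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [0.5, 1.5, 1.5, 1.0, 1.5, 3.0, 2.5, 2.0, 3.0],
})

# Median per (stimulus, position): a Series with a two-level MultiIndex.
median_ = df.groupby(["stimulus", "position"])["peak_time"].median()

# Group again on the first index level and gather each group's values into a list.
nested = median_.groupby(level=0).apply(list)

result = nested.reset_index()
print(result)
```

Selecting the 'peak_time' column before aggregating matters: without it, the function is applied to the whole group sub-frame rather than to the peak times alone.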

Regarding "Python: Creating nested lists from a long-format pandas DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48928882/
