
python - Expand a pandas DataFrame column into multiple rows

Reposted. Author: 太空狗. Updated: 2023-10-29 17:02:43

If I have a DataFrame like this:

pd.DataFrame({"name": "John",
              "days": [[1, 3, 5, 7]]})

which gives this structure:

           days  name
0  [1, 3, 5, 7]  John

how can I expand it into the following?

   days  name
0     1  John
1     3  John
2     5  John
3     7  John

Best answer

You could use df.itertuples to iterate through each row, and a list comprehension to reshape the data into the desired form:

import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})
result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
print(result)

yields

   0     1
0  1  John
1  3  John
2  5  John
3  7  John
4  2  Eric
5  4  Eric
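The result above has the default column labels 0 and 1. Passing columns= to the DataFrame constructor gives it the days and name headers the question asks for. A minimal sketch on the same two-row example; only the columns= argument is new compared with the answer's code:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

# One (day, name) tuple per list element, in original row order
rows = [(d, tup.name) for tup in df.itertuples() for d in tup.days]

# Name the columns explicitly instead of getting 0 and 1
result = pd.DataFrame(rows, columns=["days", "name"])
print(result)
```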

On the 1,000-row benchmark set up below, Divakar's solution, using_repeat, is the fastest:

In [48]: %timeit using_repeat(df)
1000 loops, best of 3: 834 µs per loop

In [5]: %timeit using_itertuples(df)
100 loops, best of 3: 3.43 ms per loop

In [7]: %timeit using_apply(df)
1 loop, best of 3: 379 ms per loop

In [8]: %timeit using_append(df)
1 loop, best of 3: 3.59 s per loop
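Why using_repeat wins: apart from computing the list lengths, it does no per-element Python work. np.repeat copies each name as many times as its list has elements, and np.concatenate flattens all the lists into one array, so the two arrays line up row for row. A small worked sketch of those two steps on the two-row example from the answer (the intermediate variable names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

lens = [len(item) for item in df["days"]]        # list lengths: 4 and 2
names = np.repeat(df["name"].values, lens)       # "John" 4 times, "Eric" 2 times
days = np.concatenate(df["days"].values)         # all lists flattened in order

# Both arrays have sum(lens) elements, so they form aligned columns
result = pd.DataFrame({"name": names, "days": days})
print(result)
```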

This is the setup used for the benchmarks above:

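import numpy as np
import pandas as pd

N = 10**3
df = pd.DataFrame({"name": np.random.choice(list('ABCD'), size=N),
                   # each row holds an array of 0 to 4 random ints
                   "days": [np.random.randint(10, size=np.random.randint(5))
                            for i in range(N)]})

def using_itertuples(df):
    # build one (day, name) tuple per list element
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

def using_repeat(df):
    # repeat each name len(days) times and flatten all day arrays into one
    lens = [len(item) for item in df['days']]
    return pd.DataFrame({"name": np.repeat(df['name'].values, lens),
                         "days": np.concatenate(df['days'].values)})

def using_apply(df):
    # spread each list across columns, stack them into one long Series,
    # drop the inner index level, then join the names back on the row index
    return (df.apply(lambda x: pd.Series(x.days), axis=1)
              .stack()
              .reset_index(level=1, drop=True)
              .to_frame('days')
              .join(df['name']))

def using_append(df):
    # grow a new DataFrame one row at a time (the slow baseline);
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    df2 = pd.DataFrame(columns=df.columns)
    for i, r in df.iterrows():
        for e in r.days:
            new_r = r.copy()
            new_r.days = e
            df2 = pd.concat([df2, new_r.to_frame().T])
    return df2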
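On newer pandas (0.25 and later, released after this answer), DataFrame.explode performs this transformation directly. It is a different method from the four benchmarked above, and its speed relative to using_repeat has not been measured here. A minimal sketch, assuming pandas 1.1 or later for the ignore_index argument:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"],
                   "days": [[1, 3, 5, 7], [2, 4]]})

# explode emits one row per list element and repeats the other columns;
# ignore_index=True renumbers the result 0..n-1 (pandas >= 1.1)
result = df.explode("days", ignore_index=True)

# exploded values keep object dtype; casting is safe here because no list is empty
result["days"] = result["days"].astype(int)
print(result[["days", "name"]])
```

One behavioural difference: explode turns an empty list into a single row containing NaN, whereas using_repeat simply produces no rows for it. The benchmark setup can generate empty arrays (np.random.randint(5) may return 0), so the two results would not match exactly on that data.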

A similar question, python - Expand a pandas DataFrame column into multiple rows, can be found on Stack Overflow: https://stackoverflow.com/questions/38203352/
