
python - Expand a pandas DataFrame from row-wise to column-wise

Reposted. Author: 行者123. Updated: 2023-12-05 01:22:47

I want to expand the columns of the following (toy example) pandas DataFrame,

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})


  col1  col2  col3
0    A     1     3
1    A     7    -1
2    A     3     0
3    B     2     5
4    B     9    -2
5    B     4    -3

so that it becomes row-wise,

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0
1    B       2       9       4       5      -2      -3

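I know I should use groupby('col1'), but I don't know how to get the desired DataFrame. Note: every group produced by groupby('col1') has the same number of elements (in this example, three A's and three B's).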
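The equal-group-size assumption stated above can be checked before reshaping. This is only a minimal sketch; the names group_sizes and equal_sized are illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3]})

# Number of rows in each col1 group
group_sizes = df.groupby('col1').size()

# True only when every group has the same number of rows
equal_sized = bool(group_sizes.nunique() == 1)

print(group_sizes.to_dict(), equal_sized)
```

If the check fails, approaches that build the helper index by repeating a fixed list would no longer line up with the rows.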

Edit: I managed to do it with the following code, but it is not efficient

import pandas as pd
from functools import partial

def func(x, exclude_list):
    # for every non-excluded column, add one constant column per value
    for col in x.columns:
        if col in exclude_list:
            continue
        for i, value in enumerate(x[col].values):
            x[f'{col}_{i+1}'] = value
    return x

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

exclude_list = ['col1']
columns_to_expand = ['col2', 'col3']
func2 = partial(func, exclude_list=exclude_list)
# group_keys=False keeps the original row index in the result
df2 = df.groupby(exclude_list, group_keys=False).apply(func2)
df2.drop(columns_to_expand, axis=1, inplace=True)
# every row of a group now carries the same values, so keep one row per group
df3 = df2.groupby(exclude_list).tail(1).reset_index()
df3.drop('index', axis=1, inplace=True)
print(df3)

The result is,

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0
1    B       2       9       4       5      -2      -3

Edit2: this code, based on ouroboros1's answer, works,

import numpy as np

n_groups = len(np.unique(df[exclude_list].values))
df_pivot = None
for col in columns_to_expand:
    # helper column holding the target column names (3 rows per group)
    df['index'] = [f'{col}_{i}' for i in range(1, 4)] * n_groups
    pivoted = df.pivot(index=exclude_list, values=col, columns='index')\
                .reset_index(drop=False)
    if df_pivot is None:
        df_pivot = pivoted
    else:
        df_pivot = df_pivot.merge(pivoted)

Best answer

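Update: the question has been updated to expand multiple columns row-wise. This requires some refactoring of the initial answer, which was tailored to the initial question and only needed to operate on a single column (col2). Note that the refactored answers below also work fine on a single column. However, since they are somewhat verbose for that case, I am keeping the original single-column answers at the end.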


Answer for expanding multiple columns row-wise

You can use df.pivot for this:

import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

cols = ['col2','col3']

# value count per unique value in col1: N.B. expecting all to have same count!
# (.iloc[0] selects by position; plain [0] would be a label lookup)
vals_unique_col1 = df.col1.value_counts().iloc[0]+1 # 3+1 (use in `range()`)
len_unique_col1 = len(df.col1.unique()) # 2

# create temp values [1,2,3] per group and store in new col
df['my_index'] = [i for i in range(1,vals_unique_col1)]*len_unique_col1
df_pivot = df.pivot(index='col1',values=cols,columns='my_index')\
    .reset_index(drop=False)

# customize df cols by joining MultiIndex columns
df_pivot.columns = ['_'.join(str(i) for i in x) for x in df_pivot.columns]
df_pivot.rename(columns={'col1_':'col1'}, inplace=True)

print(df_pivot)

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0
1    B       2       9       4       5      -2      -3
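As a variation on the pivot approach, the within-group position can be derived with groupby(...).cumcount() instead of repeating a fixed list. This is a different way of building the helper column than the one used in the answer. The names pos and wide are illustrative only, and the sketch assumes each group should be numbered in its current row order.

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3]})

cols = ['col2', 'col3']

# 1-based position of each row within its col1 group
pos = df.groupby('col1').cumcount() + 1

# pivot on the helper column; the result has MultiIndex columns (col, pos)
wide = df.assign(pos=pos).pivot(index='col1', columns='pos', values=cols)

# flatten the MultiIndex columns into names such as 'col2_1'
wide.columns = [f'{col}_{i}' for col, i in wide.columns]
wide = wide.reset_index()

print(wide)
```

Because cumcount() numbers rows per group, this version does not depend on the rows being sorted by col1.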

Two alternative solutions based on df.groupby could look like this:

  • Groupby solution 1
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

cols = ['col2','col3']

# collect the values per group into lists, then explode each row into columns
df_groupby = df.groupby('col1')[cols].agg(list)\
    .apply(pd.Series.explode, axis=1).reset_index(drop=False)

# same as in `pivot` method, this will be 3
# (.iloc[0] selects by position; plain [0] would be a label lookup)
len_cols = df.col1.value_counts().iloc[0]

# rename cols
df_groupby.columns=[f'{col}_{(idx-1)%len_cols+1}' if col != 'col1' else col
                    for idx, col in enumerate(df_groupby.columns)]
  • Groupby solution 2
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

cols = ['col2','col3']

# one list of values per group and column
agg_lists = df.groupby('col1')[cols].agg(list)

# turn each column of lists into its own DataFrame (new cols `0,1,2`)
dfs = [pd.DataFrame(agg_lists[col].tolist(), index=agg_lists.index)
       for col in agg_lists.columns]

df_groupby = pd.concat(dfs, axis=1)

# (.iloc[0] selects by position; plain [0] would be a label lookup)
len_cols = df.col1.value_counts().iloc[0]
cols_rep = np.repeat(cols,len_cols)

df_groupby.columns = [f'{col}_{str(i+1)}' for col, i
                      in zip(cols_rep, df_groupby.columns)]
df_groupby.reset_index(drop=False, inplace=True)
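To see why the list-based groupby solutions work, it can help to inspect the intermediate result of agg(list) and how one column of lists expands into separate columns. The following is a small sketch of that intermediate step only; the name expanded is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3]})

# each cell holds the list of values for that group and column
agg_lists = df.groupby('col1')[['col2', 'col3']].agg(list)
print(agg_lists)

# a column of equal-length lists becomes one row per group, one column per position
expanded = pd.DataFrame(agg_lists['col2'].tolist(), index=agg_lists.index)
print(expanded)
```

The expanded frame has integer column labels 0, 1, 2, which is why both solutions finish with a renaming step.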

(Original) answer for expanding a single column row-wise

You can use df.pivot for this:

import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4]})

# add col with prospective col names (`col2_1,*_2,*_3`)
# and multiply by len unique values in `df.col1`
df['index'] = [f'col2_{i}' for i in range(1,4)]*len(df.col1.unique())

df_pivot = df.pivot(index='col1',values='col2',columns='index')\
    .reset_index(drop=False)

print(df_pivot)

index col1  col2_1  col2_2  col2_3
0        A       1       7       3
1        B       2       9       4

An alternative solution based on df.groupby could look like this:

import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4]})


# create lists of values in `col2` per group in `col1`,
# then expand into multiple cols with `apply(pd.Series)`, finally reset index
df_groupby = df.groupby('col1').agg(list)['col2']\
    .apply(pd.Series).reset_index(drop=False)

# overwrite new cols (`0,1,2`) with desired col names `col2_1, etc.`
df_groupby.columns=[f'col2_{col+1}' if col != 'col1' else col
                    for col in list(df_groupby.columns)]

print(df_groupby)

  col1  col2_1  col2_2  col2_3
0    A       1       7       3
1    B       2       9       4

Regarding python - expanding a pandas DataFrame from row-wise to column-wise, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73433013/
