
python - How to create a data frame column in Pandas that combines multiple columns

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:15:18

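I have some data that tracks how company names change over time. Rather than having each name change on its own row, I would like all of the names concatenated into a single field.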

The input data can be constructed as follows:

#Import the modules:
import pandas as pd
import numpy as np

#Create the empty data frame:
df = pd.DataFrame(columns=['dt','old_name','new_name'])

#Populate the data frame:
df.loc[len(df)] = ['01/01/2001', 'AAA', 'BBB']
df.loc[len(df)] = ['02/02/2002', 'BBB', 'CCC']
df.loc[len(df)] = ['03/03/2003', 'CCC', 'DDD']

#View the output:
df

The output I would like can be created with this:

#Create the empty data frame:
end_df = pd.DataFrame(columns=['dt','name'])

#Populate:
end_df.loc[len(end_df)] = ['01/01/2001', 'AAA-BBB-CCC-DDD']
end_df.loc[len(end_df)] = ['02/02/2002', 'AAA-BBB-CCC-DDD']
end_df.loc[len(end_df)] = ['03/03/2003', 'AAA-BBB-CCC-DDD']

#View the output:
end_df

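Edit: I am running this code with pandas data frames inside PySpark 2, in case that affects the syntax. Also, my dataset contains multiple groups of names. By that I mean there are further sets of name changes, unrelated to the first set, that each need to be concatenated separately.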

Example grouped input:

#Create the empty data frame:
df = pd.DataFrame(columns=['dt','old_name','new_name'])

#Populate the data frame:
df.loc[len(df)] = ['01/01/2001', 'AAA', 'BBB']
df.loc[len(df)] = ['02/02/2002', 'BBB', 'CCC']
df.loc[len(df)] = ['03/03/2003', 'CCC', 'DDD']
df.loc[len(df)] = ['02/01/2001', 'XXX', 'YYY']
df.loc[len(df)] = ['03/02/2002', 'YYY', 'ZZZ']

Example grouped output:

#Create the empty data frame:
end_df = pd.DataFrame(columns=['dt','name'])

#Populate:
end_df.loc[len(end_df)] = ['01/01/2001', 'AAA-BBB-CCC-DDD']
end_df.loc[len(end_df)] = ['02/02/2002', 'AAA-BBB-CCC-DDD']
end_df.loc[len(end_df)] = ['03/03/2003', 'AAA-BBB-CCC-DDD']
end_df.loc[len(end_df)] = ['02/01/2001', 'XXX-YYY-ZZZ']
end_df.loc[len(end_df)] = ['03/02/2002', 'XXX-YYY-ZZZ']

Please let me know if you need any further clarification.

Best answer

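You need ndarray.flatten (called here as df.values.flatten(); there is no np.flatten function) and np.unique. Note that np.unique returns the names sorted alphabetically, which matches the rename order only for data like this sample, and that the code joins every name in the frame into one string, so it covers the single-group case only.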

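import numpy as np
import pandas as pd

# Start from an empty frame with the target columns, then copy the dates over.
end_df = pd.DataFrame(columns=['dt','name'])
end_df['dt']=df['dt'].copy()
# Flatten the old_name/new_name columns into one 1-D array of names.
flat=df[df.columns[1:]].values.flatten()
# np.unique de-duplicates and sorts; the joined string is assigned to every row.
end_df['name']='-'.join(np.unique(flat))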

print(end_df)
           dt             name
0  01/01/2001  AAA-BBB-CCC-DDD
1  02/02/2002  AAA-BBB-CCC-DDD
2  03/03/2003  AAA-BBB-CCC-DDD
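
The answer above does not address the grouped case from the question's edit. One possible approach, offered as a sketch and not as part of the original answer, is to follow each old_name to new_name link and build one chain per group. The helper name build_chains is hypothetical, and the sketch assumes each name is renamed at most once (no branching) and that every chain has a starting name that never appears in new_name (no pure cycles).

```python
import pandas as pd

# Grouped sample input from the question.
df = pd.DataFrame(
    [
        ['01/01/2001', 'AAA', 'BBB'],
        ['02/02/2002', 'BBB', 'CCC'],
        ['03/03/2003', 'CCC', 'DDD'],
        ['02/01/2001', 'XXX', 'YYY'],
        ['03/02/2002', 'YYY', 'ZZZ'],
    ],
    columns=['dt', 'old_name', 'new_name'],
)


def build_chains(frame):
    """Map every name to the '-'-joined chain it belongs to (hypothetical helper).

    Assumes each old_name has exactly one successor and chains do not loop.
    """
    successor = dict(zip(frame['old_name'], frame['new_name']))
    renamed_to = set(successor.values())
    # A chain starts at a name that was never the result of a rename.
    starts = [name for name in successor if name not in renamed_to]

    chain_of = {}
    for start in starts:
        chain = [start]
        seen = {start}
        # Walk the links until the last name has no successor (or would repeat).
        while chain[-1] in successor and successor[chain[-1]] not in seen:
            chain.append(successor[chain[-1]])
            seen.add(chain[-1])
        label = '-'.join(chain)
        for name in chain:
            chain_of[name] = label
    return chain_of


chain_of = build_chains(df)
# Each row gets the chain that its old_name belongs to.
end_df = pd.DataFrame({'dt': df['dt'], 'name': df['old_name'].map(chain_of)})
print(end_df)
```

On the grouped sample this gives AAA-BBB-CCC-DDD for the first three rows and XXX-YYY-ZZZ for the last two. Because the chain is walked link by link, the names appear in rename order without relying on alphabetical sorting.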

Regarding python - How to create a data frame column in Pandas that combines multiple columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51204227/
