gpt4 book ai didi

python - 使宽数据框变长并根据另一列的名称添加列

转载 作者:行者123 更新时间:2023-12-05 08:45:30 24 4
gpt4 key购买 nike

我需要使用一些列名称作为 df 的一部分。在保持前 3 列相同的同时,我需要根据行的内容创建一些其他列。

这里我有一些客户的一些交易:

cust_id    cust_first   cust_last   au_zo   au_zo_pay   fi_gu   fi_gu_pay   wa     wa_pay
0 1000 Andrew Jones 50.85 debit NaN NaN 69.12 debit
1 1001 Fatima Lee NaN NaN 18.16 debit NaN NaN
2 1002 Sophia Lewis NaN NaN NaN NaN 159.54. credit
3 1003 Edward Bush 45.29 credit 59.63 credit NaN NaN
4 1004 Mark Nunez 20.87 credit 20.87 credit 86.18 debit

首先,我需要添加一个新列“城市”。因为它不在数据库中。它默认为“纽约”。 (这很简单!)

但这里是我卡住的地方:添加一个新列“store”,根据交易发生的地点保存值。 au_zo --> autozone,fi_gu --> 五个人,wa --> walmart

根据之前添加的商店添加新列“分类”:汽车专区 --> 汽车维修,五人 --> 食品,沃尔玛 --> 杂货

“金额”列包含客户和商店的值(value)。

‘transaction_type’列分别是au_zo_pay、fi_gu_pay、wa_pay的值。

所以最后看起来像这样:

  cust_id  city cust_first  cust_last   store   classification  amount  trans_type
0 1000 New York  Andrew Jones auto zone auto-repair 50.85 debit
1 1000 New York Andrew Jones walmart groceries 69.12 debit
2 1001 New York Fatima Lee five guys food 18.16 debit
3 1002 New York Sophia Solis walmart groceries 159.54 credit
4 1003 New York Edward Bush auto zone auto-repair 45.29 credit
5 1003 New York Edward Bush five guys food 59.63 credit
6 1004 New York Mark Nunez auto zone auto-repair 20.87 credit
7 1004 New York Mark Nunez five guys food 20.87 credit
8 1004 New York Mark Nunez walmart groceries 86.18 debit

我已经尝试使用 df.melt() 但我没有得到结果。

最佳答案

这是你想要的东西吗?

import pandas as pd
mp = {
'au_zo': 'auto-repair',
'wa':'groceries',
'fi_gu':'food'
}

### Read txt Data: get pandas df
# I copied and pasted your sample data to a txt file, you can ignore this part
with open(r"C:\Users\orf-haoj\Desktop\test.txt", 'r') as file:
head, *df = [row.split() for row in file.readlines()]
df = [row[1:] for row in df]
df = pd.DataFrame(df, columns=head)

### Here we conduct 2 melts to form melt_1 & melt_2 data
# this melt table is to melt cols 'au_zo','fi_gu', and 'wa'. & get amount as value
melt_1 = df.melt(id_vars=['cust_id', 'cust_first', 'cust_last'], value_vars=['au_zo','fi_gu','wa'], var_name='store', value_name='amount')
# this melt table is to melt cols ['au_zo_pay','fi_gu_pay','wa_pay']. & get trans_type cols
melt_2 = df.melt(id_vars=['cust_id', 'cust_first', 'cust_last'], value_vars=['au_zo_pay', 'fi_gu_pay', 'wa_pay'], var_name='store pay', value_name='trans_type')
# since I want to join these table later, it will a good to get one more key store
melt_2['store'] = melt_2['store pay'].apply(lambda x: '_'.join(x.split("_")[:-1]))

### Remove NaN
# you prob want to switch to test = test.loc[~test['amount'].isnull()] or something else if you have actual nan
melt_1 = melt_1.loc[melt_1['amount'] != 'NaN']
melt_2 = melt_2.loc[melt_2['trans_type'] != 'NaN']

### Inner join data based on 4 keys (assuming your data will have one to one relationship based on these 4 keys)
full_df = melt_1.merge(melt_2, on=['cust_id', 'cust_first', 'cust_last', 'store'], how='inner')
full_df['city'] = 'New York'
full_df['classification'] = full_df['store'].apply(lambda x: mp[x])

此外,这种方法也有其局限性。例如,当基于这四个键的一对一关系不正确时,它会生成错误的数据集。

关于python - 使宽数据框变长并根据另一列的名称添加列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/72524927/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com