
python - Convert dataframe column string values into dummy variable columns

Reposted. Author: 行者123. Updated: 2023-12-04 14:59:26

I have the following dataframe (the remaining columns are omitted):

| customer_id | department                    |
| ----------- | ----------------------------- |
| 11 | ['nail', 'men_skincare'] |
| 23 | ['nail', 'fragrance'] |
| 25 | [] |
| 45 | ['skincare', 'men_fragrance'] |

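I am preprocessing my data to fit a model. I want to turn the department variable into one dummy variable per unique department category (there may be many unique departments, not only the ones shown here).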

I want a result like this:

| customer_id | department                    | nail | men_skincare | fragrance | skincare | men_fragrance |
| ----------- | ---------- | ---- | ------------ | --------- | -------- | ------------- |
| 11 | ['nail', 'men_skincare'] | 1 | 1 | 0 | 0 | 0 |
| 23 | ['nail', 'fragrance'] | 1 | 0 | 1 | 0 | 0 |
| 25 | [] | 0 | 0 | 0 | 0 | 0 |
| 45 | ['skincare', 'men_fragrance'] | 0 | 0 | 0 | 1 | 1 |

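I tried this link, but when I slice the column it treats each value as a string and only creates a column for each character in the string. Here is what I used: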

```python
df['1st'] = df['department'].str[0]
df['2nd'] = df['department'].str[1]
df['3rd'] = df['department'].str[2]
df['4th'] = df['department'].str[3]
df['5th'] = df['department'].str[4]
df['6th'] = df['department'].str[5]
df['7th'] = df['department'].str[6]
df['8th'] = df['department'].str[7]
df['9th'] = df['department'].str[8]
df['10th'] = df['department'].str[9]
```
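That behaviour suggests the `department` values are strings that only look like lists (a common result of reading the data from a CSV file), so `.str[i]` indexes characters rather than list elements. A minimal sketch, assuming that is the case and using sample data rebuilt from the table above, shows the difference and parses the strings into real lists with the standard library's `ast.literal_eval`:

```python
import ast
import pandas as pd

# Sample data rebuilt from the question; the department values are *strings*
df = pd.DataFrame({
    'customer_id': [11, 23, 25, 45],
    'department': ["['nail', 'men_skincare']", "['nail', 'fragrance']",
                   "[]", "['skincare', 'men_fragrance']"],
})

# On strings, .str[0] returns the first character, which is '[' for every row
first_char = df['department'].str[0]

# ast.literal_eval safely parses each list-looking string into a Python list
df['department'] = df['department'].apply(ast.literal_eval)

# On lists, .str[0] returns the first element; the empty list yields NaN
first_item = df['department'].str[0]
print(first_char.tolist())
print(first_item.tolist())
```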

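Then I tried splitting the string to turn it into a list with:

```python
df['new_column'] = df['department'].apply(lambda x: x.split(","))
```

and tried again, but it still only created a column for each character.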
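Splitting the raw string on `,` leaves the brackets and quotes attached to each piece, so the result still is not a clean list of department names. The sketch below shows what `split(',')` returns and, as an alternative that is not part of the original thread, extracts only the quoted names with `Series.str.findall`. It assumes no department name contains an apostrophe:

```python
import pandas as pd

s = pd.Series(["['nail', 'men_skincare']", "[]"])

# Splitting on ',' keeps the brackets and quotes inside each piece
pieces = s.apply(lambda x: x.split(','))
print(pieces.tolist())
# → [["['nail'", " 'men_skincare']"], ['[]']]

# Capture only the text between single quotes; "[]" yields an empty list
names = s.str.findall(r"'([^']*)'")
print(names.tolist())
# → [['nail', 'men_skincare'], []]
```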

Any suggestions?

Edit: I found an answer using the link anky sent; specifically, I used this one: https://stackoverflow.com/a/29036042

What worked for me:

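```python
import pandas as pd

# Strip quotes, brackets and spaces; regex=False so '[' and ']' are matched literally
df['department'] = df['department'].str.replace("'", '', regex=False).str.replace(']', '', regex=False).str.replace('[', '', regex=False).str.replace(' ', '', regex=False)
df['department'] = df['department'].apply(lambda x: x.split(','))
s = df['department']
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df1 = pd.get_dummies(s.apply(pd.Series).stack()).groupby(level=0).sum()
df = pd.merge(df, df1, right_index=True, left_index=True, how='left')
```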
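For comparison, here is a shorter route that is not part of the original thread. It assumes the values are list-looking strings as in the question: parse them with `ast.literal_eval`, join each list with `|`, and let `Series.str.get_dummies` build the indicator columns. The sample frame is rebuilt from the question's table:

```python
import ast
import pandas as pd

df = pd.DataFrame({
    'customer_id': [11, 23, 25, 45],
    'department': ["['nail', 'men_skincare']", "['nail', 'fragrance']",
                   "[]", "['skincare', 'men_fragrance']"],
})

# Parse the list-looking strings into real lists
lists = df['department'].apply(ast.literal_eval)
# Join each list with '|' and split it back into 0/1 columns;
# an empty list becomes '' and therefore a row of zeros
dummies = lists.str.join('|').str.get_dummies(sep='|')
result = df.join(dummies)
print(result)
```

`str.get_dummies` returns the department columns in sorted order, and it never produces a column for the empty string, so no cleanup step is needed.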

Best answer

You can do this with the `explode()` and `fillna()` methods:

```python
import pandas as pd

data = df.explode('department').fillna('empty')
```
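To make this step concrete, here is a small sketch of the intermediate result. It assumes, as this answer does, that `department` already holds real Python lists:

```python
import pandas as pd

# Assumption: 'department' holds real lists, not list-looking strings
df = pd.DataFrame({
    'customer_id': [11, 23, 25, 45],
    'department': [['nail', 'men_skincare'], ['nail', 'fragrance'],
                   [], ['skincare', 'men_fragrance']],
})

# explode() emits one row per list element; an empty list becomes one NaN row
exploded = df.explode('department')
# fillna('empty') gives customer 25 a placeholder label so later steps keep that customer
data = exploded.fillna('empty')
print(data)
```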

Now use the `crosstab()` method:

```python
data = pd.crosstab(data['customer_id'], data['department'])
```
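One detail worth knowing: `crosstab()` counts occurrences rather than flagging presence, so a customer whose list repeats a department gets a 2 in that column. A small sketch with hypothetical data (customer 11 listing `'nail'` twice) shows this and uses `DataFrame.clip(upper=1)` to turn the counts into 0/1 dummy values:

```python
import pandas as pd

# Hypothetical exploded data in which customer 11 lists 'nail' twice
data = pd.DataFrame({
    'customer_id': [11, 11, 11, 23],
    'department': ['nail', 'nail', 'men_skincare', 'fragrance'],
})

counts = pd.crosstab(data['customer_id'], data['department'])
# The duplicate makes nail == 2 for customer 11; clip caps every count at 1
dummies = counts.clip(upper=1)
print(counts)
print(dummies)
```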

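Because the `concat()` method gives you an error, use `merge()` followed by `drop()`:

```python
# errors='ignore' avoids a KeyError when no customer has an empty list
data = pd.merge(df.set_index('customer_id'), data, left_index=True, right_index=True).drop(columns=['empty'], errors='ignore')
```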

Now, if you print `data`, you will get the desired output.
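Putting the steps together, here is a sketch that runs end to end on the question's sample data. It assumes, as in the question, that `department` holds strings that only look like lists, so it parses them with `ast.literal_eval` first (`explode()` leaves plain strings unchanged). The `fillna('empty')` step matters because `crosstab()` ignores NaN values, and without the placeholder customer 25 would drop out of the inner merge:

```python
import ast
import pandas as pd

df = pd.DataFrame({
    'customer_id': [11, 23, 25, 45],
    'department': ["['nail', 'men_skincare']", "['nail', 'fragrance']",
                   "[]", "['skincare', 'men_fragrance']"],
})
# Assumption: the column holds list-looking strings, so parse them into real lists
df['department'] = df['department'].apply(ast.literal_eval)

# One row per (customer, department); empty lists get the placeholder 'empty'
data = df.explode('department').fillna('empty')
# Customer-by-department table of counts
data = pd.crosstab(data['customer_id'], data['department'])
# Attach the dummy columns to the original rows and remove the placeholder column
data = (pd.merge(df.set_index('customer_id'), data,
                 left_index=True, right_index=True)
        .drop(columns=['empty'], errors='ignore'))
print(data)
```

Customer 25 ends up with 0 in every department column, and `errors='ignore'` keeps the final `drop()` from raising a `KeyError` on data where no customer has an empty list.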


For more on python - Convert dataframe column string values into dummy variable columns, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/67248922/
