
python - Appending converted values from a string to a new column in a dataframe

Reposted. Author: 行者123. Updated: 2023-12-01 01:07:48

I'm in a bit of a bind.

I have a dataframe:

Old_DF

    Date.               Year    On/Off      Gender. Status.    
0 2019-03-14 09:59:30 Senior Off Campus Male Full Time
1 2019-03-13 15:56:13 Senior Off Campus Male Full Time

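The first dataframe has a column that asks people to rank certain things, but thanks to the infinite wisdom of Jotform's export format, it takes each person's rankings and puts them into a single string per cell, like so: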

0   2019-03-14 09:59:30 Senior  Off Campus  Male    Full Time   1Food\r 2Lounge or Study Space\r 3Retail\r 4Ev...   NaN
1 2019-03-13 15:56:13 Senior Off Campus Male Full Time 1Lounge or Study Space\r 2Food\r 3Academic Res... NaN

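My idea is essentially to split the string into keywords and assign each one a letter, e.g. "Food" = "A", "Lounge or Study Space" = "B".

Essentially, I want to convert each string into whichever ordering of "ABCDEFG" it represents, append that as a new column containing only the letter combination, and then count which combination occurs most often.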

  'Combo'                 
0 'ABCDEFG'
1 'BDCFGAE'

My problem is the math: I end up with either a lot of combinations or only one combination.

This is what I have written so far:

clean_3 = 

rank
0 food lounge or study space retail event space ...
1 lounge or study space food academic resources ...

Combo_list = []
small_combo_list = []
for i in clean_3:
    if clean_3[i] == 'food':
        Combo_list.append('A')
    elif clean_3[i] == 'lounge or study space':
        Combo_list.append('B')
    elif clean_3[i] == 'retail':
        Combo_list.append('C')
    elif clean_3[i] == 'event space':
        Combo_list.append('D')
    elif clean_3[i] == 'academic resources':
        Combo_list.append('E')
    elif clean_3[i] == 'student life':
        Combo_list.append('F')
    elif clean_3[i] == 'general services':
        Combo_list.append('G')
    small_combo_list.append(Combo_list)

print(small_combo_list)

But I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

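This doesn't make sense (to me, at least), because it is a dataframe and not a Series.

Ideally, if there is a more efficient way to do this, please hit me over the head with it, since the size of this csv has yet to be determined. If I need to explain anything else, let me know!

Edit: the only two rows of the current dataframe, which show just how clunky Jotform's export format is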

    Date.               Year    On/Off      Gender. Status.     Rank
0 2019-03-14 09:59:30 Senior Off Campus Male Full Time 1Food
2Lounge or Study Space
3Retail
4Event Space
5Academic Resources (Tutoring, Career Advice)
6Student Life (Student Involvement, Diversity Services)
7General Services (Lockers, Information Desk, Vending Machines)


Date. Year On/Off Gender. Status. Rank
1 2019-03-14 09:59:30 Senior Off Campus Male Full Time 1Food
2Lounge or Study Space
3Retail
4Event Space
5Academic Resources (Tutoring, Career Advice)
6Student Life (Student Involvement, Diversity Services)
7General Services (Lockers, Information Desk, Vending Machines)

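Best answer

It would be better if I had more sample data, since it is hard to test with only two rows, but you can give this a try.

First, clean up the data with .str.replace and .str.split. After that I convert the result to a string with .astype(str), because the lists produced by the split cannot be used as groupby keys.

Now all the choices are cleaned up and in order.

So we can simply groupby and count, as follows: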

# Dataframe I worked with
Date Year On/Off Gender Status \
0 2019-03-14 09:59:30 Senior Off Campus Male Full Time
1 2019-03-13 15:56:13 Senior Off Campus Male Full Time

Ranking
0 1Food\r 2Lounge or Study Space\r 3Retail\r 4Ev...
1 1Lounge or Study Space\r 2Food\r 3Academic Res...

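# Clean up the Ranking column: strip the rank digits, split on '\r', then cast to str
# (regex=True is needed for the '\d+' pattern in pandas 2.0 and later, where regex defaults to False)
df['Ranking'] = df.Ranking.str.replace(r'\d+', '', regex=True).str.split('\r').astype(str)

# Count how many times each ranking was chosen and store it as a column
df['times_chosen'] = df.groupby('Ranking').Ranking.transform('size')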

Output

                                             Ranking  times_chosen
0 ['Food', ' Lounge or Study Space', ' Retail', ... 1
1 ['Lounge or Study Space', ' Food', ' Academic ... 1
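For anyone who wants to try the steps above without the original csv, here is a minimal, self-contained sketch. The three responses are made up for illustration (only the category names come from the question), and instead of casting the split list with .astype(str) it joins the trimmed items with " > ". That joining step is my own variation, chosen so the group keys stay readable and free of stray spaces.

```python
import pandas as pd

# Hypothetical sample data: the category names appear in the question,
# but these three responses are invented for illustration.
df = pd.DataFrame({
    "Ranking": [
        "1Food\r 2Lounge or Study Space\r 3Retail",
        "1Lounge or Study Space\r 2Food\r 3Retail",
        "1Food\r 2Lounge or Study Space\r 3Retail",
    ]
})

# Strip the rank digits, split on the carriage return, trim each item,
# and join the items back into one hashable string per row.
df["Ranking"] = (
    df["Ranking"]
    .str.replace(r"\d+", "", regex=True)
    .str.split("\r")
    .apply(lambda items: " > ".join(item.strip() for item in items))
)

# Attach to every row the number of rows that share its ranking.
df["times_chosen"] = df.groupby("Ranking")["Ranking"].transform("size")
print(df)
```

With real data, rows whose Ranking is missing (NaN) would need to be dropped or filled first, since the lambda assumes every cell was split into a list.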

Second option

Don't convert it to a column, just group:

df.groupby('Ranking').Ranking.size()

Ranking
['Food', ' Lounge or Study Space', ' Retail', ' Ev...'] 1
['Lounge or Study Space', ' Food', ' Academic Res...'] 1
Name: Ranking, dtype: int64
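If only the counts are needed, Series.value_counts gives a similar result to the groupby above and sorts it by frequency, which makes the most common ranking easy to pick out. A small sketch, using made-up cleaned rankings rather than the asker's data:

```python
import pandas as pd

# Hypothetical, already-cleaned rankings (invented for illustration).
rankings = pd.Series(
    ["Food, Lounge, Retail", "Lounge, Food, Retail", "Food, Lounge, Retail"],
    name="Ranking",
)

counts = rankings.value_counts()  # one row per distinct ranking, most frequent first
most_common = counts.idxmax()     # label of the ranking with the highest count

print(counts)
print(most_common)
```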

.agg

print(df.groupby('Ranking').agg({'Ranking': ['count']}))

Ranking
count
Ranking
['Food', ' Lounge or Study Space', ' Retail', '... 1
['Lounge or Study Space', ' Food', ' Academic R... 1
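On the error itself: iterating over a DataFrame yields its column labels, so clean_3[i] in the question is an entire column (a Series), and `if` cannot reduce the element-wise comparison to a single True or False. If you do want the letter combinations described in the question, one way is to map each item to its letter row by row. The sketch below is not part of the approach above. It assumes every item starts with one of the seven category names listed in the question, that no cell is missing, and it uses invented responses. The names CODES and to_combo are hypothetical helpers.

```python
import re
import pandas as pd

# Letter codes as proposed in the question ("Food" = "A", ...).
# Keys are lowercase prefixes of the category names from the question.
CODES = {
    "food": "A",
    "lounge or study space": "B",
    "retail": "C",
    "event space": "D",
    "academic resources": "E",
    "student life": "F",
    "general services": "G",
}

def to_combo(raw):
    """Turn one raw ranking cell into a letter string such as 'ABCD'."""
    letters = []
    for item in re.split(r"[\r\n]+", raw):
        # Drop the leading rank number and surrounding whitespace.
        name = re.sub(r"^\s*\d+", "", item).strip().lower()
        for prefix, letter in CODES.items():
            if name.startswith(prefix):
                letters.append(letter)
                break
    return "".join(letters)

# Hypothetical responses, invented for illustration.
df = pd.DataFrame({"Ranking": [
    "1Food\r 2Lounge or Study Space\r 3Retail\r 4Event Space",
    "1Lounge or Study Space\r 2Food\r 3Academic Resources (Tutoring, Career Advice)",
    "1Food\r 2Lounge or Study Space\r 3Retail\r 4Event Space",
]})

df["Combo"] = df["Ranking"].apply(to_combo)
combo_counts = df["Combo"].value_counts()

print(df["Combo"].tolist())
print(combo_counts.idxmax())
```

Matching on prefixes means the parenthesised descriptions, such as "(Tutoring, Career Advice)", do not need to be stripped first. Items that match no prefix are silently skipped, so a check on the length of each combo may be worth adding for real data.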

Regarding python - appending converted values from a string to a new column in a dataframe, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55168961/
