python - Counting values in a column of sets

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:06:12

I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
                   'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                               "Routing", "Timing Problem"]})

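The bus column holds bus numbers and the problem column holds the complaint filed against those buses. Any row of the bus column can contain one or more bus numbers.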

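I am trying to count how often each bus number appears and to find its most common complaint, that is, the most frequently reported bus numbers together with their most frequent complaints.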

However, because the values in the column are sets, I cannot use Counter on it.

The output could look like this:

df2 = pd.DataFrame({'busses':["268","24","200","23"],
'ComplainFrequency':["3" ,"2" , "2","1"]})

Bus no: 268
Complaints:
Driver Issues: 2
Timing Problem: 1
....

Best answer

First, flatten the sets into a new DataFrame with one row per (bus, problem) pair:

df1 = pd.DataFrame([(c, b) for a, b in zip(df['bus'], df['problem']) for c in a],
                   columns=['bus', 'problem'])
print(df1)
   bus         problem
0  268   Driver Issues
1  200   Driver Issues
2  268   Driver Issues
3   23   Driver Issues
4   24  Timing Problem
5   24         Routing
6  200  Timing Problem
7  268  Timing Problem
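
As an alternative to the list comprehension above, the same flattening can be sketched with DataFrame.explode (available in pandas 0.25 and later). Sets have no guaranteed iteration order, so this sketch first converts each set to a sorted list; the name df1_alt is introduced here only for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
    'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                "Routing", "Timing Problem"],
})

# Turn each set into a sorted list so the row order is deterministic,
# then give every bus number its own row.
df1_alt = (
    df.assign(bus=df['bus'].apply(sorted))
      .explode('bus')
      .reset_index(drop=True)
)

# explode leaves the column with object dtype; cast back to integers.
df1_alt['bus'] = df1_alt['bus'].astype(int)
print(df1_alt)
```

The rows are the same eight (bus, problem) pairs as in df1; only the order within each original row may differ.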

If the sets instead contain strings with comma-separated values, a double flattening is necessary:

df = pd.DataFrame({'bus': [{'268'}, {'23,200,268'}, {'24'}, {'24'}, {'200,268'}],
                   'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                               "Routing", "Timing Problem"]})

print(df)
            bus         problem
0         {268}   Driver Issues
1  {23,200,268}   Driver Issues
2          {24}  Timing Problem
3          {24}         Routing
4     {200,268}  Timing Problem

df1 = pd.DataFrame([(d, b) for a, b in zip(df['bus'], df['problem'])
                    for c in a
                    for d in c.split(',')],
                   columns=['bus', 'problem'])

print(df1)
   bus         problem
0  268   Driver Issues
1   23   Driver Issues
2  200   Driver Issues
3  268   Driver Issues
4   24  Timing Problem
5   24         Routing
6  200  Timing Problem
7  268  Timing Problem
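
After splitting, the bus numbers are strings, so any later groupby sorts them lexicographically ('200' before '23'). If numeric order is preferred, one option is to cast the column to integers first. This is a small sketch that rebuilds the flattened df1 by hand and assumes every value is a clean numeric string:

```python
import pandas as pd

# The flattened frame from the double-flattening step, with string bus numbers.
df1 = pd.DataFrame({
    'bus': ['268', '23', '200', '268', '24', '24', '200', '268'],
    'problem': ["Driver Issues", "Driver Issues", "Driver Issues", "Driver Issues",
                "Timing Problem", "Routing", "Timing Problem", "Timing Problem"],
})

# Strip stray whitespace (e.g. from "23, 200") and convert to integers.
df1['bus'] = df1['bus'].str.strip().astype(int)

# Group keys now sort numerically instead of as text.
counts = df1.groupby('bus').size()
print(counts)
```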

Then use GroupBy.size to count the rows per bus, and per bus and problem:

df2 = df1.groupby('bus')['problem'].size().reset_index(name='ComplainFrequency')
print(df2)
   bus  ComplainFrequency
0  200                  2
1   23                  1
2   24                  2
3  268                  3
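
The question asks for the most frequently reported buses first, and mentions that Counter did not work. Counter fails on the column itself because sets are unhashable, but it can count the elements once the sets are chained into a single stream. The sketch below uses the integer sets from the question and sorts by frequency, breaking ties by bus number; df2_sorted is a name introduced here for illustration.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
    'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                "Routing", "Timing Problem"],
})

# Counter(df['bus']) would raise TypeError (sets are unhashable),
# so count the individual bus numbers inside the sets instead.
bus_counts = Counter(chain.from_iterable(df['bus']))

# Highest frequency first; ties are ordered by ascending bus number.
df2_sorted = (
    pd.DataFrame(list(bus_counts.items()), columns=['bus', 'ComplainFrequency'])
      .sort_values(['ComplainFrequency', 'bus'], ascending=[False, True])
      .reset_index(drop=True)
)
print(df2_sorted)
```

The same ordering can be obtained from df2 by calling sort_values on its ComplainFrequency column.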

df3 = df1.groupby(['bus', 'problem']).size().reset_index(name='Coplains')
print(df3)
   bus         problem  Coplains
0  200   Driver Issues         1
1  200  Timing Problem         1
2   23   Driver Issues         1
3   24         Routing         1
4   24  Timing Problem         1
5  268   Driver Issues         2
6  268  Timing Problem         1
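
To get from these counts to the most common complaint per bus, one possible approach is to sort so the largest count comes first within each bus and keep that first row. The sketch below rebuilds the flattened frame with integer bus numbers, keeps the column name Coplains as spelled in the answer, and breaks ties alphabetically by problem name, which is an arbitrary choice made here. The name top_problem is introduced for illustration.

```python
import pandas as pd

# Flattened (bus, problem) pairs, one row per complaint.
df1 = pd.DataFrame({
    'bus': [268, 23, 200, 268, 24, 24, 200, 268],
    'problem': ["Driver Issues", "Driver Issues", "Driver Issues", "Driver Issues",
                "Timing Problem", "Routing", "Timing Problem", "Timing Problem"],
})

df3 = df1.groupby(['bus', 'problem']).size().reset_index(name='Coplains')

# Within each bus: highest count first, ties broken by problem name.
# drop_duplicates then keeps only the first (most common) row per bus.
top_problem = (
    df3.sort_values(['bus', 'Coplains', 'problem'], ascending=[True, False, True])
       .drop_duplicates('bus')
       .reset_index(drop=True)
)
print(top_problem)

# Per-bus report in the shape sketched in the question.
for bus, group in df3.groupby('bus'):
    print(f"Bus no: {bus}")
    for problem, n in zip(group['problem'], group['Coplains']):
        print(f"  {problem}: {n}")
```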

Regarding python - counting values in a column of sets, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54666209/
