python - Counting popular values in a column using Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:51:29

I have a CSV file:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S

I need to count the most popular male and female names. I can do it like this:

for names in data['Name']:
    # The part before ', ' is the surname
    name = names.split(', ')
    print(name[0])

But is there a way to do this using only pandas?

Best answer

I think you can first use split to parse the names into a new Series ser, then groupby by the column Sex and by ser, followed by count and nlargest:

print(data)
                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Futrelle, Mrs. John Bradley (Florence Briggs T...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      0
4                           Allen, Mr. William Henry    male  35.0      0
5                                   Moran, Mr. James    male   NaN      0
6                            McCarthy, Mr. Timothy J    male  54.0      0
7                      Braund, Master. Gosta Leonard    male   2.0      3

   Parch            Ticket     Fare  Cabin  Embarked
0      0         A/5 21171   7.2500    NaN         S
1      0          PC 17599  71.2833    C85         C
2      0  STON/O2. 3101282   7.9250    NaN         S
3      0            113803  53.1000   C123         S
4      0            373450   8.0500    NaN         S
5      0            330877   8.4583    NaN         Q
6      0             17463  51.8625    E46         S
7      1            349909  21.0750    NaN         S
ser = data['Name'].str.split(',').str[0]
print(ser)
0       Braund
1     Futrelle
2    Heikkinen
3     Futrelle
4        Allen
5        Moran
6     McCarthy
7       Braund
Name: Name, dtype: object
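
If the rest of the name is needed as well, the split can be limited to the first comma. The following is a minimal sketch, not part of the original answer; the two-row `names` Series and the column labels `surname` and `rest` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row sample in the same "Surname, Title. Given names" format.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
])

# n=1 splits only at the first comma; expand=True returns a DataFrame
# with one column per piece instead of a Series of lists.
parts = names.str.split(",", n=1, expand=True)
parts.columns = ["surname", "rest"]  # illustrative labels, not from the source

# The second piece keeps the space that followed the comma, so strip it.
parts["rest"] = parts["rest"].str.strip()
print(parts)
```

Limiting the split with `n=1` keeps any later commas inside the second piece rather than producing extra columns.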

print(ser.groupby([data['Sex'], ser]).count())
Sex     Name
female  Futrelle     2
        Heikkinen    1
male    Allen        1
        Braund       2
        McCarthy     1
        Moran        1
dtype: int64

print(ser.groupby([data['Sex'], ser]).count().nlargest(4))
Sex     Name
female  Futrelle     2
male    Braund       2
female  Heikkinen    1
male    Allen        1
dtype: int64
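
Note that nlargest(4) ranks all (Sex, Name) pairs together, so it does not necessarily return one entry per sex. A minimal sketch of one possible way to pick the single most frequent surname for each sex is shown below. It rebuilds only the Name and Sex columns from the modified sample above, and the variable names `counts` and `top_labels` are illustrative, not from the original answer.

```python
import pandas as pd

# Only the two columns needed, mirroring the modified sample data above.
data = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Futrelle, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Allen, Mr. William Henry",
        "Moran, Mr. James",
        "McCarthy, Mr. Timothy J",
        "Braund, Master. Gosta Leonard",
    ],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
})

ser = data["Name"].str.split(",").str[0]
counts = ser.groupby([data["Sex"], ser]).count()

# Within each Sex group, idxmax returns the (Sex, Name) label with the
# highest count; on ties it returns the first such label.
top_labels = counts.groupby(level=0).idxmax()
print(top_labels)

# Look the counts up again for the selected labels.
print(counts.loc[top_labels.tolist()])
```

Because idxmax keeps only the first label on a tie, a group with several equally frequent surnames would need different handling, for example filtering on the group maximum.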

This is the same as using a helper column all_names:

data['all_names'] = data['Name'].str.split(',').str[0]
print(data)
                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Futrelle, Mrs. John Bradley (Florence Briggs T...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      0
4                           Allen, Mr. William Henry    male  35.0      0
5                                   Moran, Mr. James    male   NaN      0
6                            McCarthy, Mr. Timothy J    male  54.0      0
7                      Braund, Master. Gosta Leonard    male   2.0      3

   Parch            Ticket     Fare  Cabin  Embarked  all_names
0      0         A/5 21171   7.2500    NaN         S     Braund
1      0          PC 17599  71.2833    C85         C   Futrelle
2      0  STON/O2. 3101282   7.9250    NaN         S  Heikkinen
3      0            113803  53.1000   C123         S   Futrelle
4      0            373450   8.0500    NaN         S      Allen
5      0            330877   8.4583    NaN         Q      Moran
6      0             17463  51.8625    E46         S   McCarthy
7      1            349909  21.0750    NaN         S     Braund
print(data.groupby(['Sex', 'all_names'])['all_names'].count())
Sex     all_names
female  Futrelle     2
        Heikkinen    1
male    Allen        1
        Braund       2
        McCarthy     1
        Moran        1
Name: all_names, dtype: int64

print(data.groupby(['Sex', 'all_names'])['all_names'].count().nlargest(4))
Sex     all_names
female  Futrelle     2
male    Braund       2
female  Heikkinen    1
male    Allen        1
Name: all_names, dtype: int64
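
As a possible alternative to count followed by nlargest, the grouped column can be passed through value_counts, which already sorts the counts within each group. This is a sketch under the assumption that the helper column all_names exists as built above; the small DataFrame below recreates only that column and Sex, and `top_per_sex` is an illustrative name.

```python
import pandas as pd

# Recreate just the Sex and all_names columns from the sample above.
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "all_names": ["Braund", "Futrelle", "Heikkinen", "Futrelle",
                  "Allen", "Moran", "McCarthy", "Braund"],
})

# value_counts on the grouped column counts each surname within each Sex
# and sorts the counts in descending order inside every group.
counts = data.groupby("Sex")["all_names"].value_counts()

# head(1) keeps the first, i.e. most frequent, entry of each Sex group.
top_per_sex = counts.groupby(level="Sex").head(1)
print(top_per_sex)
```

This variant returns exactly one row per sex, whereas nlargest(4) on the combined counts returns the four largest pairs regardless of sex.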

Regarding python - Counting popular values in a column using Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36936964/
