
Python Pandas: Counting occurrences of a specific value

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:13:57

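I am trying to find the number of times a certain value appears in one column.

I built the DataFrame with data = pd.DataFrame.from_csv('data/DataSet2.csv')

Now I want to find the number of times something appears in a column. How is this done?

I thought it was the code below, where I look in the education column and count the number of times ? occurs.

The code below shows me trying to find the number of times 9th occurs, followed by the error I get when I run it.

Code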

missing2 = df.education.value_counts()['9th']
print(missing2)

Error

KeyError: '9th'

Best answer

You can create a subset of the data with your condition and then use shape or len:

print(df)
  col1 education
0    a       9th
1    b       9th
2    c       8th

print(df.education == '9th')
0     True
1     True
2    False
Name: education, dtype: bool

print(df[df.education == '9th'])
  col1 education
0    a       9th
1    b       9th

print(df[df.education == '9th'].shape[0])
2
print(len(df[df['education'] == '9th']))
2
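
The KeyError in the question is what value_counts() raises when the requested label is not in its index, for example when '9th' never occurs in the column or is stored with surrounding whitespace. The thread does not confirm which case applied, so the following is only a sketch under that assumption. The `padded` frame and its leading spaces are hypothetical and serve only to illustrate the whitespace case.

```python
import pandas as pd

# Sample frame mirroring the one printed in the answer
df = pd.DataFrame({"col1": ["a", "b", "c"],
                   "education": ["9th", "9th", "8th"]})

counts = df["education"].value_counts()

# Series.get returns a default instead of raising KeyError for an absent label
n_9th = counts.get("9th", 0)
n_10th = counts.get("10th", 0)

# Hypothetical frame whose values carry a leading space (an assumption,
# not something the original question states)
padded = pd.DataFrame({"education": [" 9th", " 9th", " 8th"]})
n_padded = padded["education"].str.strip().value_counts().get("9th", 0)

print(n_9th, n_10th, n_padded)
```

Using .get with a default keeps the value_counts() approach from the question while tolerating labels that are missing from the column.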

Performance is interesting: the fastest solution is to compare the underlying numpy array and sum the result:

[graph: timings of the approaches below, plotted against len(df)]

Code:

import string

import numpy as np
import pandas as pd
import perfplot

np.random.seed(123)


# Each kernel counts the rows where education == 'a' in a different way.
def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

# Build a one-column frame of n random ASCII letters.
def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
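
For reference, here is a minimal sketch of the two mask-and-sum variants applied to the small sample frame from the answer rather than to the random benchmark data. It uses to_numpy() in place of .values, which is a substitution made here and not part of the original answer.

```python
import pandas as pd

# Sample frame mirroring the one printed in the answer
df = pd.DataFrame({"col1": ["a", "b", "c"],
                   "education": ["9th", "9th", "8th"]})

# Boolean mask: True where the value matches
mask = df["education"] == "9th"

# Summing a boolean mask counts the True entries
n_series = int(mask.sum())

# Same comparison on the underlying numpy array
n_numpy = int((df["education"].to_numpy() == "9th").sum())

print(n_series, n_numpy)
```

Both variants avoid building the filtered DataFrame, which is the extra work done by the shape and len approaches.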

For Python Pandas counting occurrences of a specific value, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58569013/
