
python - Calculate the percentage of non-zero values in specific columns and in each row in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:32:51

Suppose I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
    'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
    'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})

    name  gender       state  num_children  num_pets
0   john       M  california             2         5
1   mary       F          dc             0         1
2  peter       M  california             0         0
3   jeff       M          dc             3         5
4   bill       M  california             2         2
5   lisa       F       texas             1         2
6   jose       M       texas             4         3

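I want to create a new row and a new column, both named pct., holding the percentage of zero values in the columns num_children and num_pets. Expected output: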

    name  gender       state  num_children  num_pets  pct.
0   pct.                             28.6%     14.3%
1   john       M  california             2         5    0%
2   mary       F          dc             0         1   50%
3  peter       M  california             0         0  100%
4   jeff       M          dc             3         5    0%
5   bill       M  california             2         2    0%
6   lisa       F       texas             1         2    0%
7   jose       M       texas             4         3    0%

I have already computed the percentage of zeros in each row of the target columns:

df['pct'] = df[['num_children', 'num_pets']].astype(bool).sum(axis=1)/2
df['pct.'] = 1-df['pct']
del df['pct']
df['pct.'] = pd.Series(["{0:.0f}%".format(val * 100) for val in df['pct.']], index = df.index)

    name  gender       state  num_children  num_pets  pct.
0   john       M  california             2         5    0%
1   mary       F          dc             0         1   50%
2  peter       M  california             0         0  100%
3   jeff       M          dc             3         5    0%
4   bill       M  california             2         2    0%
5   lisa       F       texas             1         2    0%
6   jose       M       texas             4         3    0%

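But I don't know how to insert the result below into the pct. row, as in the expected output. Please help me get the expected result in a more pythonic way. Thanks.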

df[['num_children', 'num_pets']].astype(bool).sum(axis=0)/len(df.num_children)
Out[153]:
num_children    0.714286
num_pets        0.857143
dtype: float64
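
Note that the figures above are the share of non-zero values per column, while the expected pct. row holds the share of zeros, which is the complement. A minimal sketch of that relationship, using only the two numeric columns from the question (the names nonzero_share and zero_pct are illustrative and not part of the original post):

```python
import pandas as pd

# Only the two numeric columns from the question are needed here.
df = pd.DataFrame({
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})

# Share of non-zero values per column (what the snippet above computes).
nonzero_share = df.astype(bool).sum(axis=0) / len(df)

# The share of zeros is the complement; scale to percent, round to one decimal.
zero_pct = (1 - nonzero_share).mul(100).round(1)
print(zero_pct)
```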

Update: the same thing, but for computing sums. Many thanks to @jezrael:

df['sums'] = df[['num_children', 'num_pets']].sum(axis=1)
df1 = (df[['num_children', 'num_pets']].sum()
         .to_frame()
         .T
         .assign(name='sums'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df],
               ignore_index=True, sort=False)
print(df)
    name  gender       state  num_children  num_pets  sums
0   sums                                12        18
1   john       M  california             2         5     7
2   mary       F          dc             0         1     1
3  peter       M  california             0         0     0
4   jeff       M          dc             3         5     8
5   bill       M  california             2         2     4
6   lisa       F       texas             1         2     3
7   jose       M       texas             4         3     7

Best answer

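You can compare the values with 0 using DataFrame.eq and take the mean of the resulting boolean mask, because sum/len = mean by definition. Then multiply by 100 and use apply to append the percent sign: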

s = df[['num_children', 'num_pets']].eq(0).mean(axis=1)
df['pct'] = s.mul(100).apply("{0:.0f}%".format)
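
To see why the mean of a boolean mask is the same as counting zeros and dividing by the number of columns, here is a small sketch on the two numeric columns from the question. The variable names are illustrative, and the `"{:.0%}"` variant at the end is an alternative using Python's built-in percent format specifier, not something the original answer uses:

```python
import pandas as pd

df = pd.DataFrame({
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})

mask = df.eq(0)                              # True where a value is zero

by_sum = mask.sum(axis=1) / mask.shape[1]    # zeros per row / number of columns
by_mean = mask.mean(axis=1)                  # the same quantity in one step

# Formatting as in the answer: scale to percent, then append '%'.
row_pct = by_mean.mul(100).apply("{0:.0f}%".format)

# Alternative: the '%' format spec multiplies by 100 and appends '%' itself.
row_pct_alt = by_mean.map("{:.0%}".format)

print(row_pct.tolist())
```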

For the first row, create a new one-row DataFrame with the same columns as the original and concat the two together:

df1 = (df[['num_children', 'num_pets']].eq(0)
         .mean()
         .mul(100)
         .apply("{0:.1f}%".format)
         .to_frame()
         .T
         .assign(name='pct.'))

df = pd.concat([df1.reindex(columns=df.columns, fill_value=''), df],
               ignore_index=True, sort=False)
print(df)
    name  gender       state  num_children  num_pets   pct
0   pct.                             28.6%     14.3%
1   john       M  california             2         5    0%
2   mary       F          dc             0         1   50%
3  peter       M  california             0         0  100%
4   jeff       M          dc             3         5    0%
5   bill       M  california             2         2    0%
6   lisa       F       texas             1         2    0%
7   jose       M       texas             4         3    0%
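
The two steps can also be combined into one function, sketched below. The helper add_zero_pct and its parameters (cols, label, name_col) are hypothetical names introduced here for illustration. It assumes the frame has a text column (name in the question) that can carry the label of the summary row, and it uses the same label for the new column and the new row, as in the question's expected output:

```python
import pandas as pd


def add_zero_pct(df, cols, label='pct.', name_col='name'):
    """Hypothetical helper: add a per-row zero-percentage column and
    prepend a summary row with the per-column zero percentage."""
    mask = df[cols].eq(0)                    # True where a value is zero

    out = df.copy()                          # leave the caller's frame untouched
    out[label] = mask.mean(axis=1).mul(100).apply("{0:.0f}%".format)

    # One-row frame holding the per-column percentages as strings.
    top = (mask.mean()
               .mul(100)
               .apply("{0:.1f}%".format)
               .to_frame()
               .T
               .assign(**{name_col: label}))

    # Align the summary row to the full column set; blanks elsewhere.
    top = top.reindex(columns=out.columns, fill_value='')
    return pd.concat([top, out], ignore_index=True, sort=False)


df = pd.DataFrame({
    'name': ['john', 'mary', 'peter', 'jeff', 'bill', 'lisa', 'jose'],
    'gender': ['M', 'F', 'M', 'M', 'M', 'F', 'M'],
    'state': ['california', 'dc', 'california', 'dc', 'california', 'texas', 'texas'],
    'num_children': [2, 0, 0, 3, 2, 1, 4],
    'num_pets': [5, 1, 0, 5, 2, 2, 3],
})

result = add_zero_pct(df, ['num_children', 'num_pets'])
print(result)
```

Because the summary row stores strings, num_children and num_pets become object columns in the result. If further arithmetic is needed, it is probably safer to keep the numeric frame and build this combined frame only for display.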

Regarding "python - Calculate the percentage of non-zero values in specific columns and in each row in Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55021654/
