python - Counting missing instances in subgroups

Reposted. Author: 行者123. Last updated: 2023-11-28 20:14:41

I have a dataframe in Pandas containing collected data:

import pandas as pd
df = pd.DataFrame({'Group': ['A','A','A','A','A','A','A','B','B','B','B','B','B','B'], 'Subgroup': ['Blue', 'Blue','Blue','Red','Red','Red','Red','Blue','Blue','Blue','Blue','Red','Red','Red'],'Obs':[1,2,4,1,2,3,4,1,2,3,6,1,2,3]})

+-------+----------+-----+
| Group | Subgroup | Obs |
+-------+----------+-----+
| A | Blue | 1 |
| A | Blue | 2 |
| A | Blue | 4 |
| A | Red | 1 |
| A | Red | 2 |
| A | Red | 3 |
| A | Red | 4 |
| B | Blue | 1 |
| B | Blue | 2 |
| B | Blue | 3 |
| B | Blue | 6 |
| B | Red | 1 |
| B | Red | 2 |
| B | Red | 3 |
+-------+----------+-----+

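The observations ('Obs') are supposed to be numbered without gaps, but you can see that we "missed" Blue 3 in group A and Blue 4 and 5 in group B. The desired outcome is the number and percentage of missed observations ('Obs') per group, so in this example: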

+-------+--------------------+--------+--------+
| Group | Total Observations | Missed | % |
+-------+--------------------+--------+--------+
| A | 8 | 1 | 12.5% |
| B | 9 | 2 | 22.22% |
+-------+--------------------+--------+--------+

I tried both for loops and groups, for example:

groups = df.groupby(['Group', 'Subgroup']).sum()
print(groups.head())

but I can't seem to get it to work in any way I try. Am I going about this the wrong way?

From another answer (shout-out to @Lie Ryan) I found a function that finds missing elements, but I don't quite understand how to apply it here:

from itertools import chain, islice

def window(seq, n=2):
    """Return a sliding window (of width n) over data from the iterable.

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def missing_elements(L):
    # For each adjacent pair (x, y) with a gap, emit the numbers strictly between them.
    missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
    return list(missing)

Can anyone point me in the right direction?

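Best answer

Simple enough: you'll need groupby here.

  1. Using groupby + diff, figure out how many observations are missing per Group and Subgroup.
  2. Group df on Group, and compute the size and sum of the column computed in the previous step.
  3. A couple more straightforward steps (calculating the %) give you the intended output.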
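
As a rough illustration of steps 1 and 2, the sketch below rebuilds the question's frame and prints the intermediate result. The names `steps`, `gaps` and `per_group` are made up for this illustration, and it assumes the rows are already sorted by Obs within each subgroup.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame({
    'Group': ['A'] * 7 + ['B'] * 7,
    'Subgroup': ['Blue'] * 3 + ['Red'] * 4 + ['Blue'] * 4 + ['Red'] * 3,
    'Obs': [1, 2, 4, 1, 2, 3, 4, 1, 2, 3, 6, 1, 2, 3],
})

# Step 1: difference between consecutive Obs values within each (Group, Subgroup).
# The first row of every subgroup has no predecessor, so its diff is NaN.
steps = df.groupby(['Group', 'Subgroup'])['Obs'].diff()

# A step of 1 means no gap; a step of k means k - 1 numbers were skipped.
gaps = steps.sub(1)

# Step 2: per Group, 'size' counts the rows (NaN included) and 'sum' totals
# the gaps (NaN skipped).
per_group = gaps.groupby(df['Group']).agg(['size', 'sum'])
print(per_group)
```

Because diff only compares neighbouring rows, a subgroup whose numbering starts above 1 would not have that leading gap counted. The answer's own code, next, chains these steps and names the columns.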

f = [   # declare an aggfunc list in advance, we'll need it later
    ('Total Observations', 'size'),
    ('Missed', 'sum')
]

g = df.groupby(['Group', 'Subgroup'])\
      .Obs.diff()\
      .sub(1)\
      .groupby(df.Group)\
      .agg(f)

g['Total Observations'] += g['Missed']
g['%'] = g['Missed'] / g['Total Observations'] * 100

g

       Total Observations  Missed          %
Group
A                     8.0     1.0  12.500000
B                     9.0     2.0  22.222222
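
For comparison, here is a sketch of a different technique, not the one used in the answer: compare the highest Obs seen in each subgroup with the number of rows actually present. It assumes numbering starts at 1 with no duplicates, and the names `per_sub`, `seen`, `expected` and `summary` are invented for this example. Keyword-style named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame({
    'Group': ['A'] * 7 + ['B'] * 7,
    'Subgroup': ['Blue'] * 3 + ['Red'] * 4 + ['Blue'] * 4 + ['Red'] * 3,
    'Obs': [1, 2, 4, 1, 2, 3, 4, 1, 2, 3, 6, 1, 2, 3],
})

# Per subgroup: how many observations were seen, and the highest number seen.
per_sub = df.groupby(['Group', 'Subgroup'])['Obs'].agg(seen='count', expected='max')

# Assumption: numbering starts at 1 without duplicates, so `expected` equals
# the number of observations that should exist in the subgroup.
per_group = per_sub.groupby(level='Group').sum()

summary = pd.DataFrame({
    'Total Observations': per_group['expected'],
    'Missed': per_group['expected'] - per_group['seen'],
})
summary['%'] = (summary['Missed'] / summary['Total Observations'] * 100).round(2)
print(summary)
```

This variant keeps the counts as integers and does not depend on row order. It would also count a gap at the start of a subgroup, which the diff approach does not, but neither approach can detect observations missing after the last recorded number.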

Regarding "python - Counting missing instances in subgroups", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48876481/
