python - Why is "median" 2x faster than "mean" using the statistics package?

Reposted. Author: 太空狗. Updated: 2023-10-30 02:02:52

This surprised me... To illustrate, I used this small piece of code to compute the mean and median of 1M random numbers:

import numpy as np
import statistics as st

import time

listofrandnum = np.random.rand(1000000,)

t = time.time()
print('mean is:', st.mean(listofrandnum))
print('time to calc mean:', time.time()-t)

print('\n')

t = time.time()
print('median is:', st.median(listofrandnum))
print('time to calc median:', time.time()-t)

The result is:

mean is: 0.499866595037
time to calc mean: 2.0767598152160645


median is: 0.499721597395
time to calc median: 0.9687695503234863

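My question: why is the mean slower than the median? The median needs some sorting algorithm (i.e., comparisons), while the mean only needs a summation. Does it make sense for summation to be slower than comparison?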

I would appreciate your insight on this.

Best answer

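statistics is not part of NumPy. It is a Python standard library module with a very different design philosophy: it goes for accuracy at all costs, even for unusual input data types and extremely ill-conditioned input. Performing a sum the way the statistics module does is genuinely more expensive than performing a sort.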

If you want an efficient mean or median of a NumPy array, use the NumPy routines:

numpy.mean(whatever)
numpy.median(whatever)
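
A rough way to check this on your own machine is sketched below. It assumes NumPy is installed; the array size, the seed, and the helper name `timed` are illustrative choices, not part of the original post. Timings depend on the machine and the Python version, so no particular numbers are claimed here.

```python
import statistics as st
import time

import numpy as np


def timed(func, values):
    """Return (result, elapsed seconds) for a single call of func(values)."""
    start = time.perf_counter()
    result = func(values)
    return result, time.perf_counter() - start


# Illustrative input: 100,000 uniform random floats with a fixed seed.
rng = np.random.default_rng(0)
data = rng.random(100_000)

results = {}
for label, func in [
    ("statistics.mean", st.mean),
    ("statistics.median", st.median),
    ("numpy.mean", np.mean),
    ("numpy.median", np.median),
]:
    value, elapsed = timed(func, data)
    results[label] = float(value)
    print(f"{label:18s} value={float(value):.6f} time={elapsed:.4f}s")
```

Both libraries should agree on the values to within floating-point rounding; only the running time is expected to differ.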

If you want to see the expensive work the statistics module does for a simple sum, you can look at the source code:

def _sum(data, start=0):
    """_sum(data [, start]) -> (type, sum, count)

    Return a high-precision sum of the given numeric data as a fraction,
    together with the type to be converted to and the count of items.

    If optional argument ``start`` is given, it is added to the total.
    If ``data`` is empty, ``start`` (defaulting to 0) is returned.


    Examples
    --------

    >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    (<class 'float'>, Fraction(11, 1), 5)

    Some sources of round-off error will be avoided:

    >>> _sum([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
    (<class 'float'>, Fraction(1000, 1), 3000)

    Fractions and Decimals are also supported:

    >>> from fractions import Fraction as F
    >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    (<class 'fractions.Fraction'>, Fraction(63, 20), 4)

    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum(data)
    (<class 'decimal.Decimal'>, Fraction(6963, 10000), 4)

    Mixed types are currently treated as an error, except that int is
    allowed.
    """
    count = 0
    n, d = _exact_ratio(start)
    partials = {d: n}
    partials_get = partials.get
    T = _coerce(int, type(start))
    for typ, values in groupby(data, type):
        T = _coerce(T, typ)  # or raise TypeError
        for n, d in map(_exact_ratio, values):
            count += 1
            partials[d] = partials_get(d, 0) + n
    if None in partials:
        # The sum will be a NAN or INF. We can ignore all the finite
        # partials, and just look at this special one.
        total = partials[None]
        assert not _isfinite(total)
    else:
        # Sum all the partial sums using builtin sum.
        # FIXME is this faster if we sum them in order of the denominator?
        total = sum(Fraction(n, d) for d, n in sorted(partials.items()))
    return (T, total, count)
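
To make the cost concrete, here is a simplified sketch of the same idea using only the standard library. The function names `naive_float_sum` and `exact_sum` are hypothetical and are not part of the statistics module; the sketch also skips the type coercion and the NaN/infinity handling that the real `_sum` performs. It converts every float to an exact integer ratio and accumulates big integers per denominator, which is far more work per element than a plain float addition.

```python
from fractions import Fraction


def naive_float_sum(values):
    """Plain left-to-right float accumulation; rounds at every step."""
    total = 0.0
    for x in values:
        total += x
    return total


def exact_sum(values):
    """Simplified sketch of the approach in _sum (finite floats/ints only).

    Each value is turned into an exact (numerator, denominator) pair,
    numerators are accumulated per denominator as Python integers,
    and the partial sums are combined as Fractions at the end.
    """
    partials = {}
    for x in values:
        n, d = x.as_integer_ratio()  # exact, no rounding
        partials[d] = partials.get(d, 0) + n
    return sum(Fraction(n, d) for d, n in partials.items())


# Same ill-conditioned input as in the _sum docstring.
data = [1e50, 1.0, -1e50] * 1000

naive = naive_float_sum(data)
exact = exact_sum(data)
print("naive float sum:", naive)
print("exact sum:      ", exact)
```

The naive loop loses every `1.0` because it is absorbed by `1e50` before the cancellation, while the exact version keeps them all. That accuracy is what the extra per-element work buys.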

Regarding python - Why is "median" 2x faster than "mean" using the statistics package?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38021111/
