
Cost of len() function

Reposted. Author: bug小助手. Updated: 2023-10-24 23:23:07



What is the cost of len() function for Python built-ins? (list/tuple/string/dictionary)



Recommended answers

It's O(1) (constant time, independent of the actual length of the object, so very fast) for every type you've mentioned, plus set and others such as array.array.




Calling len() on those data types is O(1) in CPython, the official and most common implementation of the Python language. Here's a link to a table that provides the algorithmic complexity of many different functions in CPython:



TimeComplexity Python Wiki Page




All those objects keep track of their own length. The time to extract the length is small (O(1) in big-O notation) and mostly consists of the following [a rough description, written in Python terms, not C terms]: look up "len" in a dictionary and dispatch it to the built-in len function, which looks up the object's __len__ method and calls it. All that method has to do is return self.length.
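The rough description above can be sketched in pure Python. This is only an illustration: `Box` and its `length` attribute are hypothetical names, and the real built-in types keep their length in a C-level field rather than a Python attribute.

```python
class Box:
    """Toy container (hypothetical) that stores its own length."""

    def __init__(self, items):
        self._items = list(items)
        self.length = len(self._items)  # stored once, not recounted later

    def __len__(self):
        # All this method has to do is hand back the stored value.
        return self.length


box = Box("abc")
print(len(box))                # 3: len() dispatches to Box.__len__
print(type(box).__len__(box))  # 3: roughly what len() does under the hood
```

Neither call walks over the items, which is why the cost does not grow with the size of the container.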




The below measurements provide evidence that len() is O(1) for oft-used data structures.




A note regarding timeit: when the -s flag is used and two strings are passed to timeit, the first string is setup code that is executed only once per timing run and is not timed.




List (in Python 2, range() returns a list; in Python 3, use list(range(...)) instead):



$ python -m timeit -s "l = range(10);" "len(l)"
10000000 loops, best of 3: 0.0677 usec per loop

$ python -m timeit -s "l = range(1000000);" "len(l)"
10000000 loops, best of 3: 0.0688 usec per loop


Tuple:



$ python -m timeit -s "t = (1,)*10;" "len(t)"
10000000 loops, best of 3: 0.0712 usec per loop

$ python -m timeit -s "t = (1,)*1000000;" "len(t)"
10000000 loops, best of 3: 0.0699 usec per loop


String:



$ python -m timeit -s "s = '1'*10;" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop

$ python -m timeit -s "s = '1'*1000000;" "len(s)"
10000000 loops, best of 3: 0.0686 usec per loop


Dictionary (dictionary-comprehension available in 2.7+):



$ python -mtimeit -s"d = {i:j for i,j in enumerate(range(10))};" "len(d)"
10000000 loops, best of 3: 0.0711 usec per loop

$ python -mtimeit -s"d = {i:j for i,j in enumerate(range(1000000))};" "len(d)"
10000000 loops, best of 3: 0.0727 usec per loop


Array:



$ python -mtimeit -s"import array;a=array.array('i',range(10));" "len(a)"
10000000 loops, best of 3: 0.0682 usec per loop

$ python -mtimeit -s"import array;a=array.array('i',range(1000000));" "len(a)"
10000000 loops, best of 3: 0.0753 usec per loop


Set (set-comprehension available in 2.7+):



$ python -mtimeit -s"s = {i for i in range(10)};" "len(s)"
10000000 loops, best of 3: 0.0754 usec per loop

$ python -mtimeit -s"s = {i for i in range(1000000)};" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop


Deque:



$ python -mtimeit -s"from collections import deque;d=deque(range(10));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop

$ python -mtimeit -s"from collections import deque;d=deque(range(1000000));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop
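The same comparison can be repeated from a script with the `timeit` module. This is a minimal sketch, not the original author's benchmark: the helper `time_len`, the `builders` table, and the loop count are assumptions chosen for illustration, and the lambda adds a small fixed call overhead to every measurement.

```python
import timeit
from array import array
from collections import deque


def time_len(obj, number=200_000):
    """Return the average cost of one len(obj) call, in nanoseconds."""
    total = timeit.timeit(lambda: len(obj), number=number)
    return total / number * 1e9


sizes = (10, 1_000_000)
builders = {
    "list": lambda n: list(range(n)),
    "tuple": lambda n: (1,) * n,
    "str": lambda n: "1" * n,
    "dict": lambda n: dict.fromkeys(range(n)),
    "array": lambda n: array("i", range(n)),
    "set": lambda n: set(range(n)),
    "deque": lambda n: deque(range(n)),
}

results = {}
for name, build in builders.items():
    for n in sizes:
        results[(name, n)] = time_len(build(n))
        print(f"{name:>5} n={n:>9,}: {results[(name, n)]:.1f} ns per len() call")
```

If len() is O(1), the two rows for each type should be close to each other, up to measurement noise.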


len() is O(1) because, in RAM, lists are stored as arrays (series of contiguous addresses). To know where the array ends, the computer needs two things: its length and its start point. That is why len() is O(1): the computer stores the value, so it just needs to look it up.




It is O(1) in CPython because the length is derived from the size attribute (ob_size) on the PyObject representing the list. See [1], [2] and [3], in that order:



[1]:



static PyObject *
listiter_len(_PyListIterObject *it, PyObject *Py_UNUSED(ignored))
{
    Py_ssize_t len;
    if (it->it_seq) {
        len = PyList_GET_SIZE(it->it_seq) - it->it_index;
        if (len >= 0)
            return PyLong_FromSsize_t(len);
    }
    return PyLong_FromLong(0);
}

[2]:



static inline Py_ssize_t PyList_GET_SIZE(PyObject *op) {
    PyListObject *list = _PyList_CAST(op);
    return Py_SIZE(list);
}

[3]:


static inline Py_ssize_t Py_SIZE(PyObject *ob) {
    assert(ob->ob_type != &PyLong_Type);
    assert(ob->ob_type != &PyBool_Type);
    PyVarObject *var_ob = _PyVarObject_CAST(ob);
    return var_ob->ob_size;
}

[1] listiter_len



[2] PyList_GET_SIZE



[3] Py_SIZE
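As a rough way to see the ob_size field from Python, the sketch below reads it directly from memory with ctypes. It rests on loud assumptions: a standard (non-debug, non-free-threaded) CPython build in which the object header is ob_refcnt followed by ob_type and then ob_size, and the fact that id() returns the object's address in CPython. The helper name `read_ob_size` is hypothetical, and this is a curiosity rather than something to use in real code.

```python
import ctypes


def read_ob_size(obj):
    """Read ob_size straight from the object's memory (standard CPython only)."""
    # Assumed header layout: ob_refcnt (Py_ssize_t), ob_type (pointer), ob_size.
    offset = ctypes.sizeof(ctypes.c_ssize_t) + ctypes.sizeof(ctypes.c_void_p)
    return ctypes.c_ssize_t.from_address(id(obj) + offset).value


items = [10, 20, 30]
print(read_ob_size(items), len(items))

items.append(40)
print(read_ob_size(items), len(items))
```

Under those assumptions both numbers on each line agree, because len() on a list simply returns this stored field.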



More answers

Thanks for the helpful answer! Are there any native types for which this is not the case?


Interesting that the runtime of getting the length is only mentioned for list here: wiki.python.org/moin/TimeComplexity (it is not mentioned for the other types).


But why is it O(1)?


len() is a very frequent operation, and making it O(1) is extremely easy from the viewpoint of implementation -- Python just keeps each collection's "number of items" (length) stored and updated as part of the collection data structure.
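To illustrate the "stored and updated" idea, here is a small sketch of a linked stack that maintains a counter on every mutation, next to a method that recounts by walking the nodes. The classes `Node` and `LinkedStack` and the method `count_by_walking` are hypothetical and exist only for this comparison; they are not how CPython implements its collections.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    def __init__(self):
        self._head = None
        self._size = 0  # kept in sync with every push and pop

    def push(self, value):
        self._head = Node(value, self._head)
        self._size += 1

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        node = self._head
        self._head = node.next
        self._size -= 1
        return node.value

    def __len__(self):
        return self._size  # O(1): just read the stored counter

    def count_by_walking(self):
        # O(n) alternative: visit every node to count them.
        count = 0
        node = self._head
        while node is not None:
            count += 1
            node = node.next
        return count


stack = LinkedStack()
for value in range(5):
    stack.push(value)
stack.pop()
print(len(stack), stack.count_by_walking())  # 4 4
```

Both approaches give the same answer, but only the stored counter keeps len() independent of the number of items.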


I assume it's only O(1) because it was already calculated at the time of creation, and getting len(x) is just accessing that stored value.


Why doesn't length show up among the names listed by dir(list)?


@ViFI Because it is just an example. The illustrated list.length variable is implemented in C, not Python.
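A quick check of what a list does expose at the Python level: the `__len__` method is visible, while the stored size lives in a C struct field and has no Python attribute.

```python
names = dir(list)

print("__len__" in names)      # True: the method that len() calls
print("length" in names)       # False: there is no such Python attribute
print(hasattr([], "length"))   # False
print([1, 2, 3].__len__())     # 3
```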


This is not a very good benchmark, even though it shows what we already know, because range(10) and range(1000000) are not supposed to be O(1).


This is by far the best answer. You should add a conclusion, just in case someone doesn't realize that the time is constant.


Thanks for the comment. I added a note about the O(1) complexity of len(), and also fixed the measurements to properly use the -s flag.


It is important to note that saving the length into a variable could save a significant amount of computational time:

$ python -m timeit -s "l = range(10000);" "len(l); len(l); len(l)"
223 nsec per loop

$ python -m timeit -s "l = range(100);" "len(l)"
66.2 nsec per loop
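The comparison in this comment can be sketched with the `timeit` module. The function names `repeated_calls` and `cached_length` and the loop count are assumptions for illustration. Any saving comes from avoiding the fixed per-call overhead of len(), not from the size of the list, so whether it is significant depends on how hot the surrounding loop is.

```python
import timeit

data = list(range(10_000))


def repeated_calls():
    # Three separate len() calls, each paying the fixed call overhead.
    return len(data) + len(data) + len(data)


def cached_length():
    # One len() call, then the stored local variable is reused.
    n = len(data)
    return n + n + n


number = 200_000
t_repeated = timeit.timeit(repeated_calls, number=number)
t_cached = timeit.timeit(cached_length, number=number)

print(f"repeated len(): {t_repeated / number * 1e9:.1f} ns per call")
print(f"cached length:  {t_cached / number * 1e9:.1f} ns per call")
```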


I don't think this is true for python lists. They're linked lists, not arrays, so contiguous addresses are not guaranteed


@bluppfisk You are totally wrong. Here are the Python docs: docs.python.org/3/faq/…

