Cost of len() function

Reposted by: bug小助手 | Updated: 2023-10-24 23:23:07

What is the cost of len() function for Python built-ins? (list/tuple/string/dictionary)

Recommended answers

It's O(1) (constant time, not depending on the actual length of the object, so very fast) on every type you've mentioned, plus set and others such as array.array.

Calling len() on those data types is O(1) in CPython, the official and most common implementation of the Python language. Here's a link to a table that provides the algorithmic complexity of many different functions in CPython:

TimeComplexity Python Wiki Page

All those objects keep track of their own length. The time to extract the length is small (O(1) in big-O notation) and mostly consists of the following [rough description, written in Python terms, not C terms]: look up "len" in a dictionary and dispatch to the built-in len function, which looks up the object's __len__ method and calls it. All that method has to do is return self.length.
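
The rough description above can be sketched in pure Python. This is only an illustration, not how CPython's built-in types are written: the Playlist class and its length attribute are hypothetical names standing in for the C-level fields.

```python
import builtins


class Playlist:
    """Hypothetical container that stores its own length."""

    def __init__(self, songs):
        self._songs = list(songs)
        self.length = len(self._songs)  # the stored value

    def __len__(self):
        # No counting happens here: the stored integer is returned.
        return self.length


playlist = Playlist(["intro", "verse", "outro"])

# Step 1: the name "len" is found in a dictionary (here, the builtins namespace).
len_function = vars(builtins)["len"]

# Step 2: the built-in looks up __len__ on the object's type and calls it.
print(len_function(playlist))            # 3
print(type(playlist).__len__(playlist))  # 3, the same call spelled out
```

Note that len() consults the type's __len__, not an attribute set on the instance, which is why the second call goes through type(playlist).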

The measurements below provide evidence that len() is O(1) for commonly used data structures.

A note regarding timeit: when the -s flag is used and two strings are passed to timeit, the first string (the setup) is executed only once and is not timed.

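List (on Python 3, where range() returns a lazy range object rather than a list, use list(range(n)) in the setup to time an actual list):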

$ python -m timeit -s "l = range(10);" "len(l)"
10000000 loops, best of 3: 0.0677 usec per loop

$ python -m timeit -s "l = range(1000000);" "len(l)"
10000000 loops, best of 3: 0.0688 usec per loop

Tuple:

$ python -m timeit -s "t = (1,)*10;" "len(t)"
10000000 loops, best of 3: 0.0712 usec per loop

$ python -m timeit -s "t = (1,)*1000000;" "len(t)"
10000000 loops, best of 3: 0.0699 usec per loop

String:

$ python -m timeit -s "s = '1'*10;" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop

$ python -m timeit -s "s = '1'*1000000;" "len(s)"
10000000 loops, best of 3: 0.0686 usec per loop

Dictionary (dictionary comprehensions are available in Python 2.7+):

$ python -m timeit -s "d = {i:j for i,j in enumerate(range(10))};" "len(d)"
10000000 loops, best of 3: 0.0711 usec per loop

$ python -m timeit -s "d = {i:j for i,j in enumerate(range(1000000))};" "len(d)"
10000000 loops, best of 3: 0.0727 usec per loop

Array:

$ python -m timeit -s "import array; a = array.array('i', range(10));" "len(a)"
10000000 loops, best of 3: 0.0682 usec per loop

$ python -m timeit -s "import array; a = array.array('i', range(1000000));" "len(a)"
10000000 loops, best of 3: 0.0753 usec per loop

Set (set comprehensions are available in Python 2.7+):

$ python -m timeit -s "s = {i for i in range(10)};" "len(s)"
10000000 loops, best of 3: 0.0754 usec per loop

$ python -m timeit -s "s = {i for i in range(1000000)};" "len(s)"
10000000 loops, best of 3: 0.0713 usec per loop

Deque:

$ python -m timeit -s "from collections import deque; d = deque(range(10));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop

$ python -m timeit -s "from collections import deque; d = deque(range(1000000));" "len(d)"
100000000 loops, best of 3: 0.0163 usec per loop
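
For anyone who would rather rerun these measurements from a script than from the shell, here is a minimal sketch using the timeit module on Python 3. The helper names time_len and build_containers are made up for this example, and the absolute numbers will differ by machine and Python version; what matters is that the figures for n=10 and n=1,000,000 stay close to each other.

```python
import timeit
from array import array
from collections import deque


def time_len(obj, number=1_000_000):
    """Return the average cost of one len(obj) call, in nanoseconds."""
    # Only the statement is timed; obj is built before timing starts.
    total_seconds = timeit.timeit("len(obj)", globals={"obj": obj}, number=number)
    return total_seconds / number * 1e9


def build_containers(n):
    """Build one container of each kind holding n items."""
    return {
        "list": list(range(n)),
        "tuple": tuple(range(n)),
        "str": "1" * n,
        "dict": dict.fromkeys(range(n)),
        "set": set(range(n)),
        "array": array("i", range(n)),
        "deque": deque(range(n)),
    }


for n in (10, 1_000_000):
    for name, obj in build_containers(n).items():
        print(f"{name:>5}  n={n:<9,}  {time_len(obj):7.1f} ns per call")
```

Passing the object through globals plays the same role as the -s flag: the container is built once, outside the timed statement.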

len() is O(1) because, in RAM, lists are stored as tables (a series of contiguous addresses). To know where the table ends, the computer needs two things: the length and the start point. That is why len() is O(1): the computer stores the value, so it just needs to look it up.

It is O(1) in CPython because the length is derived from the size field of the PyObject that represents the list. See [1], [2] and [3], in that order:

[1]:

static PyObject *
listiter_len(_PyListIterObject *it, PyObject *Py_UNUSED(ignored))
{
    Py_ssize_t len;
    if (it->it_seq) {
        len = PyList_GET_SIZE(it->it_seq) - it->it_index;
        if (len >= 0)
            return PyLong_FromSsize_t(len);
    }
    return PyLong_FromLong(0);
}

[2]:

static inline Py_ssize_t PyList_GET_SIZE(PyObject *op) {
    PyListObject *list = _PyList_CAST(op);
    return Py_SIZE(list);
}

[3]:

static inline Py_ssize_t Py_SIZE(PyObject *ob) {
    assert(ob->ob_type != &PyLong_Type);
    assert(ob->ob_type != &PyBool_Type);
    PyVarObject *var_ob = _PyVarObject_CAST(ob);
    return var_ob->ob_size;
}
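
The stored ob_size field can also be peeked at from Python. The sketch below is CPython-specific and for illustration only. It assumes that id() returns the object's memory address and that ob_size sits immediately after the fixed object header, whose size is taken from object.__basicsize__. The helper name stored_size is made up for this example.

```python
import ctypes


def stored_size(obj):
    """Read the ob_size field of a list or tuple directly (CPython only)."""
    # Assumption: id(obj) is the address of the object, and ob_size is the
    # first field after the PyObject header (object.__basicsize__ bytes).
    address = id(obj) + object.__basicsize__
    return ctypes.c_ssize_t.from_address(address).value


items = [10, 20, 30]
print(stored_size(items), len(items))  # both numbers should match

items.append(40)
print(stored_size(items), len(items))  # the stored field was updated by append

print(stored_size(("a", "b")), len(("a", "b")))
```

Only use this on variable-size objects such as lists and tuples. Other types lay out their memory differently, so the same offset would read unrelated data.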

[1] listiter_len

[2] PyList_GET_SIZE

[3] Py_SIZE

More replies

Thanks for the helpful answer! Are there any native types for which this is not the case?

Interesting that the "get length" running time is only listed for list here: wiki.python.org/moin/TimeComplexity [it is not mentioned for the other types]

But why is it O(1)?

len() is a very frequent operation, and making it O(1) is extremely easy from the viewpoint of implementation -- Python just keeps each collection's "number of items" (length) stored and updated as part of the collection data structure.
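
To make the "stored and updated" idea concrete, here is a small sketch of a singly linked stack. LinkedStack and its methods are hypothetical names; a linked structure is chosen only because counting its nodes would otherwise take O(n). Keeping a counter in step with every mutation is what makes __len__ constant time.

```python
class Node:
    """One link in the chain."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Hypothetical stack that updates its item count on every change."""

    def __init__(self):
        self._head = None
        self._size = 0

    def push(self, value):
        self._head = Node(value, self._head)
        self._size += 1  # keep the count in step with the data

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        node = self._head
        self._head = node.next
        self._size -= 1
        return node.value

    def __len__(self):
        return self._size  # O(1): read the stored count

    def count_by_walking(self):
        """O(n) alternative, shown only for comparison."""
        count = 0
        node = self._head
        while node is not None:
            count += 1
            node = node.next
        return count


stack = LinkedStack()
for value in range(5):
    stack.push(value)
stack.pop()

print(len(stack), stack.count_by_walking())  # 4 4
```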

I assume it's only O(1) because it was already calculated at the time of creation, and getting len(x) is just accessing that stored value.

Why doesn't length show up in the output of dir(list)?

@ViFI Because it is just an example. The illustrated list.length variable is implemented in C, not Python.

This is not a very good benchmark, even though it shows what we already know, because range(10) and range(1000000) are not supposed to be O(1).

This is by far the best answer. You should add a conclusion in case someone doesn't recognize that the time is constant.

Thanks for the comment. I added a note about the O(1) complexity of len(), and also fixed the measurements to properly use the -s flag.

It is important to note that saving the length into a variable could save a significant amount of computational time:

$ python -m timeit -s "l = range(10000);" "len(l); len(l); len(l)"
223 nsec per loop

$ python -m timeit -s "l = range(100);" "len(l)"
66.2 nsec per loop
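
A rough way to check this on your own machine is sketched below. The two function names are made up for the example, and the size of the gap depends on the interpreter version. Each len() call is still O(1), so the saving is a fixed per-call overhead and only adds up inside hot loops.

```python
import timeit

data = list(range(10_000))


def call_len_each_time(items):
    """Call len() on every pass through the loop."""
    total = 0
    for _ in range(1000):
        total += len(items)
    return total


def reuse_stored_length(items):
    """Call len() once and reuse the local variable."""
    n = len(items)
    total = 0
    for _ in range(1000):
        total += n
    return total


for fn in (call_len_each_time, reuse_stored_length):
    seconds = timeit.timeit(lambda: fn(data), number=1000)
    print(f"{fn.__name__:<22} {seconds * 1e3:8.2f} ms for 1000 runs")
```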

I don't think this is true for Python lists. They're linked lists, not arrays, so contiguous addresses are not guaranteed.

@bluppfisk You are totally wrong. See the Python docs: docs.python.org/3/faq/…
