python - Why is DataFrame.loc[[1]] 1,800x slower than df.ix[[1]] and 3,500x slower than df.loc[1]?


Reposted by: 太空狗 · Updated: 2023-10-29 18:26:28


Try it yourself:

import pandas as pd
s=pd.Series(xrange(5000000))
%timeit s.loc[[0]] # You need pandas 0.15.1 or newer for it to be that slow
1 loops, best of 3: 445 ms per loop
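The snippet above relies on Python 2 (`xrange`) and the IPython `%timeit` magic. A minimal sketch of the same measurement for Python 3, using the standard `timeit` module, might look like the following. The smaller series size and the repeat count are assumptions chosen to keep the run short, and the timings on a current pandas release will not match the 0.15.1 figures quoted here.

```python
import timeit

import pandas as pd

# Assumption: 1,000,000 elements instead of the original 5,000,000,
# purely to keep the run short.
s = pd.Series(range(1_000_000))

repeats = 100  # assumption: arbitrary repeat count for averaging

# Average time per call for a one-element list versus a scalar label.
t_list = timeit.timeit(lambda: s.loc[[0]], number=repeats) / repeats
t_scalar = timeit.timeit(lambda: s.loc[0], number=repeats) / repeats

print(f"pandas {pd.__version__}")
print(f"s.loc[[0]]: {t_list * 1e6:.1f} µs per call")
print(f"s.loc[0]:   {t_scalar * 1e6:.1f} µs per call")
```

Printing `pd.__version__` alongside the timings makes it easier to tell whether a given result comes from an affected release.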

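Update: this is a legitimate bug in pandas that was probably introduced in 0.15.1, around August 2014. Workarounds: use an older version of pandas while waiting for a new release; get a cutting-edge development version from GitHub; manually make a one-line modification in your installed release of pandas; or temporarily use .ix instead of .loc.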

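I have a DataFrame with 4.8 million rows, and selecting a single row using .loc[[id]] (with a single-element list) takes 489 ms, almost half a second. That is 1,800x slower than the equivalent .ix[[id]] and 3,500x slower than .loc[id] (passing id as a value, not as a list). To be fair, .loc[list] takes about the same time regardless of the length of the list, but I don't want to spend 489 ms on it, especially when .ix is over a thousand times faster and produces the same result. My understanding was that .ix was supposed to be slower, wasn't it?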

I am using pandas 0.15.1. The excellent tutorial on Indexing and Selecting Data suggests that .ix is somehow more general, and presumably slower, than .loc and .iloc. Specifically, it says

However, when an axis is integer based, ONLY label based access and not positional access is supported. Thus, in such cases, it’s usually better to be explicit and use .iloc or .loc.
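To make the label-versus-position distinction in that passage concrete, here is a small sketch on a toy frame. The frame and its index values are made up for illustration, and `.ix` is left out because it was removed from later pandas releases.

```python
import pandas as pd

# Hypothetical toy frame: integer labels that deliberately differ
# from the positions 0, 1, 2.
df = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[1, 2, 5])

by_label = df.loc[5]      # label 5 -> the third row, returned as a Series
by_position = df.iloc[2]  # position 2 -> the same row
as_frame = df.loc[[5]]    # a one-element list returns a one-row DataFrame

# Position 5 does not exist in a three-row frame, so .iloc rejects it.
try:
    df.iloc[[5]]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

The scalar form returns a Series and the list form returns a DataFrame, which is the same difference in shape that separates the fast and slow calls in the question.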

Here is an IPython session with benchmarks:

print 'The dataframe has %d entries, indexed by integers that are less than %d' % (len(df), max(df.index)+1)
print 'df.index begins with ', df.index[:20]
print 'The index is sorted:', df.index.tolist()==sorted(df.index.tolist())

# First extract one element directly. Expected result, no issues here.
id=5965356
print 'Extract one element with id %d' % id
%timeit df.loc[id]
%timeit df.ix[id]
print hash(str(df.loc[id])) == hash(str(df.ix[id])) # check we get the same result

# Now extract this one element as a list.
%timeit df.loc[[id]] # SO SLOW. 489 ms vs 270 microseconds for .ix, or 139 microseconds for .loc[id]
%timeit df.ix[[id]]
print hash(str(df.loc[[id]])) == hash(str(df.ix[[id]])) # this one should be True

# Let's double-check that in this case .ix is the same as .loc, not .iloc,
# as this would explain the difference.
try:
    print hash(str(df.iloc[[id]])) == hash(str(df.ix[[id]]))
except:
    print 'Indeed, %d is not even a valid iloc[] value, as there are only %d rows' % (id, len(df))

# Finally, for the sake of completeness, let's take a look at iloc
%timeit df.iloc[3456789] # this is still 100+ times faster than the next version
%timeit df.iloc[[3456789]]
Output:

The dataframe has 4826616 entries, indexed by integers that are less than 6177817
df.index begins with Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], dtype='int64')
The index is sorted: True
Extract one element with id 5965356
10000 loops, best of 3: 139 µs per loop
10000 loops, best of 3: 141 µs per loop
True
1 loops, best of 3: 489 ms per loop
1000 loops, best of 3: 270 µs per loop
True
Indeed, 5965356 is not even a valid iloc[] value, as there are only 4826616 rows
10000 loops, best of 3: 98.9 µs per loop
100 loops, best of 3: 12 ms per loop
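The rounded figures in the title (1,800x and 3,500x) and the "100+ times" remark for .iloc can be checked directly from the timings in this output. The short calculation below is only arithmetic on those reported numbers; the variable names are made up for illustration.

```python
# Timings copied from the %timeit output above, converted to seconds.
loc_list = 489e-3      # df.loc[[id]]
ix_list = 270e-6       # df.ix[[id]]
loc_scalar = 139e-6    # df.loc[id]
iloc_list = 12e-3      # df.iloc[[3456789]]
iloc_scalar = 98.9e-6  # df.iloc[3456789]

ratio_vs_ix = loc_list / ix_list         # .loc[[id]] relative to .ix[[id]]
ratio_vs_scalar = loc_list / loc_scalar  # .loc[[id]] relative to .loc[id]
ratio_iloc = iloc_list / iloc_scalar     # .iloc[[n]] relative to .iloc[n]

print(f".loc[[id]] vs .ix[[id]]: {ratio_vs_ix:.0f}x")
print(f".loc[[id]] vs .loc[id]:  {ratio_vs_scalar:.0f}x")
print(f".iloc[[n]] vs .iloc[n]:  {ratio_iloc:.0f}x")
```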

Best answer

Pandas indexing is very slow, so I switched to numpy indexing

import numpy as np
import pandas as pd

df=pd.DataFrame(some_content)  # some_content: placeholder for your own data

# takes forever!!
for iPer in np.arange(-df.shape[0],0,1):
    x = df.iloc[iPer,:].values
    y = df.iloc[-1,:].values

# fast!
vals = np.matrix(df.values)  # extract the underlying array once
for iPer in np.arange(-vals.shape[0],0,1):
    x = vals[iPer,:]
    y = vals[-1,:]
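As a rough way to try the answer's suggestion on your own machine, the sketch below times a row-by-row loop through `df.iloc` against the same loop over an array extracted once. The random data, the frame size, and the helper names are assumptions made for illustration. It also uses a plain ndarray from `DataFrame.to_numpy()` rather than the `np.matrix` used in the answer, and it prints the timings without claiming any particular speedup.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 2,000 rows of random floats in four columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((2000, 4)), columns=list("abcd"))


def row_sums_iloc(frame):
    """Visit every row through .iloc, as in the slow loop above."""
    return [frame.iloc[i, :].values.sum() for i in range(len(frame))]


def row_sums_numpy(frame):
    """Extract the underlying array once, then index it directly."""
    vals = frame.to_numpy()
    return [vals[i, :].sum() for i in range(vals.shape[0])]


t_iloc = timeit.timeit(lambda: row_sums_iloc(df), number=3)
t_numpy = timeit.timeit(lambda: row_sums_numpy(df), number=3)

print(f"iloc loop:  {t_iloc:.4f} s for 3 passes")
print(f"numpy loop: {t_numpy:.4f} s for 3 passes")
```

Both helpers return the same values, so any difference in the printed timings comes from the indexing path and not from the computation itself.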

Regarding "python - Why is DataFrame.loc[[1]] 1,800x slower than df.ix[[1]] and 3,500x slower than df.loc[1]?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27596832/
