python pandas groupby/apply: what exactly is passed to the apply function?

Reposted by: 太空宇宙 · Updated: 2023-11-03 23:55:46

Python newbie here. I am trying to understand how the pandas groupby and apply methods work. I found this simple example, which I have pasted below:

import pandas as pd

# Note: 'kings' (lowercase) is a different group key from 'Kings'.
ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}

df = pd.DataFrame(ipl_data)

The dataframe df looks like this:

      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
2   Devils     2  2014     863
3   Devils     3  2015     673
4    Kings     3  2014     741
5    kings     4  2015     812
6    Kings     1  2016     756
7    Kings     1  2017     788
8   Riders     2  2016     694
9   Royals     4  2014     701
10  Royals     1  2015     804
11  Riders     2  2017     690

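So far, so good. Next I want to transform my data so that, from each Team group, I keep only the first element of the Points column. Having first checked that df['Points'][0] does indeed give me the first Points element of df, I tried this: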

df.groupby('Team').apply(lambda x : x['Points'][0])

I assumed that the argument x of the lambda function would be another pandas dataframe. However, Python raises an error:

File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

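This seems to have something to do with a hash table, but I don't understand why. I then thought that maybe what is passed to the lambda is not a dataframe, so I ran this: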

df.groupby('Team').apply(lambda x : (type(x), x.shape))

Output:

Team
Devils    (<class 'pandas.core.frame.DataFrame'>, (2, 4))
Kings     (<class 'pandas.core.frame.DataFrame'>, (3, 4))
Riders    (<class 'pandas.core.frame.DataFrame'>, (4, 4))
Royals    (<class 'pandas.core.frame.DataFrame'>, (2, 4))
kings     (<class 'pandas.core.frame.DataFrame'>, (1, 4))
dtype: object

IIUC, this shows that the argument of the lambda is indeed a pandas dataframe containing the subset of df for each team.

I know I can get the desired result by running:

df.groupby('Team').apply(lambda x : x['Points'].iloc[0])

I just want to understand why df['Points'][0] works while x['Points'][0] does not work inside the apply function. Thanks for reading!

Best answer

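When you call df.groupby('Team').apply(lambda x: ...), you are essentially splitting the dataframe by Team and passing each chunk to the lambda function. Each chunk keeps the row labels it had in df: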

      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
8   Riders     2  2016     694
11  Riders     2  2017     690
------------------------------
2   Devils     2  2014     863
3   Devils     3  2015     673
------------------------------
4    Kings     3  2014     741
6    Kings     1  2016     756
7    Kings     1  2017     788
------------------------------
5    kings     4  2015     812
------------------------------
9   Royals     4  2014     701
10  Royals     1  2015     804
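
One way to see these chunks for yourself is to iterate over the GroupBy object, which yields (key, sub-dataframe) pairs. The snippet below is a minimal sketch on the same Team and Points data; the name labels_per_team is made up for illustration. Note that pandas sorts the group keys by default, so the chunks arrive in a different order than in the listing above.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
})

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs.
# Each sub-DataFrame keeps the row labels it had in df.
labels_per_team = {
    team: chunk.index.tolist() for team, chunk in df.groupby('Team')
}

for team, labels in labels_per_team.items():
    print(team, labels)
# Devils [2, 3]
# Kings [4, 6, 7]
# Riders [0, 1, 8, 11]
# Royals [9, 10]
# kings [5]
```

Only the Riders chunk contains the label 0.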

df['Points'][0] works because you are telling pandas to "get the value at label 0 of the Points series", and that label exists.

.apply(lambda x: x['Points'][0]) does not work because only one chunk (Riders) has the label 0. For every other chunk the label lookup fails, hence the KeyError.
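
The difference between label-based and positional lookup can be reproduced on a single chunk. This is a small sketch on a cut-down version of the data; get_group is used here only to pull out the Devils chunk, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils'],
    'Points': [876, 789, 863, 673],
})

# The Points column of the Devils chunk: its row labels are 2 and 3.
devils_points = df.groupby('Team').get_group('Devils')['Points']

try:
    devils_points[0]           # label lookup: no row is labelled 0 here
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

first_by_position = devils_points.iloc[0]  # positional: first row of the chunk
first_by_label = devils_points.loc[2]      # label-based, with a label that exists

print(label_lookup_failed, first_by_position, first_by_label)
# True 863 863
```

This is why .iloc[0] works inside apply: it asks for the first row of each chunk regardless of its label.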

That said, apply is generic, so it is typically much slower than the built-in vectorized aggregation functions. Here you can use first:

df.groupby('Team')['Points'].first()
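
As a rough check, the sketch below compares first with the apply/.iloc[0] version on a cut-down copy of the data. It also shows one difference worth knowing: by default first returns the first non-null value in each group, whereas .iloc[0] returns whatever sits in the first row. The with_gap frame is invented purely to illustrate that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils'],
    'Points': [876, 789, 863, 673],
})

grouped_points = df.groupby('Team')['Points']

via_first = grouped_points.first()                      # built-in aggregation
via_apply = grouped_points.apply(lambda s: s.iloc[0])   # generic apply

print(via_first.tolist(), via_apply.tolist())
# [863, 876] [863, 876]

# Hypothetical data with a missing value in the first row of the group.
with_gap = pd.DataFrame({'Team': ['A', 'A'], 'Points': [np.nan, 5.0]})
gap_points = with_gap.groupby('Team')['Points']

first_skipping_nan = gap_points.first()['A']                     # 5.0
first_positional = gap_points.apply(lambda s: s.iloc[0])['A']    # nan
```

With no missing values, as in the question's data, both approaches give the same result.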

Regarding "python pandas groupby/apply: what exactly is passed to the apply function?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57747894/
