python - Looping an IF statement over each row of a Pandas DataFrame

Reposted by: 太空宇宙 · Updated: 2023-11-04 03:01:12

Hi, I'm new to pandas, coming from a SAS background, and I'm trying to split a continuous variable into bands with the following code.

var_range = df['BILL_AMT1'].max() - df['BILL_AMT1'].min()
a= 10
for i in range(1,a):
    inc = var_range/a
    lower_bound = df['BILL_AMT1'].min() + (i-1)*inc
    print('Lower bound is '+str(lower_bound))
    upper_bound = df['BILL_AMT1'].max() + (i)*inc
    print('Upper bound is '+str(upper_bound))
    if (lower_bound <= df['BILL_AMT1'] < upper_bound):
        df['bill_class'] = i
    i+=1

I expect the code to check whether each value of df['BILL_AMT1'] falls within the current loop's bounds and to set df['bill_class'] accordingly.

I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

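I think the if condition evaluates correctly and that the error comes from assigning the for-loop counter's value to the new column.

Can anyone explain what is going wrong, or suggest an alternative?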

Best answer

To avoid the ValueError, change

if (lower_bound <= df['BILL_AMT1'] < upper_bound):
    df['bill_class'] = i

to

mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)
df.loc[mask, 'bill_class'] = i

The chained comparison (lower_bound <= df['BILL_AMT1'] < upper_bound) is equivalent to

(lower_bound <= df['BILL_AMT1']) and (df['BILL_AMT1'] < upper_bound)

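The and operator causes the two boolean Series, (lower_bound <= df['BILL_AMT1']) and (df['BILL_AMT1'] < upper_bound), to be evaluated in a boolean context, that is, reduced to a single bool value. Pandas refuses to reduce a Series to a single bool value, which is what raises the ValueError.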

Instead, to get back a boolean Series, use the & operator rather than and:

mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)
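The difference between and and & can be seen in a minimal sketch like the one below. The sample values and the two bounds are made up for illustration; only the column name BILL_AMT1 comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'BILL_AMT1': [0, 50, 100, 150, 200]})
lower_bound, upper_bound = 50, 150  # illustrative bounds

# The chained comparison implicitly uses `and`, which asks a whole
# Series for a single truth value, so pandas raises ValueError.
try:
    lower_bound <= df['BILL_AMT1'] < upper_bound
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# The element-wise & operator keeps one bool per row instead.
mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)

print(chained_error)
print(mask.tolist())  # [False, True, True, False, False]
```

The parentheses around each comparison matter, because & binds more tightly than <= and <.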

Then, to assign a value to the bill_class column where mask is True, use df.loc:

df.loc[mask, 'bill_class'] = i
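Put together, the question's loop might look like the sketch below. It goes beyond the answer in a few places, all of which are assumptions about the asker's intent (equal-width bands numbered 1 to a): the upper bound is measured from min() rather than max(), the loop runs over all a bands via range(1, a + 1), the manual i += 1 is dropped, and the maximum value is placed in the last band explicitly because the strict < would otherwise exclude it. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({'BILL_AMT1': [0, 10, 25, 50, 99, 100]})

a = 10                              # number of bands
col = df['BILL_AMT1']
lo = col.min()
inc = (col.max() - lo) / a          # width of each band

for i in range(1, a + 1):           # bands 1..a (assumed intent)
    lower_bound = lo + (i - 1) * inc
    upper_bound = lo + i * inc      # measured from min(), not max()
    mask = (lower_bound <= col) & (col < upper_bound)
    df.loc[mask, 'bill_class'] = i  # only rows inside this band

# The maximum lies on the last upper edge, so the strict < skips it;
# assign it to the last band explicitly.
df.loc[col == col.max(), 'bill_class'] = a
df['bill_class'] = df['bill_class'].astype(int)

print(df['bill_class'].tolist())  # [1, 2, 3, 6, 10, 10]
```

Because each band here is closed on the left and open on the right, values that fall exactly on a band edge go to the higher band.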

To bin the data in df['BILL_AMT1'], you can drop the Python for-loop entirely and, as DSM suggests, use pd.cut:

df['bill_class'] = pd.cut(df['BILL_AMT1'], bins=10, labels=False)+1
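As a rough illustration of what the pd.cut call produces, the sketch below applies it to a small made-up column; the sample values are hypothetical. With labels=False, pd.cut returns zero-based bin codes, which is why 1 is added to get bands numbered from 1 to 10.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({'BILL_AMT1': [0, 10, 25, 50, 99, 100]})

# bins=10 asks for ten equal-width bins spanning min..max;
# labels=False yields integer codes 0..9, shifted here to 1..10.
df['bill_class'] = pd.cut(df['BILL_AMT1'], bins=10, labels=False) + 1

print(df)
```

By default pd.cut uses right-closed intervals (right=True), so a value that falls exactly on a bin edge is placed in the lower band. Passing right=False switches to left-closed intervals if that convention is preferred.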

This post on python - Looping an IF statement over each row of a Pandas DataFrame is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/40854269/
