
python - Apply a lookup table to a DataFrame to get bins or ranges

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:37:23

I have a DataFrame like the one below. Assume these are the sales figures for a list of salespeople.

[Image: the sales DataFrame, printed as `a` in Attempt 1 below]

I also have a lookup table that gives the commission rate by dollar amount. It looks like the one below: $0-$50,000 = 5%, $50,001-$250,000 = 4%, and so on.

[Image: the commission lookup table, printed as `b` in Attempt 1 below]

What I want to do is apply the lookup table to the sales table to produce the following DataFrame.

[Image: the desired output DataFrame]
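Because the original images are not available, here is a minimal sketch that rebuilds both frames from the printed output of `a` and `b` in Attempt 1 below. The values are copied from that output; treating `Sales` as a named index is an assumption based on how `b` is printed.

```python
import pandas as pd

# Sales figures, copied from the printed output of `a` in the question.
a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})

# Commission lookup: each index value is the upper bound of a sales bracket.
# Assumption: `Sales` is the index of `b`, as the printed output suggests.
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

print(a)
print(b)
```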

Attempt 1:

In [66]: a
Out[66]:
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [67]: b
Out[67]:
            Commission
Sales
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [68]: c = b['Commission'][a <= b.index.values]
Traceback (most recent call last):

  File "<ipython-input-68-d229bce29f01>", line 1, in <module>
    c = b['Commission'][a <= b.index.values]

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py", line 1184, in f
    res = self._combine_const(other, func, raise_on_error=False)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 3555, in _combine_const
    raise_on_error=raise_on_error)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2911, in eval
    return self.apply('eval', **kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 2890, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1132, in eval
    result = get_result(other)

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\internals.py", line 1103, in get_result
    result = func(values, other)

ValueError: operands could not be broadcast together with shapes (3,3) (4,)

Attempt 2:

In [59]: a
Out[59]:
   Sales_1  Sales_2  Sales_3
0   200000   300000   100000
1   100000   500000   500000
2   400000  1000000   200000

In [60]: b
Out[60]:
            Commission
Sales
50000             0.05
250000            0.04
750000            0.03
9999999999        0.02

In [61]: c = b.lookup(a['Sales_1'],['Commission'])
Traceback (most recent call last):

  File "<ipython-input-61-99e8134e826c>", line 1, in <module>
    c = b.lookup(a['Sales_1'],['Commission'])

  File "C:\WinPython64bit\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2649, in lookup
    raise ValueError('Row labels must have same size as column labels')

ValueError: Row labels must have same size as column labels

Can anyone help me apply a lookup table to a DataFrame? It does not have to work exactly like this, but it illustrates what I need in general.

Best answer

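To work with ranges, pd.cut is your friend. Starting from your current `b` DataFrame, you only need to prepend a lowest edge to the list passed as the bins argument. I put 0 here because negative sales should not exist, but you could use any negative number if needed, or use -np.inf and np.inf (instead of a large placeholder such as 1E14) as the lower and upper boundaries: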

# Sales is the index of b in the question, so the bin edges come from b.index
pd.cut(a.stack(), [0] + b.index.tolist(), labels=b.Commission.tolist()).unstack()
Out[39]:
  Sales_1 Sales_2 Sales_3
0    0.04    0.03    0.04
1    0.04    0.03    0.03
2    0.03    0.02    0.04
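For readers who want to run this end to end, here is a self-contained sketch of the same pd.cut approach. It assumes `b` keeps `Sales` as its index, as printed in the question; the names `edges`, `rates`, and `c` are illustrative only. The extra `astype(float)` step is my addition so that the result holds plain floats rather than categories.

```python
import pandas as pd

a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})
b = pd.DataFrame(
    {"Commission": [0.05, 0.04, 0.03, 0.02]},
    index=pd.Index([50000, 250000, 750000, 9999999999], name="Sales"),
)

# Bin edges: a lower bound of 0 followed by the upper bound of each bracket.
edges = [0] + b.index.tolist()

# stack() flattens the frame to one Series so that pd.cut can bin every cell;
# each bin (left, right] is labelled with the commission rate of its bracket.
rates = pd.cut(a.stack(), bins=edges, labels=b["Commission"].tolist())

# Convert the categorical labels to floats, then restore the original shape.
c = rates.astype(float).unstack()
print(c)
```

Note that the bins are open on the left by default, so a sale of exactly 0 would fall outside the first bin and come back as NaN; passing `include_lowest=True` to pd.cut is one way to cover that case.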

I find that a `b` like the following is clearer to use with cut:

    Sales  Commission
0    -inf         NaN
1   50000        0.05
2  250000        0.04
3  750000        0.03
4     inf        0.02

The call then becomes:

pd.cut(a.stack(), b.Sales, labels=b.Commission[1:]).unstack()
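A runnable sketch of this second variant follows, assuming `b` is built with `Sales` as an ordinary column bounded by -inf and inf, as in the table above. The names `bins`, `labels`, `c`, and `extreme` are illustrative, and the `astype(float)` conversion is my addition rather than part of the answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    "Sales_1": [200000, 100000, 400000],
    "Sales_2": [300000, 500000, 1000000],
    "Sales_3": [100000, 500000, 200000],
})

# Lookup table with open-ended boundaries; the first row only supplies the
# lowest edge, so its commission is NaN and gets skipped when labelling.
b = pd.DataFrame({
    "Sales": [-np.inf, 50000, 250000, 750000, np.inf],
    "Commission": [np.nan, 0.05, 0.04, 0.03, 0.02],
})

bins = b["Sales"].tolist()
labels = b["Commission"].iloc[1:].tolist()  # one label per bin (4 bins, 5 edges)

c = pd.cut(a.stack(), bins=bins, labels=labels).astype(float).unstack()
print(c)

# With infinite edges, values outside the usual range still land in a bin.
extreme = pd.cut(pd.Series([-1, 10**12]), bins=bins, labels=labels).astype(float)
print(extreme.tolist())
```

Using infinite edges avoids having to pick a sentinel upper bound that some future sale might exceed.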

For more on python - applying a lookup table to a DataFrame to get bins or ranges, see the similar question on Stack Overflow: https://stackoverflow.com/questions/43049245/
