
python - Setting a value on a copy of a slice from a DataFrame

Reposted. Author: 太空狗. Updated: 2023-10-30 00:24:51

I'm setting up the following example, which is similar to my situation and data:

Say I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'price': [25, 30, 34, 40],
                   'Category': ['small', 'medium', 'medium', 'small']})


  Category  ID  price
0    small   1     25
1   medium   2     30
2   medium   3     34
3    small   4     40

Now, I have the following function, which returns the discount amount based on this logic:

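def mapper(price, category):
    # 1% discount for 'small', 2% for every other category
    # (rates chosen to match the desired output shown below)
    if category == 'small':
        discount = 0.01 * price
    else:
        discount = 0.02 * price
    return discount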

This is the resulting DataFrame I want:

  Category  ID  price  Discount
0    small   1     25      0.25
1   medium   2     30      0.60
2   medium   3     34      0.68
3    small   4     40      0.40

So I decided to call Series.map on the price column, because I don't want to use apply. I'm working with a large DataFrame, and map is much faster than apply.

I tried doing this:

for c in list(sample.Category.unique()):
    sample[sample['Category'] == c]['Discount'] = sample[sample['Category'] == c]['price'].map(lambda x: mapper(x,c))

This didn't work as I expected, because I'm trying to set a value on a copy of a slice from a DataFrame.

My question is: is there a way to do this without using df.apply()?
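For context, the loop above fails because `sample[sample['Category'] == c]` returns a new object, so the `['Discount'] = ...` assignment lands on a temporary rather than on `sample`. A minimal sketch of the same loop written with a single `.loc` indexer follows. It assumes the 1% and 2% rates implied by the desired output, and the up-front `np.nan` column is an addition that is not in the original post:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'ID': [1, 2, 3, 4],
                       'price': [25, 30, 34, 40],
                       'Category': ['small', 'medium', 'medium', 'small']})

def mapper(price, category):
    # Assumed rates: 1% for 'small', 2% for everything else.
    return 0.01 * price if category == 'small' else 0.02 * price

# Create the column up front so every row has a float slot to fill.
sample['Discount'] = np.nan

for c in sample['Category'].unique():
    rows = sample['Category'] == c
    # One .loc call selects rows and column together, so the write
    # goes to `sample` itself rather than to a temporary copy.
    sample.loc[rows, 'Discount'] = sample.loc[rows, 'price'].map(
        lambda x, c=c: mapper(x, c))

print(sample)
```

This still calls a Python function once per row, so it only addresses the assignment problem, not the speed concern.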

Best answer

One approach, with np.where -

mask = df.Category.values=='small'
df['Discount'] = np.where(mask,df.price*0.01, df.price*0.02)
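As a self-contained sketch of the `np.where` approach on the question's sample data (the imports and the DataFrame construction are added here for completeness, and the mask is built with `.to_numpy()` rather than `.values` as a conservative choice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'price': [25, 30, 34, 40],
                   'Category': ['small', 'medium', 'medium', 'small']})

# Boolean array: True where the category is 'small'.
mask = (df['Category'] == 'small').to_numpy()

# Pick 1% of the price where mask is True, 2% elsewhere, in one vectorized call.
df['Discount'] = np.where(mask, df['price'] * 0.01, df['price'] * 0.02)

print(df)
```

Because the whole column is assigned in one statement, there is no slice involved and no copy to worry about.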

Another, slightly different way -

df['Discount'] = df.price*0.01
df['Discount'][df.Category.values!='small'] *= 2
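Note that `df['Discount'][...] *= 2` is itself a chained assignment: depending on the pandas version it may emit a `SettingWithCopyWarning`, and with Copy-on-Write enabled it does not update `df`. A hedged sketch of the same two-step idea using a single `.loc` indexer instead (this is a substitute for the line above, not the answer's original code):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'price': [25, 30, 34, 40],
                   'Category': ['small', 'medium', 'medium', 'small']})

# Step 1: give every row the 1% discount.
df['Discount'] = df['price'] * 0.01

# Step 2: double it for the non-'small' rows, selecting rows and
# column in one .loc call so the write reaches df itself.
df.loc[df['Category'] != 'small', 'Discount'] *= 2

print(df)
```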

For better performance, you may want to work with the underlying array data, so we could use df.price.values wherever df.price is used.
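To illustrate the difference, the sketch below multiplies the same column once as a Series and once as a plain array. It uses `.to_numpy()`, the newer spelling of `.values`, which is an assumption about the reader's pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'price': [25, 30, 34, 40],
                   'Category': ['small', 'medium', 'medium', 'small']})

# Series arithmetic: the result is a Series that carries the index along.
series_result = df['price'] * 0.01

# Array arithmetic: a plain ndarray, with no index to build or align.
array_result = df['price'].to_numpy() * 0.01

print(type(series_result).__name__, type(array_result).__name__)
```

Both hold the same numbers; the array version simply skips the pandas bookkeeping, which is where the saving in the benchmarks comes from.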

Benchmarking

Approaches -

def app1(df): # Proposed app#1 here
    mask = df.Category.values=='small'
    df_price = df.price.values
    df['Discount'] = np.where(mask,df_price*0.01, df_price*0.02)
    return df

def app2(df): # Proposed app#2 here
    df['Discount'] = df.price.values*0.01
    df['Discount'][df.Category.values!='small'] *= 2
    return df

def app3(df): # @piRSquared's soln
    # assign() returns a new DataFrame rather than modifying df, so return its result
    return df.assign(
        Discount=((1 - (df.Category.values == 'small')) + 1) / 100 * df.price.values)

def app4(df): # @MaxU's soln
    # assign() returns a new DataFrame rather than modifying df, so return its result
    return df.assign(Discount=df.price * df.Category.map({'small':0.01}).fillna(0.02))

Timings -

1) Large dataset:

In [122]: df
Out[122]:
  Category  ID  price  Discount
0    small   1     25      0.25
1   medium   2     30      0.60
2   medium   3     34      0.68
3    small   4     40      0.40

In [123]: df1 = pd.concat([df]*1000,axis=0)
...: df2 = pd.concat([df]*1000,axis=0)
...: df3 = pd.concat([df]*1000,axis=0)
...: df4 = pd.concat([df]*1000,axis=0)
...:

In [124]: %timeit app1(df1)
...: %timeit app2(df2)
...: %timeit app3(df3)
...: %timeit app4(df4)
...:
1000 loops, best of 3: 209 µs per loop
10 loops, best of 3: 63.2 ms per loop
1000 loops, best of 3: 351 µs per loop
1000 loops, best of 3: 720 µs per loop

2) Very large dataset:

In [125]: df1 = pd.concat([df]*10000,axis=0)
...: df2 = pd.concat([df]*10000,axis=0)
...: df3 = pd.concat([df]*10000,axis=0)
...: df4 = pd.concat([df]*10000,axis=0)
...:

In [126]: %timeit app1(df1)
...: %timeit app2(df2)
...: %timeit app3(df3)
...: %timeit app4(df4)
...:
1000 loops, best of 3: 758 µs per loop
1 loops, best of 3: 2.78 s per loop
1000 loops, best of 3: 1.37 ms per loop
100 loops, best of 3: 2.57 ms per loop
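The `%timeit` magic is only available in IPython. As a rough sketch of how a comparison like this could be reproduced in a plain Python script, the standard-library `timeit` module can be used. The function names `with_where` and `with_map` are hypothetical stand-ins modelled on two of the approaches above, `ignore_index=True` is an added assumption to avoid duplicate index labels, and absolute timings will differ by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'ID': [1, 2, 3, 4],
                     'price': [25, 30, 34, 40],
                     'Category': ['small', 'medium', 'medium', 'small']})

def with_where(df):
    # np.where on plain arrays, in the spirit of app1
    mask = (df['Category'] == 'small').to_numpy()
    price = df['price'].to_numpy()
    return df.assign(Discount=np.where(mask, price * 0.01, price * 0.02))

def with_map(df):
    # dict-based Series.map, in the spirit of app4
    rate = df['Category'].map({'small': 0.01}).fillna(0.02)
    return df.assign(Discount=df['price'] * rate)

# 4 rows repeated 1000 times, with a fresh 0..3999 index.
big = pd.concat([base] * 1000, ignore_index=True)

timings = {}
for fn in (with_where, with_map):
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(big), number=10, repeat=3)) / 10
    timings[fn.__name__] = best
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```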

A further boost by reusing data -

def app1_modified(df):
    mask = df.Category.values=='small'
    df_price = df.price.values*0.01
    df['Discount'] = np.where(mask,df_price, df_price*2)
    return df
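The change above scales the price array once and derives the second branch from that result. A small sketch checking that both formulations agree on the sample data (variable names here are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'price': [25, 30, 34, 40],
                   'Category': ['small', 'medium', 'medium', 'small']})

mask = (df['Category'] == 'small').to_numpy()
price = df['price'].to_numpy()

# app1 style: each branch scales the raw prices separately.
two_scalings = np.where(mask, price * 0.01, price * 0.02)

# app1_modified style: scale once, then double the shared result.
scaled = price * 0.01
reused = np.where(mask, scaled, scaled * 2)

print(np.allclose(two_scalings, reused))
```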

Timings -

In [133]: df1 = pd.concat([df]*10000,axis=0)
...: df2 = pd.concat([df]*10000,axis=0)
...: df3 = pd.concat([df]*10000,axis=0)
...: df4 = pd.concat([df]*10000,axis=0)
...:

In [134]: %timeit app1(df1)
1000 loops, best of 3: 699 µs per loop

In [135]: %timeit app1_modified(df1)
1000 loops, best of 3: 655 µs per loop

Regarding python - setting a value on a copy of a slice from a DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42914747/
