python - Pandas / NumPy : Fastest way to create a ladder?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:37:22

I have a Pandas DataFrame like this:

   color  cost  temp
0   blue  12.0  80.4
1    red   8.1  81.2
2   pink  24.5  83.5

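I want to create a "ladder" or "range" of costs for each row, in 50-cent increments, from $0.50 below the current cost to $0.50 above it. My current code is similar to the following: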

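import numpy
import pandas

incremented_prices = []

df['original_idx'] = df.index  # remember each row's original label

for _, row in df.iterrows():  # iterrows yields (index, row) pairs
    current_price = row['cost']
    # three steps: $0.50 below, at, and $0.50 above the current cost
    more_costs = current_price + numpy.arange(-0.5, 0.51, 0.5)

    for cost in more_costs:
        row_c = row.copy()
        row_c['cost'] = cost
        incremented_prices.append(row_c)

# build one DataFrame from the collected row Series
df_incremented = pandas.DataFrame(incremented_prices).reset_index(drop=True)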

This code produces a DataFrame like this:

   color  cost  temp  original_idx
0   blue  11.5  80.4             0
1   blue  12.0  80.4             0
2   blue  12.5  80.4             0
3    red   7.6  81.2             1
4    red   8.1  81.2             1
5    red   8.6  81.2             1
6   pink  24.0  83.5             2
7   pink  24.5  83.5             2
8   pink  25.0  83.5             2

In the real problem the range runs from -$50.00 to +$50.00, and this approach is really slow. Is there a faster, vectorized way to do it?

Best answer

You can try rebuilding the DataFrame with numpy.repeat:

import numpy as np
import pandas as pd

# the pd.np alias was deprecated in pandas 1.0 and later removed, so NumPy is imported directly
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size

pd.DataFrame(dict(
    color = np.repeat(df.color.values, repeats),
    # vectorized: broadcasting adds every step to every cost in one operation
    cost = (df.cost.values[:, None] + cost_steps).ravel(),
    temp = np.repeat(df.temp.values, repeats),
    original_idx = np.repeat(df.index.values, repeats)
))
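
To make the broadcasting step easier to follow, here is a small sketch that builds the cost grid on its own. The sample frame is copied from the question; the names cost_grid, flat_costs, and repeated_colors are illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})

cost_steps = np.arange(-0.5, 0.51, 0.5)  # offsets: -0.5, 0.0, 0.5

# A (3, 1) column of costs plus a (3,) row of offsets broadcasts to a (3, 3) grid:
# one row per original record, one column per offset.
cost_grid = df["cost"].to_numpy()[:, None] + cost_steps

# ravel() flattens the grid row by row, so each record's costs stay together,
# which is the same ordering np.repeat gives the other columns.
flat_costs = cost_grid.ravel()
repeated_colors = np.repeat(df["color"].to_numpy(), cost_steps.size)
```

Because both arrays list all steps for the first record, then all steps for the second, and so on, they line up element by element when passed to pd.DataFrame.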

Update for more columns:

import numpy as np
import pandas as pd

df1 = df.rename_axis("original_idx").reset_index()
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size

# repeat every non-cost column row-wise, then append the broadcast costs as the last column
pd.DataFrame(np.hstack((np.repeat(df1.drop(columns="cost").values, repeats, axis=0),
                        (df1.cost.values[:, None] + cost_steps).reshape(-1, 1))),
             columns=df1.columns.drop("cost").tolist() + ["cost"])
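
As a possible alternative that is not part of the original answer, the rows can be repeated positionally with iloc, which keeps each column's dtype instead of passing everything through one mixed-type array. This is only a sketch: make_ladder and its parameters are hypothetical names, and np.linspace is swapped in for np.arange so that both endpoints are included without relying on a 0.51 upper bound.

```python
import numpy as np
import pandas as pd

def make_ladder(df, low=-0.5, high=0.5, step=0.5, column="cost"):
    """Hypothetical helper: repeat each row once per offset and shift `column`."""
    n_steps = int(round((high - low) / step)) + 1
    offsets = np.linspace(low, high, n_steps)  # includes both endpoints

    # Repeat every row n_steps times by position; column dtypes are preserved.
    positions = np.repeat(np.arange(len(df)), n_steps)
    out = df.iloc[positions].rename_axis("original_idx").reset_index()

    # np.tile cycles through the offsets once per original row.
    out[column] = out[column].to_numpy() + np.tile(offsets, len(df))
    return out

# Sample frame copied from the question.
df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})

ladder = make_ladder(df)                        # 3 steps per row
wide_ladder = make_ladder(df, low=-50.0, high=50.0)  # the range from the question
```

In this version original_idx ends up as the first column rather than the last, and cost stays in its original position.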

Regarding python - Pandas / NumPy : Fastest way to create a ladder?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43415463/
