
Python equivalent of R's tidyr::complete that allows specifying additional values

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:13:37

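I'm looking for a way to recreate an R script in Python, but I'm stuck on how to reproduce this pipeline. I'm analyzing cumulative production at different factories and need to normalize their cumulative production hours so the factories can be compared.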

The pipeline looks like this:

Norm_hrs <- Cum_df %>%
  group_by(Name) %>%
  complete(Cum_Hrs = seq(0, max(Cum_Hrs), 730.5))

It takes this:

Name       Cum_Hrs  A  B            C
Factory 1  1        0  1.887861     3.775722
Factory 1  251      0  2104.335728  21932.57871
Factory 1  611      0  2324.586178  37498.99722
Factory 1  1208     0  4361.588197  65235.05541
Factory 2  48       0  1517.840244  6604.770432
Factory 2  163      0  3370.461172  17252.70972
Factory 2  822      0  13284.87786  71918.78308
Factory 2  1541     0  21476.93602  134569.0388
Factory 2  2285     0  32053.99192  225895.1477
Factory 2  3028     0  42299.41357  340798.6151
Factory 2  3699     0  50125.85599  462145.5438
Factory 2  4436     0  56715.74945  584474.9989

And turns it into this:

Name       Cum_Hrs  A   B            C
Factory 1  1        0   1.887861     3.775722
Factory 1  251      0   2104.335728  21932.57871
Factory 1  611      0   2324.586178  37498.99722
Factory 1  730.5    NA  NA           NA
Factory 1  1208     0   4361.588197  65235.05541
Factory 2  48       0   1517.840244  6604.770432
Factory 2  163      0   3370.461172  17252.70972
Factory 2  730.5    NA  NA           NA
Factory 2  822      0   13284.87786  71918.78308
Factory 2  1461     NA  NA           NA
Factory 2  1541     0   21476.93602  134569.0388
Factory 2  2191.5   NA  NA           NA
Factory 2  2285     0   32053.99192  225895.1477
Factory 2  2922     NA  NA           NA
Factory 2  3028     0   42299.41357  340798.6151

That in turn lets me interpolate the NA values in the DataFrame to get standardized time steps.
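The interpolation step itself is not shown in the question. The following is a minimal sketch, assuming that linear interpolation against Cum_Hrs within each factory is what is wanted. The toy frame, the `interpolate_group` helper, and the `value_cols` list are illustrative names and not part of the original pipeline.

```python
import numpy as np
import pandas as pd

# Toy data: the Factory 1 rows of the completed table (columns B and C only)
completed = pd.DataFrame({
    "Name": ["Factory 1"] * 5,
    "Cum_Hrs": [1.0, 251.0, 611.0, 730.5, 1208.0],
    "B": [1.887861, 2104.335728, 2324.586178, np.nan, 4361.588197],
    "C": [3.775722, 21932.57871, 37498.99722, np.nan, 65235.05541],
})

value_cols = ["B", "C"]  # assumed: the columns that should be interpolated


def interpolate_group(group: pd.DataFrame) -> pd.DataFrame:
    """Linearly interpolate value_cols against Cum_Hrs for one factory."""
    # Index by Cum_Hrs so method="index" weights by the real hour spacing
    out = group.set_index("Cum_Hrs")
    out[value_cols] = out[value_cols].interpolate(method="index")
    # Restore Cum_Hrs as a column and keep the original column order
    return out.reset_index()[list(group.columns)]


# Interpolate each factory separately so values never bleed across groups
filled = pd.concat(
    [interpolate_group(g) for _, g in completed.groupby("Name")],
    ignore_index=True,
)
print(filled)
```

With the default settings, rows that come before the first observation in a group (for example a grid point at 0) stay NaN, because `interpolate` only fills forward from known values.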

Best answer

Simply build a sequence data frame of incremental Cum_Hrs values for every unique Name, then concatenate it with the original data:

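import numpy as np
import pandas as pd

# One grid frame per Name: 0, 730.5, 1461, ... below that group's max Cum_Hrs.
# np.arange excludes the stop value, whereas R's seq() includes it when hit exactly.
seq_df = pd.concat([pd.DataFrame({'Name': name,
                                  'Cum_Hrs': np.arange(0, g['Cum_Hrs'].max(), 730.5)})
                    for name, g in df.groupby('Name')])

# Append the grid rows (A, B, C become NaN), sort, and restore the column order
final_df = (pd.concat([df, seq_df], sort=True)
              .sort_values(['Name', 'Cum_Hrs'])
              .reset_index(drop=True)
              .reindex(columns=df.columns)
           )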

print(final_df)
#          Name  Cum_Hrs    A             B              C
# 0   Factory 1      0.0  NaN           NaN            NaN
# 1   Factory 1      1.0  0.0      1.887861       3.775722
# 2   Factory 1    251.0  0.0   2104.335728   21932.578710
# 3   Factory 1    611.0  0.0   2324.586178   37498.997220
# 4   Factory 1    730.5  NaN           NaN            NaN
# 5   Factory 1   1208.0  0.0   4361.588197   65235.055410
# 6   Factory 2      0.0  NaN           NaN            NaN
# 7   Factory 2     48.0  0.0   1517.840244    6604.770432
# 8   Factory 2    163.0  0.0   3370.461172   17252.709720
# 9   Factory 2    730.5  NaN           NaN            NaN
# 10  Factory 2    822.0  0.0  13284.877860   71918.783080
# 11  Factory 2   1461.0  NaN           NaN            NaN
# 12  Factory 2   1541.0  0.0  21476.936020  134569.038800
# 13  Factory 2   2191.5  NaN           NaN            NaN
# 14  Factory 2   2285.0  0.0  32053.991920  225895.147700
# 15  Factory 2   2922.0  NaN           NaN            NaN
# 16  Factory 2   3028.0  0.0  42299.413570  340798.615100
# 17  Factory 2   3652.5  NaN           NaN            NaN
# 18  Factory 2   3699.0  0.0  50125.855990  462145.543800
# 19  Factory 2   4383.0  NaN           NaN            NaN
# 20  Factory 2   4436.0  0.0  56715.749450  584474.998900
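One behavioral difference is worth noting: tidyr::complete joins the grid onto the data, so a grid value that already exists as an observed Cum_Hrs does not produce a second row, whereas concatenation would add a duplicate all-NaN row. The following is a sketch of a join-based variant, assuming an outer merge on Name and Cum_Hrs is acceptable. The toy frame and the `STEP` and `grid` names are illustrative, and float keys only match when they are exactly equal.

```python
import numpy as np
import pandas as pd

STEP = 730.5  # grid spacing in hours, taken from the question

# Toy data where an observed Cum_Hrs (0.0) coincides with a grid point
df = pd.DataFrame({
    "Name": ["Factory 1", "Factory 1", "Factory 1"],
    "Cum_Hrs": [0.0, 251.0, 1208.0],
    "B": [1.0, 2.0, 3.0],
})

# Per-factory grid of Cum_Hrs values: 0, STEP, 2*STEP, ... below the group max
grid = pd.concat(
    [
        pd.DataFrame({"Name": name,
                      "Cum_Hrs": np.arange(0, g["Cum_Hrs"].max(), STEP)})
        for name, g in df.groupby("Name")
    ],
    ignore_index=True,
)

# An outer merge keeps every observed row and adds only the missing grid rows
completed = (
    df.merge(grid, on=["Name", "Cum_Hrs"], how="outer")
      .sort_values(["Name", "Cum_Hrs"])
      .reset_index(drop=True)
)
print(completed)
```

Here the row at 0.0 keeps its observed B value and only 730.5 is added as a new row with NaN.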

The same concatenate-and-sort process can be handled in base R. Translating base R (rather than tidyverse) code to pandas is usually easier:

  • seq ==> np.arange
  • by ==> pd.DataFrame.groupby
  • data.frame ==> pd.DataFrame
  • do.call + rbind ==> pd.concat
  • order ==> pd.DataFrame.sort_values
  • row.names=NULL ==> pd.DataFrame.reset_index

R

# BUILD SEQUENCE DATA FRAME
seq_df = do.call(rbind, by(df, df$Name, function(sub)
data.frame(Name = sub$Name[[1]],
Cum_Hrs = seq(0, max(sub$Cum_Hrs), 730.5),
A = NA, B = NA, C = NA))
)

# CONCATENATE REFERENCING EVERY COLUMN
final_df = rbind(df, seq_df)

# SORT ROWS AND RESET ROW NAMES
final_df = with(final_df, data.frame(final_df[order(Name, Cum_Hrs),], row.names=NULL))

final_df


Regarding the Python equivalent of R's tidyr::complete that allows specifying additional values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55855041/
