c++ - R and Simmer: Performance boost on large data frames

Reposted by: 太空宇宙. Updated: 2023-11-04 12:42:50

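I have my own data frame of real events/tasks, and I use the simmer R package to simulate how many tasks could be completed if different resources were available. My simulation runs very fast on data frames of up to 120,000 rows.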

rm(list = ls())
library(dplyr)
library(simmer)
library(simmer.plot)

load("task_df.RDATA")

working_hours <- 7.8
productivity <- 0.7
no.employees <- 292

# Productive minutes per shift, rounded, plus one
SIM_TIME <- round(working_hours * productivity * 60, 0) + 1

# Resource names: "employee_1", ..., "employee_292"
employees <- paste("employee", seq_len(no.employees), sep = "_")

# Each task seizes the employee with the shortest queue, holds it for the
# task's "duration" attribute, then releases it
taskTraj <- trajectory(name = "task simulation") %>%
  simmer::select(resources = employees, policy = "shortest-queue") %>%
  seize_selected(amount = 1) %>%
  timeout_from_attribute("duration") %>%
  release_selected(amount = 1)

arrivals_gen <- simmer()

# add_resource() modifies the simulation environment in place
for (employee in employees) {
  arrivals_gen %>% add_resource(employee, capacity = 1)
}

ptm <- proc.time()

arrivals_gen <- arrivals_gen %>%
  add_dataframe("Task_", taskTraj, task_df, mon = 2, col_time = "time",
                time = "absolute", col_priority = "priority") %>%
  run(SIM_TIME)

proc.time() - ptm

But my data frame task_df contains 350k rows, and that is where my simulation takes much more time.

head(task_df, n = 50)

workload_shift  task_id duration priority time
1        20180403 68347632        3    2.502    0
2        20180403 68151881       10   24.478    0
3        20180403 68069718        3    0.724    0
4        20180403 68345621        4    2.226    0
5        20180403 68508858        3   36.062    0
6        20180403 66148996        3    9.421    0
7        20180403 68565066        2   24.478    0
8        20180403 68005344        3    7.910    0
9        20180403 55979902        3    3.732    0
10       20180403 66452138        2    2.502    0
11       20180403 68051869       10    2.226    0
12       20180403 68561364       10    3.584    0
13       20180403 59292591        3    2.138    0
14       20180403 68415657       10    2.853    0
15       20180403 66848400        3    2.290    0
16       20180403 68454851       10    6.167    0
17       20180403 68361846       10   11.688    0
18       20180403 68572723        2    6.259    0
19       20180403 68520328        2   24.478    0
20       20180403 68500955       10    1.855    0
21       20180403 67000753        3  219.751    0
22       20180403 68487613        3    8.131    0
23       20180403 68333674        4    5.263    0
24       20180403 66423486        3    2.290    0
25       20180403 68241616        5    1.470    0
26       20180403 68415001        4    3.584    0
27       20180403 67487967        3    2.636    0
28       20180403 68494771       10    6.259    0
29       20180403 67673981       10    2.226    0
30       20180403 68355727        3    2.613    0
31       20180403 36942995        3    0.590    0
32       20180403 66633446        3    5.968    0
33       20180403 68461510        2   24.478    0
34       20180403 67126138        3    0.357    0
35       20180403 68485682        3    8.131    0
36       20180403 67852953       10    2.290    0
37       20180403 68150106       10    6.259    0
38       20180403 67833053       10    4.114    0
39       20180403 67816673        3    6.259    0
40       20180403 68041431        5    2.502    0
41       20180403 66283761        5    2.502    0
42       20180403 68543314        2   26.302    0
43       20180403 68492843        3    2.290    0
44       20180403 68556960        4    2.853    0
45       20180403 66885335        3    5.975    0
46       20180403 66249231        5    2.636    0
47       20180403 68242565       12    1.470    0
48       20180403 68530355        2    2.290    0
49       20180403 66683717        5    5.705    0
50       20180403 67802538        4    0.864    0

   user  system elapsed
 76.745   0.039  76.717

versus

   user  system elapsed
608.443   0.270 608.186

Is there a way to improve the performance of my simulation? I am using simmer 4.1.0 and Rcpp 1.0.0. Memory does not seem to be a problem.

Best answer

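I took your table and simply replicated it to build 100k-row and 400k-row data sets, and I can confirm the issue: execution time does not scale linearly.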
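The replication step could be sketched in base R roughly as follows. This is only a guess at the approach, not the answerer's actual script: `replicate_rows` is a hypothetical helper, and `sample_df` is a small stand-in built from the first three rows printed in the question, with column types that are assumed rather than known.

```r
# Hypothetical helper: recycle the rows of `df` until the result has `n` rows.
replicate_rows <- function(df, n) {
  idx <- rep(seq_len(nrow(df)), length.out = n)
  out <- df[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Stand-in for task_df (assumption: whole-number columns are stored as integer).
sample_df <- data.frame(
  workload_shift = c(20180403L, 20180403L, 20180403L),
  task_id        = c(68347632L, 68151881L, 68069718L),
  duration       = c(3L, 10L, 3L),
  priority       = c(2.502, 24.478, 0.724),
  time           = c(0L, 0L, 0L)
)

df_100k <- replicate_rows(sample_df, 100000)
df_400k <- replicate_rows(sample_df, 400000)
```

Running the simulation from the question on each of these, wrapped in the same proc.time() calls, would then show whether the elapsed time grows in proportion to the row count.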

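Internally, attributes are always double, so there are a lot of conversions, row by row, and this apparently takes up most of the execution time (!). Try converting the table before feeding it to simmer. Using dplyr,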

task_df <- mutate_all(task_df, as.double)

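The simulation should be faster, and execution time should grow more or less linearly with the number of rows. It is clear why so many conversions hurt performance, but I am not sure why they make the execution time nonlinear.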
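To check whether a table would trigger these conversions, it may help to look at the storage type of each column before and after converting. The sketch below uses only base R and a two-row stand-in for task_df. The integer column types are an assumption, since the question does not show them.

```r
# Stand-in for task_df (assumption: whole-number columns are stored as integer).
task_df <- data.frame(
  workload_shift = c(20180403L, 20180403L),
  task_id        = c(68347632L, 68151881L),
  duration       = c(3L, 10L),
  priority       = c(2.502, 24.478),
  time           = c(0L, 0L)
)

# Storage type of each column before conversion.
types_before <- vapply(task_df, typeof, character(1))

# Convert every column to double; assigning into task_df[] keeps the
# data.frame structure and column names.
task_df[] <- lapply(task_df, as.double)

# Storage type of each column after conversion.
types_after <- vapply(task_df, typeof, character(1))

print(types_before)
print(types_after)
```

The `task_df[] <- lapply(...)` line is a base R counterpart to the `mutate_all` call above. With dplyr 1.0.0 or later, `mutate(task_df, across(everything(), as.double))` is the currently recommended spelling, because `mutate_all` has been superseded.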

In any case, in future versions we may want to apply this automatically so that users do not have to worry about these performance issues.

Regarding c++ - R and Simmer: Performance boost on large data frames, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53283052/
