
akka - Why doesn't akka-stream's Source.groupedWithin respect the duration?

Reposted. Author: 行者123. Last updated: 2023-12-04 15:40:22

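Using the akka-streams 2.4.17 Scala API, I'm trying to use Source.groupedWithin(size, duration) with a duration specified. According to the documentation and what I've seen in the source code, a group should be emitted downstream as soon as either the group size is reached or the timeout elapses, whichever comes first.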
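To make the documented "size or time, whichever comes first" rule concrete, here is a small, simplified model of it. This is a sketch under stated assumptions, not Akka's GroupedWithin implementation: it works on pre-recorded arrival times (ms since the stream started) instead of a live timer, and it assumes the timer ticks once per window and that an emit triggered by a full batch restarts the window. GroupedWithinModel and group are hypothetical names.

```scala
// Hypothetical, simplified model of groupedWithin semantics (not Akka's code).
// Input: (arrival time in ms since stream start, element), in arrival order.
object GroupedWithinModel {

  def group[A](arrivals: Seq[(Long, A)], maxSize: Int, windowMs: Long): Vector[Vector[A]] = {
    require(maxSize > 0 && windowMs > 0, "maxSize and windowMs must be positive")
    val out = Vector.newBuilder[Vector[A]]
    var buf = Vector.empty[A]
    var deadline = windowMs // first timer tick

    for ((t, a) <- arrivals) {
      // Apply every timer tick that happened before this element arrived:
      // a tick with a non-empty buffer emits it, an empty tick emits nothing.
      while (t >= deadline) {
        if (buf.nonEmpty) { out += buf; buf = Vector.empty }
        deadline += windowMs
      }
      buf :+= a
      if (buf.size == maxSize) {
        out += buf
        buf = Vector.empty
        deadline = t + windowMs // a size-triggered emit restarts the window (assumption)
      }
    }
    if (buf.nonEmpty) out += buf // upstream completion flushes the remainder
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val arrivals = Seq(0L -> "a", 100L -> "b", 200L -> "c", 6000L -> "d", 6100L -> "e")
    println(group(arrivals, maxSize = 1000, windowMs = 5000L))
  }
}
```

With maxSize = 1000 and a 5000 ms window, the first three elements are emitted together by the tick at 5000 ms even though the batch is far from full, and the last two are flushed when upstream completes. That is the behaviour the question expects from groupedWithin(1000, 5.seconds).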

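When I run a simple workflow in fused (non-async) mode, the duration seems to have no effect. However, when I place groupedWithin before or after an .async call, the timeout works.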

Non-working version

Source.fromIterator(() => aFiniteIterator)
  .map(aLongOperation(_))
  .groupedWithin(1000, 5.seconds) // keeps waiting beyond 5 seconds
  .map(somethingWithGroup(_))
  .runWith(Sink.fold(0)(_ + _))

Working version

Source.fromIterator(() => aFiniteIterator)
  .map(aLongOperation(_))
  .async
  .groupedWithin(1000, 5.seconds) // now respects 5 seconds without full batch
  .map(somethingWithGroup(_))
  .runWith(Sink.fold(0)(_ + _))

Why is that? Could it be that the non-async version doesn't register downstream demand? Or is something else at play?

Update - full code example with output

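For those who want the gory details, here is the full code I'm running. The context is that I've been trying to throttle the stream to avoid OOM exceptions.
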
import java.time.LocalDateTime

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

case class Foo(id: String, value: String)

object Main {
  implicit val system = ActorSystem("akka-streams-oom")
  implicit val materializer = ActorMaterializer()

  def main(args: Array[String]): Unit = {
    println("starting tests...")
    val attempt = Try(forceOOM)

    attempt match {
      case Success(_) => println("all tests passed successfully")
      case Failure(e) => println(s"exception: ${e.getMessage}")
    }

    println("terminating system...")
    system.terminate()
    println("system terminated")
    println("done with tests...")
  }

  // Runs the pipeline and blocks until the fold over all batches completes.
  private def forceOOM: Unit = {
    println("executing forceOOM...")
    val sink = Sink.fold[Int, Int](0)(_ + _)

    val future =
      bigSource
        .map(logEmit)
        .via(slowSubscriber)
        .runWith(sink)

    val finalResult = Await.result(future, Duration.Inf)
    println(s"forceOOM result: $finalResult")
  }

  // Effectively endless source of Foo values, each carrying a large string payload.
  private def bigSource = {
    val largeIterator = () =>
      Iterator
        .from(0, 1000000000)
        .map(_ => generateLargeFoo)

    Source.fromIterator(largeIterator)
  }

  // Deliberately slow, memory-hungry stage followed by size/time batching.
  private def slowSubscriber =
    Flow[Foo]
      .map { foo =>
        println(s"allocating memory for ${foo.id} at ${time}")
        Foo(foo.id, bloat)
      }
      .async // if I remove this, the 5 second window below doesn't seem to work
      .groupedWithin(100, 5.seconds)
      .map(foldFoos)

  private def logEmit(x: Foo): Foo = {
    println(s"emitting next record: ${x.id} at ${time}")
    x
  }

  // Sums the payload lengths of one batch.
  private def foldFoos(x: Seq[Foo]): Int = {
    println(s"folding records at ${time}")
    x.map(_.value.length).fold(0)(_ + _)
  }

  private def time: String = LocalDateTime.now.toLocalTime.toString

  // Concatenates 11 freshly generated large strings (indices 0 to 10 inclusive).
  private def bloat: String = {
    (0 to 10)
      .map(_ => generateLargeFoo.value)
      .fold("")(_ + _)
  }

  private def generateLargeFoo: Foo = {
    Foo(java.util.UUID.randomUUID.toString, (0 to 1000000).mkString)
  }
}

Output without async (timeout is exceeded)
[info] emitting next record: 5016fea4-f076-45dd-b95b-1d24f71a25b4 at 09:34:25.826
[info] allocating memory for 5016fea4-f076-45dd-b95b-1d24f71a25b4 at 09:34:25.868
[info] emitting next record: ab6e298b-0152-4af5-b685-bb4ed6c5b9de at 09:34:27.572
[info] allocating memory for ab6e298b-0152-4af5-b685-bb4ed6c5b9de at 09:34:27.572
[info] emitting next record: 6f5c1b75-5aaf-44e6-ac62-a6074735c057 at 09:34:28.957
[info] allocating memory for 6f5c1b75-5aaf-44e6-ac62-a6074735c057 at 09:34:28.958
[info] emitting next record: 313ce2b5-f669-4c59-b2ec-eafdae85ded6 at 09:34:30.378
[info] allocating memory for 313ce2b5-f669-4c59-b2ec-eafdae85ded6 at 09:34:30.378
[info] emitting next record: 91a8a95b-b3cc-4e27-8d3f-3400fa9c7a9f at 09:34:31.802
[info] allocating memory for 91a8a95b-b3cc-4e27-8d3f-3400fa9c7a9f at 09:34:31.802
[info] emitting next record: 0220e75a-029b-4d35-8494-690bed6938aa at 09:34:33.173
[info] allocating memory for 0220e75a-029b-4d35-8494-690bed6938aa at 09:34:33.174
[info] emitting next record: faa16b80-cfb1-4ea4-b3ba-c1d270caf865 at 09:34:34.409
[info] allocating memory for faa16b80-cfb1-4ea4-b3ba-c1d270caf865 at 09:34:34.409
[info] emitting next record: 8956d710-ad55-4dee-b4f3-82b8cf313a85 at 09:34:35.656
[info] allocating memory for 8956d710-ad55-4dee-b4f3-82b8cf313a85 at 09:34:35.656
[info] emitting next record: 1b989c56-6580-44f0-b8d9-46d5241046cc at 09:34:36.944
[info] allocating memory for 1b989c56-6580-44f0-b8d9-46d5241046cc at 09:34:36.945
[info] emitting next record: 66a766c7-29e0-40ca-b997-54985aad75d6 at 09:34:38.272
[info] allocating memory for 66a766c7-29e0-40ca-b997-54985aad75d6 at 09:34:38.272
[info] emitting next record: b8d29dad-bd44-4843-936e-5eb5df3bb594 at 09:34:39.530
[info] allocating memory for b8d29dad-bd44-4843-936e-5eb5df3bb594 at 09:34:39.530
[info] emitting next record: 8c7999cf-7796-427e-a155-c28d7fc4a934 at 09:34:40.987
[info] allocating memory for 8c7999cf-7796-427e-a155-c28d7fc4a934 at 09:34:40.988
[info] emitting next record: eda79635-4559-4c92-a5b7-83bbfc2e85b2 at 09:34:42.382
[info] allocating memory for eda79635-4559-4c92-a5b7-83bbfc2e85b2 at 09:34:42.382
[info] emitting next record: 8fa5d744-70e8-4261-9c3f-427737233e13 at 09:34:43.593
[info] allocating memory for 8fa5d744-70e8-4261-9c3f-427737233e13 at 09:34:43.593
[info] emitting next record: cc621484-c70d-4092-8dc6-2e39acc1f0b3 at 09:34:44.983
[info] allocating memory for cc621484-c70d-4092-8dc6-2e39acc1f0b3 at 09:34:44.983
[info] emitting next record: fbc03c9c-1ea8-4d4d-9a80-13118324140d at 09:34:46.244
[info] allocating memory for fbc03c9c-1ea8-4d4d-9a80-13118324140d at 09:34:46.244
[info] emitting next record: 96374d33-e117-4f48-b3be-79b8cb1e0fda at 09:34:47.953
[info] allocating memory for 96374d33-e117-4f48-b3be-79b8cb1e0fda at 09:34:47.953
[info] emitting next record: 1c210d73-35d3-41b9-ade6-9310783589a3 at 09:34:49.303
[info] allocating memory for 1c210d73-35d3-41b9-ade6-9310783589a3 at 09:34:49.303
[info] emitting next record: 3872c382-17a9-484a-861c-6f66a0c7d0ca at 09:34:50.620
[info] allocating memory for 3872c382-17a9-484a-861c-6f66a0c7d0ca at 09:34:50.620
[info] emitting next record: c34ba954-a9ff-45d1-910c-316c6eb9c85d at 09:34:52.597
[info] allocating memory for c34ba954-a9ff-45d1-910c-316c6eb9c85d at 09:34:52.597
[info] emitting next record: 8e5f804e-5e75-4eac-937f-651d45e3745d at 09:34:54.145
[info] allocating memory for 8e5f804e-5e75-4eac-937f-651d45e3745d at 09:34:54.145
[info] emitting next record: 1caf82cc-7b41-4730-bcc1-ca61ee7780e0 at 09:34:56.454
[info] allocating memory for 1caf82cc-7b41-4730-bcc1-ca61ee7780e0 at 09:34:56.455
[info] emitting next record: 9364d386-408a-4b63-80b5-0ed34473ba45 at 09:34:58.706
[info] allocating memory for 9364d386-408a-4b63-80b5-0ed34473ba45 at 09:34:58.706
[info] emitting next record: c43baaba-961e-4877-9835-7eeee538f0af at 09:35:00.822
[info] allocating memory for c43baaba-961e-4877-9835-7eeee538f0af at 09:35:00.822
[info] #
[info] # java.lang.OutOfMemoryError: Java heap space
[info] # -XX:OnOutOfMemoryError="kill -9 %p"
[info] # Executing "kill -9 96871"...
java.lang.RuntimeException: Nonzero exit code returned from runner: 137
at scala.sys.package$.error(package.scala:27)

Output with async (timeout works)
[info] emitting next record: 668d6f9f-43cc-45a6-99b3-d8e8ab2b9cae at 09:28:48.188
[info] allocating memory for 668d6f9f-43cc-45a6-99b3-d8e8ab2b9cae at 09:28:48.231
[info] emitting next record: 6c50b3e1-d3ec-422e-b41a-fe3d92df15a9 at 09:28:48.333
[info] emitting next record: 20b659f9-73e1-4c67-b251-2b224eec4d24 at 09:28:48.421
[info] emitting next record: 9af08f07-8246-498b-9f64-b56982cf3536 at 09:28:48.497
[info] emitting next record: 14cdf3b4-d14f-4953-8609-24c7a1996a12 at 09:28:48.569
[info] emitting next record: 571002f3-7301-4afa-8bc9-3fb8a9e84db2 at 09:28:48.665
[info] emitting next record: 5e88a51b-b56c-40fe-84a3-2fcf18b90e3f at 09:28:48.787
[info] emitting next record: e66b29f3-1690-4645-a048-19049e92303a at 09:28:48.846
[info] emitting next record: 66c16074-b200-4808-a990-13abadc66e43 at 09:28:48.943
[info] emitting next record: 1de8caca-fa48-4777-90a7-1449bd6722bb at 09:28:49.003
[info] emitting next record: bc3859b6-94ab-4262-b4cd-fa757e8f3f1f at 09:28:49.064
[info] emitting next record: 988216a7-5944-4aa5-98f6-b36542d8e7a8 at 09:28:49.172
[info] emitting next record: e6ab4ef6-1fd2-471b-8866-2f8422346df5 at 09:28:49.325
[info] emitting next record: c86b3116-70c8-453e-9ddf-bd8d9e144caf at 09:28:49.384
[info] emitting next record: 78c68185-cdd1-4fde-aa39-e03b37b5f449 at 09:28:49.603
[info] emitting next record: 7ed11952-ceba-47f5-9ba4-25d1e9dceea0 at 09:28:49.671
[info] allocating memory for 6c50b3e1-d3ec-422e-b41a-fe3d92df15a9 at 09:28:50.164
[info] allocating memory for 20b659f9-73e1-4c67-b251-2b224eec4d24 at 09:28:51.459
[info] allocating memory for 9af08f07-8246-498b-9f64-b56982cf3536 at 09:28:52.752
[info] folding records at 09:28:53.106
[info] allocating memory for 14cdf3b4-d14f-4953-8609-24c7a1996a12 at 09:28:53.969
[info] allocating memory for 571002f3-7301-4afa-8bc9-3fb8a9e84db2 at 09:28:55.234
[info] allocating memory for 5e88a51b-b56c-40fe-84a3-2fcf18b90e3f at 09:28:56.422
...

Best answer

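I suspect you are simulating aLongOperation with Thread.sleep or some other blocking operation.
If that is the case, then without an explicit async boundary the whole graph is fused and runs inside the same actor, and therefore on the same thread. Blocking that thread starves the underlying scheduling infrastructure, so the groupedWithin timer does not get a chance to fire on time (see the docs).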
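The starvation effect can be reproduced without Akka. The sketch below is only an analogy, assuming a java.util.concurrent scheduled thread pool stands in for the actor that runs a fused graph: a timer callback (standing in for the groupedWithin tick) is scheduled while a blocking task (standing in for Thread.sleep inside map) occupies the pool. With one thread, the timer can only run after the blocking task releases it; with a second thread, loosely comparable to the extra actor an async boundary introduces, it fires roughly on time. StarvationDemo, timerLatencyMs and measure are hypothetical names.

```scala
import java.util.concurrent.{Callable, Executors, ScheduledExecutorService, TimeUnit}

// Analogy only, not Akka code. One pool thread ~ the actor running a fused graph;
// a second pool thread ~ the separate actor created by an async boundary.
object StarvationDemo {

  /** Hypothetical helper: submits a blocking task, then a timer due after timerMs,
    * and returns how many ms after `start` the timer callback actually ran. */
  def timerLatencyMs(pool: ScheduledExecutorService, timerMs: Long, blockMs: Long): Long = {
    val start = System.nanoTime()
    // Blocking work, like Thread.sleep inside a map stage.
    pool.execute(new Runnable {
      def run(): Unit = Thread.sleep(blockMs)
    })
    // The "groupedWithin tick": it should run timerMs after start.
    val tick = pool.schedule(new Callable[Long] {
      def call(): Long = (System.nanoTime() - start) / 1000000L
    }, timerMs, TimeUnit.MILLISECONDS)
    tick.get()
  }

  /** Runs the experiment on a fresh pool with the given number of threads. */
  def measure(threads: Int): Long = {
    val pool = Executors.newScheduledThreadPool(threads)
    try timerLatencyMs(pool, timerMs = 100L, blockMs = 1000L)
    finally pool.shutdown() // lets the blocking task finish, accepts no new work
  }

  def main(args: Array[String]): Unit = {
    println(s"shared thread (fused-like): timer ran after ${measure(1)} ms")
    println(s"separate thread (async-like): timer ran after ${measure(2)} ms")
  }
}
```

In the shared-thread case the 100 ms timer cannot run until the 1000 ms blocking task finishes, which mirrors the "no async" log above where no "folding records" line ever appears.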

Try simulating your long operation in a non-blocking way (for example, with the after pattern).
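For comparison, here is a minimal non-blocking way to simulate a slow operation, assuming plain Scala futures and a java.util.concurrent scheduler rather than Akka. The hypothetical NonBlockingDelay.delayed helper is only similar in spirit to akka.pattern.after: it registers a callback with a scheduler and completes a Promise later, so no thread sleeps while "waiting". In Akka itself, the corresponding approach would be akka.pattern.after(duration, system.scheduler)(Future(...)) used inside a mapAsync stage; this sketch only shows the underlying idea.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical helper, similar in spirit to akka.pattern.after (not its implementation).
object NonBlockingDelay {
  // One scheduler thread is enough: it only fires callbacks and never sleeps in them.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  /** Completes the returned Future with `value` once `delay` has elapsed.
    * Note: `value` is evaluated on the scheduler thread, so it should be cheap. */
  def delayed[T](delay: FiniteDuration)(value: => T): Future[T] = {
    val promise = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = promise.complete(Try(value))
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    promise.future
  }

  def shutdown(): Unit = scheduler.shutdown()

  def main(args: Array[String]): Unit = {
    val start = System.nanoTime()
    // Ten simulated 500 ms "long operations" started at once.
    val all = Future.sequence((1 to 10).map(i => delayed(500.millis)(i * 2)))
    val results = Await.result(all, 5.seconds)
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"results=$results elapsed=${elapsedMs}ms")
    shutdown()
  }
}
```

Because none of the ten simulated operations holds a thread while it waits, they finish in roughly half a second rather than five seconds, even though they share a single scheduler thread. A stream stage built this way leaves its thread free for other work, including timer messages.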

See also the following issue raised on this topic.

Regarding "akka - Why doesn't akka-stream's Source.groupedWithin respect the duration?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42845166/
