
Java garbage collector G1GC spends a long time in 'Object Copy' (Evacuation Pause)

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:51:23

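I am not new to Java, but I know very little about garbage collection. Now I want to change that with some practical experience. My goal is a latency below 0.3 seconds, or 0.5 seconds in extreme cases.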

I have an application running with -Xmx50gb (-Xms50gb) and the following additional GC options:

-XX:+UseG1GC -Xloggc:somewhere.gc.log -XX:+PrintGCDateStamps

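But now I occasionally get pauses of more than 5 seconds due to garbage collection, even though there seems to be enough free memory. One cause I found: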

[GC pause (G1 Evacuation Pause) (young) 42G->40G(48G), 5.9409662 secs]

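Why is G1GC still doing a "stop the world" for this? (Or at least I see that it stops my application at exactly this time.) Why does it do such a negative cleanup when it is not really necessary, given that more than 12% of the RAM is still free? Also, I thought that the default value for -XX:MaxGCPauseMillis is 200 milliseconds, so why is this value violated by a factor of 29 or even 50 (see below)?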

Another cause of the delays is:

[GC pause (Metadata GC Threshold) (young) (initial-mark) 40G->39G(48G), 10.4667233 secs]

This can probably be solved via this answer, e.g. by just increasing the metaspace size: -XX:MetaspaceSize=100M

By the way: I am using Java SE 1.8.0_91-b14

Update: detailed GC log of such an event

2016-08-12T09:20:31.589+0200: 1178.312: [GC pause (G1 Evacuation Pause) (young) 1178.312: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3159, predicted base time: 1.52 ms, remaining time: 198.48 ms, target pause time: 200.00 ms]
1178.312: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 136 regions, survivors: 20 regions, predicted young region time: 1924.75 ms]
1178.312: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 136 regions, survivors: 20 regions, old: 0 regions, predicted pause time: 1926.27 ms, target pause time: 200.00 ms]
1185.330: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 21.83 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
1185.330: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 42580574208 bytes, allocation request: 0 bytes, threshold: 23592960000 bytes (45.00 %), source: end of GC]
1185.330: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 1 regions, reclaimable: 3381416 bytes (0.01 %), threshold: 5.00 %]
, 7.0181903 secs]
[Parallel Time: 6991.8 ms, GC Workers: 10]
[GC Worker Start (ms): Min: 1178312.6, Avg: 1178312.8, Max: 1178312.9, Diff: 0.2]
[Ext Root Scanning (ms): Min: 1.1, Avg: 1.5, Max: 2.3, Diff: 1.2, Sum: 15.0]
[Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.3, Diff: 1.3, Sum: 3.4]
[Processed Buffers: Min: 0, Avg: 2.1, Max: 5, Diff: 5, Sum: 21]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.7]
[Object Copy (ms): Min: 6964.1, Avg: 6973.0, Max: 6989.5, Diff: 25.3, Sum: 69730.4]
[Termination (ms): Min: 0.0, Avg: 16.4, Max: 25.3, Diff: 25.3, Sum: 164.4]
[Termination Attempts: Min: 1, Avg: 3.2, Max: 13, Diff: 12, Sum: 32]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
[GC Worker Total (ms): Min: 6991.5, Avg: 6991.6, Max: 6991.7, Diff: 0.2, Sum: 69915.5]
[GC Worker End (ms): Min: 1185304.3, Avg: 1185304.3, Max: 1185304.3, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.3 ms]
[Other: 26.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 25.3 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.2 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 2176.0M(2176.0M)->0.0B(2176.0M) Survivors: 320.0M->320.0M Heap: 40.6G(48.8G)->40.0G(48.8G)]
[Times: user=0.55 sys=46.58, real=7.02 secs]

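Reading about it here: Copying (Stop the World Event): these are the stop-the-world pauses to evacuate or copy live objects to new unused regions. This can be done with young generation regions, which are logged as [GC pause (young)], or with both young and old generation regions, which are logged as [GC Pause (mixed)].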

Best answer

Why is G1GC still doing a "stop the world" for this?

Because G1 is not a pause-free collector, it is only a low-pause collector.

Also I thought that the default value for -XX:MaxGCPauseMillis is 200 milliseconds, why is this value violated by a factor of 29 or even 50 (see below)?

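Yes, but that is only a goal, not a guarantee. Many things can cause G1 to miss that goal. You have a fairly large heap, which makes things more difficult, i.e. it is easier to provoke failure modes.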
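To put a number on how far a pause overshoots the goal, the summary lines from the question can be compared against the 200 ms default. The following is a minimal sketch and not part of the original answer: the names PauseOvershoot, pauseSeconds and overshootFactor are made up for illustration, and the regular expression only assumes the Java 8 log format shown in the question, where the summary line ends in ", <seconds> secs]".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PauseOvershoot {
    // Matches the trailing ", 5.9409662 secs]" of a Java 8 GC summary line.
    private static final Pattern PAUSE =
            Pattern.compile(",\\s*([0-9]+(?:\\.[0-9]+)?) secs\\]\\s*$");

    /** Returns the pause duration in seconds, or -1 if the line does not end in a duration. */
    public static double pauseSeconds(String logLine) {
        Matcher m = PAUSE.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    /** How many times longer the pause was than the target (target given in milliseconds). */
    public static double overshootFactor(String logLine, double targetMillis) {
        return pauseSeconds(logLine) * 1000.0 / targetMillis;
    }

    public static void main(String[] args) {
        // The two summary lines quoted in the question.
        String[] lines = {
            "[GC pause (G1 Evacuation Pause) (young) 42G->40G(48G), 5.9409662 secs]",
            "[GC pause (Metadata GC Threshold) (young) (initial-mark) 40G->39G(48G), 10.4667233 secs]"
        };
        for (String line : lines) {
            System.out.printf(Locale.ROOT, "%.1fx the 200 ms target: %s%n",
                    overshootFactor(line, 200.0), line);
        }
    }
}
```

For the two lines from the question this works out to roughly 29.7 and 52.3 times the target, which matches the "factor of 29 or even 50" mentioned there.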

In any case, the GC tuning journey starts with enabling detailed GC logging via

-Xloggc:<path to gc log file>
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails

Update: these options apply to HotSpot 8. Java 9 and later use unified logging, which has a different parameter format.
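Whether the Java 8 logging flags listed above actually reached the JVM can be checked from inside the process. This is a minimal sketch for a HotSpot 8 JVM and not something from the original answer: GcLogFlagCheck and missingFlags are illustrative names, while RuntimeMXBean.getInputArguments() is the standard management API for reading the JVM's startup arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GcLogFlagCheck {
    // The Java 8 logging flags recommended in the answer.
    // -Xloggc: carries a file path, so only its prefix is matched.
    public static final List<String> EXPECTED = Arrays.asList(
            "-Xloggc:",
            "-XX:+PrintAdaptiveSizePolicy",
            "-XX:+PrintGCDateStamps",
            "-XX:+PrintGCTimeStamps",
            "-XX:+PrintGCDetails");

    /** Returns the expected flags that do not appear in the given JVM arguments. */
    public static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String expected : EXPECTED) {
            boolean present = false;
            for (String arg : jvmArgs) {
                boolean matches = expected.endsWith(":")
                        ? arg.startsWith(expected)
                        : arg.equals(expected);
                if (matches) {
                    present = true;
                    break;
                }
            }
            if (!present) {
                missing.add(expected);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```

Run with only the three options listed at the start of the question, it would report -XX:+PrintAdaptiveSizePolicy, -XX:+PrintGCTimeStamps and -XX:+PrintGCDetails as missing.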

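Then run the resulting log through GCViewer to get a general overview, and go back to reading individual log entries (there are many answers and blog posts on this topic) to figure out what might be causing the worst behavior. Depending on the cause, various remedies can be tried.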

Some general understanding of how tracing garbage collectors in general, and G1 in particular, work is necessary to avoid cargo-culting.

My application has many allocations which could be easily called "humongous allocations".

If that is indeed the cause, then current VMs have some experimental options to reclaim them sooner.
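What counts as humongous depends on the G1 region size, which can be read off the detailed log in the question: 136 eden regions hold 2176.0M and 20 survivor regions hold 320.0M. The sketch below is not part of the original answer. It derives the region size from those figures and assumes the commonly documented rule that G1 treats objects of about half a region or more as humongous; the class and method names are illustrative.

```java
import java.util.Locale;

public class HumongousThreshold {
    /** Region size in MB, derived from a space's total size and its region count. */
    public static double regionSizeMb(double spaceMb, int regions) {
        return spaceMb / regions;
    }

    /** Approximate humongous threshold in MB: half a region (assumed rule, see the text above). */
    public static double humongousThresholdMb(double regionSizeMb) {
        return regionSizeMb / 2.0;
    }

    public static void main(String[] args) {
        // Figures taken from the detailed GC log in the question.
        double fromEden = regionSizeMb(2176.0, 136);
        double fromSurvivors = regionSizeMb(320.0, 20);
        System.out.printf(Locale.ROOT,
                "region size: %.0f MB (eden), %.0f MB (survivors)%n", fromEden, fromSurvivors);
        System.out.printf(Locale.ROOT,
                "approximate humongous threshold: %.0f MB%n", humongousThresholdMb(fromEden));
    }
}
```

Both figures give 16 MB regions, so under that assumption allocations of roughly 8 MB and up would be handled as humongous in this heap.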

 [Object Copy (ms): Min: 6964.1, Avg: 6973.0, Max: 6989.5, Diff: 25.3, Sum: 69730.4]
[Times: user=0.55 sys=46.58, real=7.02 secs]

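This means it spends most of its time in the kernel, doing something that should consist mostly of memory accesses rather than system calls. So swap activity or transparent huge pages are likely suspects.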
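The same check can be applied to every [Times: ...] line in a log to spot pauses dominated by kernel time. A minimal sketch, not from the original answer: GcTimes, parse and kernelDominated are illustrative names, and the pattern assumes the Java 8 format "user=... sys=..., real=... secs" shown above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcTimes {
    // Matches "user=0.55 sys=46.58, real=7.02 secs" inside a Java 8 [Times: ...] entry.
    private static final Pattern TIMES = Pattern.compile(
            "user=([0-9.]+)\\s+sys=([0-9.]+),\\s*real=([0-9.]+) secs");

    /** Returns {user, sys, real} in seconds, or null if the line has no Times entry. */
    public static double[] parse(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3))
        };
    }

    /** True when the GC threads used more CPU time in the kernel than in user space. */
    public static boolean kernelDominated(double[] times) {
        return times[1] > times[0];
    }

    public static void main(String[] args) {
        double[] t = parse("[Times: user=0.55 sys=46.58, real=7.02 secs]");
        if (t != null && kernelDominated(t)) {
            System.out.printf(Locale.ROOT,
                    "kernel-dominated pause: sys is %.1fx user, wall clock %.2f s%n",
                    t[1] / t[0], t[2]);
        }
    }
}
```

For the line from the question, sys is about 85 times user. A healthy evacuation pause would normally show the opposite relationship, since copying objects is user-space work.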

Regarding "Java garbage collector G1GC spends a long time in 'Object Copy' (Evacuation Pause)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38905739/
