delphi - Strange behavior of TParallel.For with the default thread pool


Reposted by: 行者123. Updated: 2023-12-03 14:35:16

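I'm trying out the parallel programming features of Delphi XE7 Update 1.

I created a simple TParallel.For loop that basically does some bogus operations to pass the time.

I launched the program on a 36 vCPU AWS instance (c4.8xlarge) to see what the gain from parallel programming could be.

When I first launch the program and execute the TParallel.For loop, I see a significant gain (although admittedly a lot less than I anticipated with 36 vCPUs):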

Parallel matches: 23077072 in 242ms
Single Threaded matches: 23077072 in 2314ms

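If I do not close the program and run the pass again on the 36 vCPU machine shortly afterwards (e.g., immediately or some 10-20 seconds later), the parallel pass gets much worse: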

Parallel matches: 23077169 in 2322ms
Single Threaded matches: 23077169 in 2316ms

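If I do not close the program and wait a few minutes (not a few seconds, but a few minutes) before running the pass again, I get the same results as when I first launch the program (a 10x improvement in response time).

The very first pass right after launching the program is always faster on the 36 vCPU machine, so it seems that this effect only happens the second time TParallel.For is called in the program.

This is the sample code I'm running: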

unit ParallelTests;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
  System.Threading, System.SyncObjs, System.Diagnostics,
  Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    SingleThreadCheckBox: TCheckBox;
    ParallelCheckBox: TCheckBox;
    UnitsEdit: TEdit;
    Label1: TLabel;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  maxItems: integer;
  referenceStr: string;

begin
  sw := TStopWatch.Create;

  maxItems := 5000;

  Randomize;
  SetLength(referenceStr, 120000);
  for i := 1 to 120000 do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  if ParallelCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    TParallel.For(1, MaxItems,
      procedure (Value: Integer)
        var
          index: integer;
          found: integer;
        begin
          found := 0;
          for index := 1 to length(referenceStr) do begin
            if (((Value mod 26) + ord('a')) = ord(referenceStr[index])) then begin
              inc(found);
            end;
          end;
          TInterlocked.Add(matches, found);
        end);
    sw.Stop;
    Memo1.Lines.Add('Parallel matches: ' + IntToStr(matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;

  if SingleThreadCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    for i := 1 to MaxItems do begin
      for j := 1 to length(referenceStr) do begin
        if (((i mod 26) + ord('a')) = ord(referenceStr[j])) then begin
          inc(matches);
        end;
      end;
    end;
    sw.Stop;
    Memo1.Lines.Add('Single Threaded matches: ' + IntToStr(Matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;
end;

end.

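Is this working as designed? I found this article (http://delphiaball.co.uk/tag/parallel-programming/) recommending that I let the library decide the thread pool, but I do not see the point of using parallel programming if I have to wait minutes between one request and the next for the request to be served faster.

Am I missing anything about how a TParallel.For loop is supposed to be used?

Please note that I cannot reproduce this on an AWS m3.large instance (2 vCPUs according to AWS). In that case I always get a slight improvement, and I do not get a worse result in subsequent calls of TParallel.For shortly afterwards.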
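
For context, the alternative to letting the library decide the thread pool is to pass a dedicated TThreadPool to TParallel.For. The following is only a minimal sketch, assuming the TParallel.For overload that accepts a pool argument and the SetMaxWorkerThreads setter behave as declared in XE7's System.Threading. The program name, the trivial loop body, and the choice of capping workers at the processor count are illustrative assumptions, and whether a dedicated pool avoids the slowdown described above has not been verified here.

```delphi
program CustomPoolSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Threading, System.Diagnostics;

var
  Pool: TThreadPool;
  Total: Integer;
  sw: TStopwatch;
begin
  // A private pool instead of TThreadPool.Default.
  Pool := TThreadPool.Create;
  try
    // Assumption: cap the workers at the logical processor count.
    // The setter returns False if the pool rejects the value.
    if not Pool.SetMaxWorkerThreads(TThread.ProcessorCount) then
      Writeln('SetMaxWorkerThreads rejected the requested value');

    Total := 0;
    sw := TStopwatch.StartNew;
    // Same call shape as in the question, with the pool passed explicitly.
    TParallel.For(1, 5000,
      procedure(Value: Integer)
      begin
        // Placeholder work: accumulate atomically across worker threads.
        AtomicIncrement(Total, Value mod 26);
      end,
      Pool);
    Writeln('Total: ', Total, ' in ', sw.ElapsedMilliseconds, 'ms');
  finally
    Pool.Free;
  end;
end.
```

Running the loop several times in one process, as in the question, would show whether the timings stay stable with a private pool on a given machine.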

Parallel matches: 23077054 in 2057ms
Single Threaded matches: 23077054 in 2900ms

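So it seems that this effect occurs when there are many cores available (36), which is a pity because the whole point of parallel programming is to benefit from many cores. I'm wondering whether this is a library bug triggered by the high core count, or by the fact that the core count is not a power of 2 in this case.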

UPDATE: After testing it with various instances of different vCPU counts in AWS, this seems to be the behaviour:

36 vCPUs (c4.8xlarge). You have to wait minutes between subsequent calls to a vanilla TParallel call (it makes it unusable for production)

32 vCPUs (c3.8xlarge). You have to wait minutes between subsequent calls to a vanilla TParallel call (it makes it unusable for production)

16 vCPUs (c3.4xlarge). You have to wait less than a second between calls. It could be usable if the load is low but response time is still important

8 vCPUs (c3.2xlarge). It seems to work normally

4 vCPUs (c3.xlarge). It seems to work normally

2 vCPUs (m3.large). It seems to work normally

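Best answer

I created two test programs, based on yours, to compare System.Threading and OTL. I built with XE7 Update 1 and OTL r1397. The OTL source that I used corresponds to release 3.04. I built with the 32-bit Windows compiler, using release build options.

My test machine is a dual Intel Xeon E5530 running Windows 7 x64. The system has two quad-core processors. That's 8 processors in total, but the system reports 16 due to hyper-threading. Experience tells me that hyper-threading is just marketing guff, and I've never seen scaling beyond a factor of 8 on this machine.

Now for the two programs, which are almost identical.

System.Threading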

program SystemThreadingTest;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics,
  System.Threading;

const
  maxItems = 5000;
  DataSize = 100000;

procedure DoTest;
var
  matches: integer;
  i, j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr, DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  TParallel.For(1, maxItems,
    procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches, found);
    end);
  Writeln('Parallel matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');
end;

begin
  while True do
    DoTest;
end.

OTL

program OTLTest;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,
  Winapi.Messages,
  System.Diagnostics,
  OtlParallel;

const
  maxItems = 5000;
  DataSize = 100000;

procedure ProcessThreadMessages;
var
  msg: TMsg;
begin
  while PeekMessage(Msg, 0, 0, 0, PM_REMOVE) and (Msg.Message <> WM_QUIT) do begin
    TranslateMessage(Msg);
    DispatchMessage(Msg);
  end;
end;

procedure DoTest;
var
  matches: integer;
  i, j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr, DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  Parallel.For(1, maxItems).Execute(
    procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches, found);
    end);
  Writeln('Parallel matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');

  ProcessThreadMessages;

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ', matches, ' in ', sw.ElapsedMilliseconds, 'ms');
end;

begin
  while True do
    DoTest;
end.

Now the output.

System.Threading output

Parallel matches: 19230817 in 374ms
Serial matches: 19230817 in 2423ms
Parallel matches: 19230698 in 374ms
Serial matches: 19230698 in 2409ms
Parallel matches: 19230556 in 368ms
Serial matches: 19230556 in 2433ms
Parallel matches: 19230635 in 2412ms
Serial matches: 19230635 in 2430ms
Parallel matches: 19230843 in 2441ms
Serial matches: 19230843 in 2413ms
Parallel matches: 19230905 in 2493ms
Serial matches: 19230905 in 2423ms
Parallel matches: 19231032 in 2430ms
Serial matches: 19231032 in 2443ms
Parallel matches: 19230669 in 2440ms
Serial matches: 19230669 in 2473ms
Parallel matches: 19230811 in 2404ms
Serial matches: 19230811 in 2432ms
....

OTL output

Parallel matches: 19230667 in 422ms
Serial matches: 19230667 in 2475ms
Parallel matches: 19230663 in 335ms
Serial matches: 19230663 in 2438ms
Parallel matches: 19230889 in 395ms
Serial matches: 19230889 in 2461ms
Parallel matches: 19230874 in 391ms
Serial matches: 19230874 in 2441ms
Parallel matches: 19230617 in 385ms
Serial matches: 19230617 in 2524ms
Parallel matches: 19231021 in 368ms
Serial matches: 19231021 in 2455ms
Parallel matches: 19230904 in 357ms
Serial matches: 19230904 in 2537ms
Parallel matches: 19230568 in 373ms
Serial matches: 19230568 in 2456ms
Parallel matches: 19230758 in 333ms
Serial matches: 19230758 in 2710ms
Parallel matches: 19230580 in 371ms
Serial matches: 19230580 in 2532ms
Parallel matches: 19230534 in 336ms
Serial matches: 19230534 in 2436ms
Parallel matches: 19230879 in 368ms
Serial matches: 19230879 in 2419ms
Parallel matches: 19230651 in 409ms
Serial matches: 19230651 in 2598ms
Parallel matches: 19230461 in 357ms
....

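I left the OTL version running for a long time and the pattern never changed. The parallel version was always around 7 times faster than the serial one.

Conclusion

The code is very simple. The only reasonable conclusion that can be drawn is that the implementation of System.Threading is defective.

There have been numerous bug reports relating to the new System.Threading library. All the signs are that its quality is poor. Embarcadero has a long track record of releasing sub-standard library code. I'm thinking of TMonitor, the XE3 string helper, earlier versions of System.IOUtils, and FireMonkey. The list goes on.

It seems clear that quality is a big problem at Embarcadero. Code is released that quite clearly has not been tested adequately, if at all. This is especially troublesome for a threading library, where bugs can lie dormant and only be exposed in specific hardware/software configurations. The experience with TMonitor leads me to believe that Embarcadero does not have sufficient expertise to produce high-quality, correct threading code.

My advice is that you should not use System.Threading in its current form. Until it can be seen to have sufficient quality and correctness, it should be avoided. I suggest that you use OTL.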


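Edit: The original OTL version of the program had a live memory leak, which occurred because of an ugly implementation detail. Parallel.For creates tasks with the .Unobserved modifier. That causes said tasks to be destroyed only when some internal message window receives a "task has terminated" message. This window is created in the same thread as the Parallel.For caller, i.e. in the main thread in this case. As the main thread was not processing messages, the tasks were never destroyed and memory consumption (plus other resources) just piled up. It is possible that this is why the program hung after some time.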

Regarding delphi - Strange behavior of TParallel.For with the default thread pool, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29062697/
