
c++ - Is C# TPL faster than C++ PPL?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 23:11:44

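I wrote a very simple application that uses a Fibonacci function to compare TPL's Parallel.ForEach with PPL's parallel_for_each, and the result is really strange: on a PC with 8 cores, the C# version is 11 seconds faster than the C++ version.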

The results are the same with VS2010 and the VS 2011 preview.

C# code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var ll = new ConcurrentQueue<Tuple<int, int>>();
            var a = new int[12] { 40, 41, 42, 43, 44, 45, 46, 47, 35, 25, 36, 37 };

            long elapsed = time_call(() =>
            {
                Parallel.ForEach(a, (n) => { ll.Enqueue(new Tuple<int, int>(n, fibonacci(n))); });
            });

            Console.WriteLine("TPL C# elapsed time: " + elapsed + "\n\r");
            foreach (var ss in ll)
            {
                Console.WriteLine(String.Format("fib<{0}>: {1}", ss.Item1, +ss.Item2));
            }

            Console.ReadLine();
        }

        // Runs f once and returns the elapsed wall-clock time in milliseconds.
        static long time_call(Action f)
        {
            var p = Stopwatch.StartNew();
            p.Start();
            f();
            p.Stop();
            return p.ElapsedMilliseconds;
        }

        // Computes the nth Fibonacci number.
        // Note: fibonacci(47) = 2971215073 exceeds int.MaxValue, so that result wraps around.
        static int fibonacci(int n)
        {
            if (n < 2) return n;
            return fibonacci(n - 1) + fibonacci(n - 2);
        }
    }
}

C++ code:

#include <windows.h>
#include <ppl.h>
#include <concurrent_vector.h>
#include <array>
#include <tuple>
#include <algorithm>
#include <iostream>

using namespace Concurrency;
using namespace std;

// Runs f once and returns the elapsed time in milliseconds.
template <class Function>
__int64 time_call(Function&& f) {
    __int64 begin = GetTickCount();
    f();
    return GetTickCount() - begin;
}

// Computes the nth Fibonacci number.
// Note: fibonacci(47) = 2971215073 does not fit in a 32-bit int; signed overflow is undefined behavior in C++.
int fibonacci(int n) {
    if (n < 2) return n;
    return fibonacci(n-1) + fibonacci(n-2);
}

int wmain() {
    __int64 elapsed;
    array<int, 12> a = { 40, 41, 42, 43, 44, 45, 46, 47, 35, 25, 36, 37 };
    concurrent_vector<tuple<int,int>> results2;

    elapsed = time_call([&] {
        parallel_for_each(a.begin(), a.end(), [&](int n) {
            results2.push_back(make_tuple(n, fibonacci(n)));
        });
    });

    wcout << L"PPL time: " << elapsed << L" ms" << endl << endl;
    for_each(results2.begin(), results2.end(), [](tuple<int,int>& pair) {
        wcout << L"fib(" << get<0>(pair) << L"): " << get<1>(pair) << endl;
    });

    cin.ignore();
}

Can you point out what is wrong with my C++ code?

With task_group I get the same time as the C# code:

task_group tasks;
elapsed = time_call([&]
{
    // Schedule one task per element, then wait for all of them.
    for_each(begin(a), end(a), [&](int n)
    {
        tasks.run([&,n]{results2.push_back(make_tuple(n, fibonacci(n)));});
    });
    tasks.wait();
});
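
For readers without PPL at hand, the one-task-per-element idea in the snippet above can be approximated in portable C++ with std::async. This is only a minimal sketch, not the poster's code: the helper name fib_per_element is hypothetical, and std::uint64_t replaces int (an assumption made here so that large inputs such as 47 do not overflow).

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// Naive recursive Fibonacci, deliberately slow so it acts as a CPU-bound workload.
// Assumes n >= 0. Uses a 64-bit unsigned result (a change from the original int).
std::uint64_t fibonacci(int n) {
    if (n < 2) return static_cast<std::uint64_t>(n);
    return fibonacci(n - 1) + fibonacci(n - 2);
}

// Hypothetical helper: launches one asynchronous task per input value and
// collects (n, fibonacci(n)) pairs in input order.
std::vector<std::pair<int, std::uint64_t>> fib_per_element(const std::vector<int>& inputs) {
    std::vector<std::future<std::uint64_t>> pending;
    pending.reserve(inputs.size());
    for (int n : inputs) {
        // std::launch::async forces each call to run on its own thread.
        pending.push_back(std::async(std::launch::async, fibonacci, n));
    }

    std::vector<std::pair<int, std::uint64_t>> results;
    results.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        // get() blocks until the i-th task has finished.
        results.emplace_back(inputs[i], pending[i].get());
    }
    return results;
}
```

Calling fib_per_element with the twelve inputs from the question would start twelve threads at once. Unlike task_group, std::async with std::launch::async is not backed by a work-stealing scheduler, so timings from this sketch are not expected to match PPL exactly.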

Best answer

Here is the explanation from Rahul v Patil of the Microsoft team:

Hello,

Thanks for bringing this up. Indeed, you've identified the overhead associated with the default parallel for * - especially when the number of iterations is small, and the work size is variable. The default parallel for starts off by breaking down the work into 8 chunks (on 8 cores). As the work finishes, work is dynamically load-balanced. The default works great in most cases (large number of iterations), and when the underlying work per iteration is not well understood (let's say you call into a library) - but it does come with unacceptable overheads in some cases.

The solution is exactly what you've identified in your alternate implementation. To that effect, we'll have a parallel for partitioner called "simple" in the next version of Visual Studio, which will be similar to the alternate implementation you describe and will have much better performance.

PS: The C# and C++ parallel for each implementations use slightly different algorithms in how they go through the iterations - hence you will see slightly different performance characteristics depending on the workload.

Regards
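
The effect of starting with 8 chunks on 12 unevenly sized items can be illustrated with a rough cost model. The sketch below is an illustration only and makes several assumptions that the answer does not confirm: the cost of the naive fibonacci(n) is taken to be proportional to fib(n), the chunks are assumed to be contiguous, and no rebalancing is modeled at all (the real scheduler does rebalance). The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed cost model: naive fibonacci(n) takes roughly fib(n) time units.
std::uint64_t fib_cost(int n) {
    std::uint64_t a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        const std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}

// Splits items into `workers` contiguous chunks (workers must be > 0) and
// returns the cost of the most expensive chunk. No rebalancing is modeled.
std::uint64_t chunked_makespan(const std::vector<int>& items, std::size_t workers) {
    const std::size_t base = items.size() / workers;
    const std::size_t extra = items.size() % workers;
    std::uint64_t worst = 0;
    std::size_t pos = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        // The first `extra` workers receive one additional item.
        const std::size_t len = base + (w < extra ? 1 : 0);
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < len; ++i) sum += fib_cost(items[pos++]);
        worst = std::max(worst, sum);
    }
    return worst;
}

// Hands each item, in order, to whichever worker becomes free first
// (workers must be > 0) and returns the finishing time of the last worker.
std::uint64_t per_item_makespan(const std::vector<int>& items, std::size_t workers) {
    std::vector<std::uint64_t> busy_until(workers, 0);
    for (int n : items) {
        auto next_free = std::min_element(busy_until.begin(), busy_until.end());
        *next_free += fib_cost(n);
    }
    return *std::max_element(busy_until.begin(), busy_until.end());
}
```

With the question's inputs and 8 workers, this model places 46 and 47 in the same chunk, so the chunked finishing time is fib(46) + fib(47) = fib(48), while the per-item schedule finishes in fib(47). That is a ratio of about 1.6 under these assumptions; it is not a measurement and does not claim to reproduce the 11-second gap reported in the question.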

Regarding "c++ - Is C# TPL faster than C++ PPL?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8712242/
