This is more of a conceptual question. I was wondering whether using a lock inside a Parallel.ForEach loop would take away the benefits of parallelizing the loop.
Here is some sample code where I have seen it done.
Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
    lock (fileLockObject)
    {
        if (fileLocks.ContainsKey(reference.Key) == false)
        {
            fileLocks.Add(reference.Key, new object());
        }
    }

    RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount, ref recursionCount);

    lock (fileLocks[reference.Key])
    {
        reference.Value.Document.Save(reference.Key);
    }
});
Where fileLockObject and fileLocks are as follows.
private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();
Does this technique completely make the loop not parallel?
I would like to see your thoughts on this.
It means all of the work inside the lock can't be done in parallel, so yes, that greatly harms performance here. Since the entire body is not locked (and not all locked on the same object), there is still some parallelization. Whether the parallelism you do get adds enough benefit to outweigh the overhead of managing the threads and synchronizing around the locks is something you really just need to test with your specific data.
That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with, as every thread locks on the same object) is serializing access to a Dictionary. You can instead use a ConcurrentDictionary, which is specifically designed to be used from multiple threads and minimizes the amount of synchronization that needs to be done.
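A minimal sketch of that swap, reusing the fileLocks name and per-key locking idea from the question (the PerFileLocks/Demo classes and sample keys are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class PerFileLocks
{
    // ConcurrentDictionary replaces the lock(fileLockObject) { ContainsKey/Add } block.
    static readonly ConcurrentDictionary<string, object> fileLocks =
        new ConcurrentDictionary<string, object>();

    // GetOrAdd atomically returns the existing lock object for a key,
    // or creates and stores a new one - no explicit lock around the lookup.
    public static object GetLock(string key) =>
        fileLocks.GetOrAdd(key, _ => new object());
}

class Demo
{
    static void Main()
    {
        var keys = new[] { "a.xml", "b.xml", "a.xml" };
        Parallel.ForEach(keys, key =>
        {
            lock (PerFileLocks.GetLock(key))
            {
                // reference.Value.Document.Save(key) would go here.
                Console.WriteLine($"Saving {key}");
            }
        });
    }
}
```

Because GetOrAdd always hands back the same lock object for the same key, saves of the same file are serialized while saves of different files can proceed in parallel.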
if I used a lock ... if that would take away the benefits of parallelizing a foreach loop.
Proportionally. When RecursiveBinUpdate() is a big chunk of work (and independent), it will still pay off. The locking part could be less than 1% of the work, or it could be 99%. Look up Amdahl's law; it applies here.
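Amdahl's law puts a number on that intuition. A small sketch (the serial fractions and core count are illustrative, not measured from the question's code):

```csharp
using System;

class Amdahl
{
    // Amdahl's law: with serial fraction s and n parallel workers,
    // the best possible speedup is 1 / (s + (1 - s) / n).
    public static double Speedup(double serialFraction, int workers) =>
        1.0 / (serialFraction + (1.0 - serialFraction) / workers);

    static void Main()
    {
        // If the locked part is 1% of the work, 8 cores still give ~7.5x.
        Console.WriteLine(Amdahl.Speedup(0.01, 8)); // ~7.48
        // If the locked part is 99% of the work, parallelism barely helps.
        Console.WriteLine(Amdahl.Speedup(0.99, 8)); // ~1.01
    }
}
```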
But worse, your code is not thread-safe. Of your two operations on fileLocks, only the first is actually inside a lock.
lock (fileLockObject)
{
    if (fileLocks.ContainsKey(reference.Key) == false)
    {
        ...
    }
}
and
lock (fileLocks[reference.Key]) // this access to fileLocks[] is not protected
change the 2nd part to:
lock (fileLockObject)
{
    reference.Value.Document.Save(reference.Key);
}
and the use of ref recursionCount as a parameter looks suspicious too. It might work with Interlocked.Increment though.
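A sketch of what that could look like (the Parallel.For loop and iteration count are illustrative; recursionCount stands in for the question's ref parameter):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class CounterDemo
{
    static int recursionCount;

    // A plain recursionCount++ from many threads can lose updates, because the
    // read-increment-write is not atomic; Interlocked.Increment makes it atomic.
    public static int Run(int iterations)
    {
        recursionCount = 0;
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref recursionCount));
        return recursionCount;
    }

    static void Main() => Console.WriteLine(CounterDemo.Run(10_000)); // always 10000
}
```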
When it comes to locks, there's no difference in the way PLINQ/TPL threads have to wait to gain access. So in your case it only makes the loop non-parallel in the regions you're locking, and any work outside those locks will still execute in parallel (i.e. all the work in RecursiveBinUpdate).
Bottom line, I see nothing substantially wrong with what you're doing here.
The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.
There is nothing conceptually wrong with using the lock statement in a Parallel.ForEach loop. If done correctly, the effect of the lock on the overall performance of the parallel operation can be totally negligible. An uncontended lock is really cheap: it could be acquired and released in as little as 20 nanoseconds on a 2010-era computer, in other words it could be acquired and released 50,000,000 times per second (citation). On the other hand, if the lock is used incorrectly, it can completely serialize the body delegate, negating all the benefits of parallelization. Or even worse, as in the code snippet presented in your question, it can be applied incorrectly and allow concurrent access to non-thread-safe state (fileLocks), resulting in undefined behavior.
The general advice is: do the least amount of work while holding the lock, and release it at the earliest opportunity. Don't do work inside the lock that can be done outside the lock.
Regarding the specific problem that the code in the question is attempting to solve, which is to prevent parallel processing of reference objects having the same Key, locking on the Key might not be the optimal strategy. That's because it blocks ThreadPool threads, and also because it may result in a reduced degree of parallelism. Your code does not specify the MaxDegreeOfParallelism, implying unlimited parallelism, which in practice means that the effective degree of parallelism will be determined by ThreadPool availability, which is a bit chaotic¹. In case you change your mind and decide to specify the MaxDegreeOfParallelism, and you also want to maintain a constant degree of parallelism during the whole operation, the simple use of the lock won't cut it. You want the parallelization devoted to doing actual work, not occupied by waits for a lock. You could look at this answer for a more sophisticated approach to this specific problem. That answer is about the asynchronous Parallel.ForEachAsync, but adjusting it to the synchronous Parallel.ForEach should be straightforward.
¹ My personal advice is to always specify the MaxDegreeOfParallelism when using the Parallel APIs.
Your change to the second lock block is overly pessimistic. The intent of the code was that only operations on the same file are synchronized, so that saving two different files can happen at the same time. Saving is a potentially very expensive operation, so it's worth a bit of extra time/effort in managing the synchronization. Yes, the OP's code has a bug in it, and it should be fixed, but it can be solved by simply accessing the dictionary's value inside the global lock instead (or, as my answer suggests, using a ConcurrentDictionary to handle the lookup).
You're probably right that the saving can be pulled out of the lock, I didn't look that deeply. But there is a bug, and as long as it's not fixed, performance isn't going to matter.
@Servy The second lock would contradict your answer, because you said this would lock on the same object.
@diaz994 I fail to see what has contradicted my answer. Henk is proposing not locking on different objects and instead choosing to only lock on a single object.
Thanks for the response, Reed. What do you mean by handling the lock generation in advance? I'm not really following there.
@diaz994 In your case, the initial lock could be avoided by creating the keys before you parallelize. The save could also be done after processing.
But wouldn't that iterate through the references twice? That would add some extra overhead. @Reed Copsey
@diaz994 Iteration is typically quite fast - you might be able to use PLINQ to turn it into a 2-phase select (the first builds the "locks", the second does the processing), then do ForAll to save at the end...
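A rough sketch of that two-phase shape, with hypothetical stand-ins for RecursiveBinUpdate and the Document.Save call (the Run method and sample data are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

class TwoPhaseDemo
{
    public static List<string> Run(IDictionary<string, string> references)
    {
        // Phase 1 (sequential): create every per-key lock object up front,
        // so the parallel phase only ever reads the dictionary.
        var fileLocks = references.Keys.ToDictionary(k => k, _ => new object());

        var saved = new ConcurrentBag<string>();

        // Phase 2 (parallel): do the heavy work unsynchronized,
        // locking per key only around the save.
        references.AsParallel().ForAll(reference =>
        {
            // RecursiveBinUpdate(reference.Value, ...) would run here.
            lock (fileLocks[reference.Key]) // read-only lookup, safe after phase 1
            {
                saved.Add(reference.Key);   // stands in for Document.Save(reference.Key)
            }
        });

        return saved.OrderBy(k => k).ToList();
    }

    static void Main() =>
        Console.WriteLine(string.Join(", ",
            Run(new Dictionary<string, string> { ["a.xml"] = "A", ["b.xml"] = "B" })));
}
```

Building fileLocks before the parallel phase removes the data race on the dictionary entirely: the parallel phase never mutates it, so the unprotected fileLocks[reference.Key] lookup from the question becomes safe.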