
c++ - Force OpenMP to not cache a large object in each thread

Reposted. Author: 行者123. Updated: 2023-12-01 14:53:49

I am writing a C++ program with a loop that I am trying to parallelize using OpenMP. The loop has the following structure:

#pragma omp parallel for
for (int i = 0; i < N; i++)
    result[i] = work_func(left[i], right[i], largeObject);

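The largeObject parameter is marked as a const reference. My problem is that when I move from a single thread to many threads (~40), memory usage increases dramatically. The left and right parameters are both small, so even copying them in full to every thread would not explain the increase.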

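I would like to tell OpenMP not to copy largeObject into every thread-local cache, and instead force it to use a single global copy. Is there a way to do this? It seems to be the opposite of the avoid-false-sharing optimization that is more common in OpenMP performance questions. I am much less concerned about a runtime slowdown than about the massive memory overhead of this program.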

Thanks!

Best answer

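Declaring the object as int const largeObject should tell the compilation phase to avoid any additional mechanisms and/or synchronization strategies for sharing what is, in principle, an immutable object (no race condition can arise, because nothing attempts to write to an object declared const). As @Gilles mentioned, declaring it volatile instead makes the compiler inject a different value-access strategy, which is not directly related to OpenMP but is respected inside the corresponding omp section(s).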
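The sharing behaviour described above can be probed with a small experiment. The following is a minimal sketch, not the asker's code: `LargeObject`, `buffers_seen_shared` and `buffers_seen_firstprivate` are hypothetical names introduced for illustration. It counts how many distinct underlying buffers the loop iterations observe when the object is listed as `shared` (which is also the default for a variable declared outside the parallel region), and contrasts that with `firstprivate`, the clause that does ask OpenMP for one copy per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the question's largeObject type.
using LargeObject = std::vector<int>;

// Counts the distinct buffers seen by the iterations when the object is shared.
// Every thread reads the one object, so a single buffer address is expected.
std::size_t buffers_seen_shared(std::size_t elems, int iterations)
{
    assert(iterations > 0);
    const LargeObject largeObject(elems, 1);
    std::set<const int*> seen;

    #pragma omp parallel for shared(largeObject, seen)
    for (int i = 0; i < iterations; i++) {
        const int* buffer = largeObject.data();
        #pragma omp critical
        seen.insert(buffer);              // serialized: std::set is not thread-safe
    }
    return seen.size();
}

// Same loop, but firstprivate copy-constructs the object once per thread.
// With OpenMP enabled, one buffer per participating thread is expected.
std::size_t buffers_seen_firstprivate(std::size_t elems, int iterations)
{
    assert(iterations > 0);
    const LargeObject largeObject(elems, 1);
    std::set<const int*> seen;

    #pragma omp parallel for firstprivate(largeObject) shared(seen)
    for (int i = 0; i < iterations; i++) {
        const int* buffer = largeObject.data();
        #pragma omp critical
        seen.insert(buffer);
    }
    return seen.size();
}
```

Built with -fopenmp, `buffers_seen_shared(1000, 64)` should report a single buffer regardless of the thread count, while the `firstprivate` variant should report as many buffers as threads took part in the loop. If a program's memory grows with the thread count even though the large object is shared, the growth most likely comes from somewhere other than OpenMP copying that object.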

#include <iostream>
#include <omp.h>

#define anIndeedLargeSIZE 2

int main()
{
    // The C++ const qualifier is what marks the object immutable.
    // "#pragma omp const largeObject" is not an OpenMP directive (the compiler ignores it), so it is omitted here.
    int const largeObject[anIndeedLargeSIZE] = {0};

    std::cout << "largeObject address " << largeObject << std::endl;

    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 2; i++)
    {
        int tid = omp_get_thread_num();

        // Each thread prints the address it sees; the output may interleave.
        std::cout << "tid: " << tid << " :: " << largeObject << std::endl;

        if (i == tid)
        {
            // largeObject[i] = tid; // const .... un-mutable mode
            std::cout << "tid: " << tid << " :: now reading and using a const largeObject[" << (int)i << "] == " << largeObject[i] << std::endl;
        }
    }

    std::cout << "largeObject processing FINISHED." << std::endl;

    return 0;
}

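Please test the memory-allocation side effects with an indeed large size yourself; it would not be fair to run that on the Godbolt site IDE, where this prototype code was tested. (The complete MCVE code for further experiments and extended analysis, together with the compiler options used, is available there.) As the OpenMP API documentation warns, the actual behaviour is "implementation specific".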
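For a test at an indeed large size, a stack array like the one in the listing above will not scale, so a heap-backed container is the more practical choice. The following is a minimal sketch under stated assumptions: `run_loop` and the body of `work_func` are hypothetical, and only the loop shape and the const-reference parameter come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical work function: it only reads largeObject through a const
// reference, so the call itself does not copy the object.
long work_func(std::size_t left, std::size_t right, const std::vector<int>& largeObject)
{
    const std::size_t n = largeObject.size();
    return static_cast<long>(largeObject[left % n]) + largeObject[right % n];
}

// Same loop shape as in the question. largeObject refers to one object that
// all threads read; each iteration writes a distinct element of result.
std::vector<long> run_loop(const std::vector<std::size_t>& left,
                           const std::vector<std::size_t>& right,
                           const std::vector<int>& largeObject)
{
    assert(left.size() == right.size() && !largeObject.empty());
    std::vector<long> result(left.size());
    const int N = static_cast<int>(left.size());

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        result[i] = work_func(left[i], right[i], largeObject);

    return result;
}
```

One way to use it, assuming 4-byte int, is to build the object as `std::vector<int> largeObject(std::size_t{1} << 28, 1);` (about 1 GiB) and compare the resident memory of the process while running with different OMP_NUM_THREADS values.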

(base) Wed Jan 08 00:00:00 @64FX:~/$ g++ -o largeObject_const_OMP -O3 -fopenmp largeObject_const_OMP.c
largeObject_const_OMP.c: In function ‘int main()’:
largeObject_const_OMP.c:65:30: error: expected ‘#pragma omp’ clause before ‘const’
#pragma omp parallel for const (largeObject) num_threads(2)

<--------------------code-revised-as-desired-by-parsing-error:65:30:expected ‘#pragma omp’ clause ADDED before ‘const’-->
(base) Wed Jan 08 00:00:00 @64FX:~/$ g++ -o largeObject_const_OMP -O3 -fopenmp largeObject_const_OMP.c
<--------------------no-error-message|warning-from-parse|compile|link-phases-HERE->
(base) Wed Jan 08 00:00:00 @64FX:~/$ ./largeObject_const_OMP
largeObject address 0x7fff81b97d58
tid: tid: 0 :: 10x7fff81b97d58 ::
tid: 0 :: now reading and using a const largeObject[0] == 0
0x7fff81b97d58
tid: 1 :: now reading and using a const largeObject[1] == 0
largeObject processing FINISHED.

Accessing an int const vs. an int volatile largeObject:
#include <iostream>   // >>> https://gcc.godbolt.org/z/NRQSQ_
#include <omp.h>      // >>> https://stackoverflow.com/questions/59637163/force-openmp-to-not-cache-a-large-object-in-each-thread/59638455?noredirect=1#comment105445758_59638455

#include <chrono>
#include <thread>

#define anIndeedLargeSIZE 2

int main()
{
    int const largeObject[anIndeedLargeSIZE] = {0};
    // #pragma omp const largeObject // largeObject_const_OMP.c:46:0: warning: ignoring #pragma omp const [-Wunknown-pragmas]
    std::cout << "int const largeObject address[_" << largeObject << "_]" << std::endl;

    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 2; i++)
    {
        int tid = omp_get_thread_num();

        std::this_thread::sleep_for( std::chrono::milliseconds( 100 * tid ) );

        std::cout << "tid: " << (int)tid << " ::[_" << largeObject << "_]" << std::endl;

        if (i == tid)
        {
            // largeObject[i] = tid; // const .... un-mutable mode
            std::cout << "tid: " << (int)tid << " :: now reading and using an int const largeObject[" << (int)i << "] == " << largeObject[i] << std::endl;
        }
    }

    std::cout << "int const largeObject[] processing FINISHED." << std::endl;
    /*
     >>> ~/$ ./largeObject_const_OMP
     * int const largeObject address[_0x7fff3ed0db28_]
     * tid: 0 ::[_0x7fff3ed0db28_]
     * tid: 0 :: now reading and using an int const largeObject[0] == 0
     * tid: 1 ::[_0x7fff3ed0db28_]
     * tid: 1 :: now reading and using an int const largeObject[1] == 0
     * int const largeObject[] processing FINISHED.
     *
     * */

    /*
    int volatile largeObject[anIndeedLargeSIZE] = {0};
    std::cout << "int volatile largeObject address[_" << largeObject << "_]" << std::endl;

    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 2; i++)
    {
        int tid = omp_get_thread_num();

        std::this_thread::sleep_for( std::chrono::milliseconds( 100 * tid ) );

        std::cout << "tid: " << (int)tid << " ::[_" << largeObject << "_]" << std::endl;

        if (i == tid)
        {
            // largeObject[i] = tid; // const .... un-mutable mode
            std::cout << "tid: " << (int)tid << " :: now reading and using an int volatile largeObject[" << (int)i << "] == " << largeObject[i] << std::endl;
        }
    }

    std::cout << "int volatile largeObject[] processing FINISHED." << std::endl;

     >>> ~/$ ./largeObject_const_OMP
     * int volatile largeObject address[_1_]
     * tid: 0 ::[_1_]
     * tid: 0 :: now reading and using an int volatile largeObject[0] == 0
     * tid: 1 ::[_1_]
     * tid: 1 :: now reading and using an int volatile largeObject[1] == 0
     * int volatile largeObject[] processing FINISHED.
     * */
    return 0;
}
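A note on the `[_1_]` lines in the volatile run above: they most likely do not show an address at all. Before C++23, `std::ostream` has no `operator<<` overload for pointers to volatile, so the pointer is converted to `bool` and printed as `1`. The following is a minimal sketch of a workaround; `address_of` is a hypothetical helper name, and the printed pointer format is implementation-defined.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats the address of a (possibly volatile) object as text.
// The volatile qualifier is dropped only on the pointer handed to the stream;
// the object itself is never accessed.
std::string address_of(const volatile void* p)
{
    assert(p != nullptr);
    std::ostringstream out;
    out << const_cast<const void*>(p);    // selects the const void* overload
    return out.str();
}
```

In the volatile variant of the listing, streaming `address_of(largeObject)` instead of `largeObject` would let the address be compared across threads in the same way as in the const run.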

Regarding "c++ - Force OpenMP to not cache a large object in each thread", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59637163/
