C++ - Expression templates: unroll loop

Reposted by: 行者123. Updated: 2023-11-28 01:43:05

I have the same problem as in: Expression templates: improving performance in evaluating expressions?

My goal is to unroll the loop in this expression:

auto && intermediate = A + D * C;
for (int i = 0; i < 10; i++)
    intermediate = intermediate + B;
Vector result = intermediate * E;

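I want intermediate to hold the whole binary expression tree, so that at the end the operator=(Expression) of class Vector can run a graph inspection using my own code. This only works without the loop (I use a classic implementation of expression templates, the one from Joel Falcou's CppCon 2015 talk).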

Edit: code that reproduces the compilation problem caused by the loop

If I uncomment the loop in main, I get a compile error. The code needs to be built as C++11:

g++ -std=c++11 -O3 -fopenmp -Wall -pedantic -pthread main.cpp && ./a.out

#include <cstddef>     // size_t
#include <functional>  // std::plus
#include <iostream>
#include <vector>

using std::size_t;

template <typename TBase, typename Derived>
struct BaseExpression
{
   Derived const& self() const { return static_cast<const Derived&>(*this); }
   Derived & self() { return static_cast<Derived&>(*this); }
   TBase operator[](size_t szIdx) const { return self()[szIdx]; }
   size_t size() const {return self().size();}
};

template <typename TBase, typename Operator, typename OP1, typename OP2>
class Binary_Expression : public BaseExpression<TBase, Binary_Expression<TBase, Operator, OP1, OP2> >
{
public:
   Binary_Expression(OP1 const & a, OP2 const & b) : op1(a), op2(b){}
   TBase operator[] (size_t idx) const { return op(op1[idx], op2[idx]); }
   size_t size() const { return op1.size() != 0 ? op1.size() : op2.size(); }


private:
   const OP1 & op1;
   const OP2 & op2;
   Operator op;
};


template <typename TBase >
class Vector : public BaseExpression<TBase, Vector<TBase> >
{

public:
   explicit Vector(size_t szSizeN) : m_xMemory(szSizeN){}

   Vector(const Vector &orig): m_xMemory()
   { 
      this->copy(orig);
   }

   Vector & operator=(const Vector &orig)
   {
      if (&orig != this)
      {
         Vector temp(orig);
         this->swap(temp);
      }

      return *this;
   }

   Vector & operator=(TBase factor)
   {
      size_t szSizeN = size();
#pragma omp parallel for
      for (size_t idx = 0; idx < szSizeN; idx++)
      {
         m_xMemory[idx] = factor;
      }

      return *this;
   }

   template <typename Expression>
   Vector(const BaseExpression<TBase, Expression> &b) :m_xMemory(b.size())
   {
      size_t szSizeN = size();
#pragma omp parallel for
      for (size_t idx = 0; idx < szSizeN; idx++)
      {
         m_xMemory[idx] = b[idx];
      }

   }

   void swap(Vector &orig)
   {
      using std::swap;
      swap(m_xMemory, orig.m_xMemory);
   }

   TBase operator[] (size_t idx) const { return m_xMemory[idx]; }

   TBase & operator[] (size_t idx) { return m_xMemory[idx]; }

   size_t size() const { return m_xMemory.size(); }

   void print()
   {
      size_t szSizeN = size();
      for (size_t idx = 0; idx < szSizeN; idx++)
      {
         std::cout << "Index=" << idx << "\t" << "Value=" << m_xMemory[idx] << std::endl;

      }
   }

private:
   void copy(const Vector &orig) 
   {
      m_xMemory = orig.m_xMemory;
   }

   std::vector<TBase> m_xMemory;
};


template <typename TBase, typename E1, typename E2>
Binary_Expression<TBase, std::plus<TBase>, E1, E2> operator+(const BaseExpression<TBase, E1> & xE1, const BaseExpression< TBase, E2> & xE2)
{
   return Binary_Expression<TBase, std::plus<TBase>, E1, E2>(xE1.self(), xE2.self());
}


int main()
{
   Vector<double> x1(10);
   Vector<double> x2(10);
   Vector<double> x3(10);

   x1 = 7.5;
   x2 = 8.;
   x3 = 4.2;

   auto && intermediate =  x1 + x2;


// compile error: each operator+ returns a new type, which cannot be assigned back to intermediate
/*
   for (int i = 0; i< 10; i++)
   {
       intermediate = intermediate + x3;   
   }
   */
   // inspection of the graph here
   Vector<double> result = intermediate + x2;


   result.print();   

}

In fact, in my final design, I would like to write the following:

   Vector<double> x1(10);
   Vector<double> x2(10);
   Vector<double> x3(10);

   x1 = 7.5;
   x2 = 8.;
   x3 = 4.2;

   Vector<double> intermediate = x1 + x2;
   for (int i = 0; i < 5; ++i)
       intermediate = intermediate + x3;

   Vector<double> result = x1 + x3 + intermediate;
   // result would then hold the expression tree, and its evaluate() method would perform the graph inspection
   result.evaluate();

Thanks in advance, Jonathan

Best answer

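I'm afraid this will not work, because the chaining technique relies on the type of the intermediate variable capturing the whole expression. That type would look something like Sum<Mult<Vector,Vector>> (simplified here). But the type of a variable cannot change on each iteration of a for loop.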
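To make the type problem concrete, here is a minimal sketch using hypothetical stand-in types (`Vec` and `Sum` are illustrative and not taken from the question's code). Every addition produces a different type, so the result of the loop body cannot be assigned back to the variable it was computed from:

```cpp
#include <type_traits>

// Hypothetical stand-ins for the question's Vector and Binary_Expression.
struct Vec {};

template <class L, class R>
struct Sum {};

// Vec + Vec yields Sum<Vec, Vec>.
inline Sum<Vec, Vec> operator+(const Vec&, const Vec&) { return {}; }

// (some Sum) + Vec wraps the left-hand type one level deeper.
template <class L, class R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>&, const Vec&) { return {}; }

using Step0 = decltype(Vec{} + Vec{});    // Sum<Vec, Vec>
using Step1 = decltype(Step0{} + Vec{});  // Sum<Sum<Vec, Vec>, Vec>

static_assert(std::is_same<Step0, Sum<Vec, Vec>>::value, "first step");
static_assert(std::is_same<Step1, Sum<Sum<Vec, Vec>, Vec>>::value, "second step");

// The loop body `intermediate = intermediate + B` needs Step1 to be assignable
// to Step0, but they are unrelated types, so the assignment cannot compile.
static_assert(!std::is_assignable<Step0&, Step1>::value, "types differ per step");
```

All checks here happen at compile time, which mirrors the situation in the question: the failure is a type error, not a runtime one.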

I see these alternatives:

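The first alternative is to capture the expression not in the type but in a runtime structure, with a type called, say, VectorExpression. This has a performance cost, because you have to analyze the expression graph at runtime, and it limits the kinds of optimizations you can perform.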
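As a rough sketch of this first alternative, assuming a hypothetical `VectorExpression` class built on shared, reference-counted nodes (the `Node` layout, `at`, `depth` and `buildLoop` are all invented for illustration and are not part of the original answer):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical runtime node: a single C++ type for every expression shape.
struct Node {
    enum class Kind { Leaf, Add };
    Kind kind;
    std::vector<double> data;              // used when kind == Leaf
    std::shared_ptr<const Node> lhs, rhs;  // used when kind == Add
};

class VectorExpression {
public:
    explicit VectorExpression(std::vector<double> values)
        : node_(std::make_shared<Node>(
              Node{Node::Kind::Leaf, std::move(values), nullptr, nullptr})) {}

    friend VectorExpression operator+(const VectorExpression& a,
                                      const VectorExpression& b) {
        return VectorExpression(std::make_shared<Node>(
            Node{Node::Kind::Add, {}, a.node_, b.node_}));
    }

    // Evaluate element i by walking the tree at runtime.
    double at(std::size_t i) const { return eval(*node_, i); }

    // A trivial "graph inspection": height of the expression tree.
    std::size_t depth() const { return nodeDepth(*node_); }

private:
    explicit VectorExpression(std::shared_ptr<const Node> n) : node_(std::move(n)) {}

    static double eval(const Node& n, std::size_t i) {
        if (n.kind == Node::Kind::Leaf) {
            assert(i < n.data.size());
            return n.data[i];
        }
        return eval(*n.lhs, i) + eval(*n.rhs, i);
    }

    static std::size_t nodeDepth(const Node& n) {
        if (n.kind == Node::Kind::Leaf) return 0;
        return 1 + std::max(nodeDepth(*n.lhs), nodeDepth(*n.rhs));
    }

    std::shared_ptr<const Node> node_;
};

// The loop compiles because every iteration yields the same C++ type.
inline VectorExpression buildLoop(int iterations) {
    VectorExpression x1{std::vector<double>(10, 7.5)};
    VectorExpression x2{std::vector<double>(10, 8.0)};
    VectorExpression x3{std::vector<double>(10, 4.2)};
    VectorExpression intermediate = x1 + x2;
    for (int i = 0; i < iterations; ++i)
        intermediate = intermediate + x3;
    return intermediate;
}
```

The price is visible in `eval`: each element access branches on the node kind and follows pointers, which is the runtime analysis cost mentioned above.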

The second alternative is to write your own for loop using template metaprogramming (with a new type at each step).

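Here is an example of a fold function (which is what you want). We have to use a fold functor, because partial specialization of function templates is not supported: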

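#include <utility>

// Requires C++14 (decltype(auto) here, and generic lambdas when calling fold).
// N is the number of applications still to perform, V the type of the
// current value, and F the callable applied at each step.
template <int N, class V, class F>
struct foldf {
    auto operator()(V v, F&& f) -> decltype(auto) {
        auto next = f(v);  // may have a different type from v
        return foldf<N - 1, decltype(next), F>()(next, std::forward<F>(f));
    }
};

// Base case: no applications left, so return the accumulated value.
template <class V, class F>
struct foldf<0, V, F> {
    auto operator()(V v, F&&) -> decltype(auto) {
        return v;
    }
};

// just a helper to make usage simpler
template <int N>
class Repeat{};

template <int N, class V, class F>
auto fold(Repeat<N>, V v, F&& f) -> decltype(auto) {
    return foldf<N, V, F>()(v, std::forward<F>(f));
}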

To show that it does what we want, let's add this code:

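template <class T>
class Test {
};

class Other{};

// Wraps the type of its argument one level deeper; the value is irrelevant.
template <class T>
auto wrap(T) -> decltype(auto) {
    return Test<T>();
}

int main() {
    auto v = fold(Repeat<3>(), 0, [](auto t){
        return wrap(t);
    });
    // Deliberately ill-formed: the compiler error reveals the type of v.
    Other x = v;
}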

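The result should be an error such as tmp.cpp:42:11: error: no viable conversion from 'Test<Test<Test<int> > >' to 'Other' (the exact line and column depend on your file), which shows that the type built up at each step is preserved.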
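To connect the fold back to expression templates, here is a hedged C++14 sketch. It uses a variation on the fold above (a hypothetical `Fold<N>::apply` with a member template) and hypothetical `Leaf` and `Add` node types that store their operands by value. That by-value storage is an assumption made here on purpose: the question's Binary_Expression keeps references, which would dangle once the fold's by-value intermediates go out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical leaf: points at storage owned elsewhere, cheap to copy.
struct Leaf {
    const std::vector<double>* data;
    double operator[](std::size_t i) const {
        assert(i < data->size());
        return (*data)[i];
    }
};

// Hypothetical sum node holding its operands by value.
template <class L, class R>
struct Add {
    L lhs;
    R rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

template <class L, class R>
Add<L, R> add(L lhs, R rhs) {
    return Add<L, R>{std::move(lhs), std::move(rhs)};
}

// Compile-time loop: applies f N times, allowing a new type at every step.
template <int N>
struct Fold {
    template <class V, class F>
    static auto apply(V v, F& f) {
        return Fold<N - 1>::apply(f(std::move(v)), f);
    }
};

template <>
struct Fold<0> {
    template <class V, class F>
    static auto apply(V v, F&) { return v; }
};

// Builds (x1 + x2) + x3 + x3 + x3 as one expression type and reads element 0.
inline double foldedValue() {
    std::vector<double> x1(10, 7.5), x2(10, 8.0), x3(10, 4.2);
    auto step = [&x3](auto e) { return add(std::move(e), Leaf{&x3}); };
    auto expr = Fold<3>::apply(add(Leaf{&x1}, Leaf{&x2}), step);
    static_assert(std::is_same<decltype(expr),
                      Add<Add<Add<Add<Leaf, Leaf>, Leaf>, Leaf>, Leaf>>::value,
                  "one extra Add level per unrolled step");
    return expr[0];  // evaluated lazily, only on element access
}
```

The iteration count has to be a compile-time constant here, which is the main restriction compared with an ordinary for loop.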

Regarding "c++ - Expression templates: unroll loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46381692/
