
c++ - How to quickly compute the normalized l1 and l2 norms of a vector in C++?

Reposted. Author: 塔克拉玛干. Last updated: 2023-11-03 06:15:40

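I have a matrix X whose n columns are data vectors in a d-dimensional space. For a vector xj, v[j] is its l1 norm (the sum of all abs(xji)), w[j] is the square of its l2 norm (the sum of all xji^2), and pj[i] combines the entries divided by the l1 and l2 norms: pj[i] = alpha*abs(xji)/v[j] + (1-alpha)*xji^2/w[j]. Finally, I need to output pj, v and w for subsequent applications.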
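To make the definitions concrete, here is a minimal sketch that computes v[j], w[j] and pj for a single column, assuming the column is stored as d contiguous doubles (as the indexing X[i+j*d] in the code below implies). The struct `ColumnWeights` and the helper `column_weights` are hypothetical names used only for illustration. Because each of the two terms is normalized, the entries of pj sum to alpha + (1 - alpha) = 1 for any nonzero vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for one column x of length d:
//   v   = sum_i |x_i|       (l1 norm)
//   w   = sum_i x_i^2       (squared l2 norm)
//   p_i = alpha*|x_i|/v + (1-alpha)*x_i^2/w
struct ColumnWeights {
    double v;
    double w;
    std::vector<double> p;
};

// Hypothetical helper: straightforward two-pass computation for one column.
ColumnWeights column_weights(const double* x, std::size_t d, double alpha) {
    ColumnWeights r{0.0, 0.0, std::vector<double>(d)};
    for (std::size_t i = 0; i < d; ++i) {
        r.v += std::abs(x[i]);   // std::abs(double) from <cmath>
        r.w += x[i] * x[i];
    }
    assert(r.v > 0.0 && "a zero column has no normalized weights");
    for (std::size_t i = 0; i < d; ++i) {
        r.p[i] = alpha * std::abs(x[i]) / r.v + (1.0 - alpha) * x[i] * x[i] / r.w;
    }
    return r;
}
```

For example, with x = {3, -4} and alpha = 0.5 this gives v = 7, w = 25 and p = {0.5*3/7 + 0.5*9/25, 0.5*4/7 + 0.5*16/25}, roughly {0.3943, 0.6057}.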

// X = new double [d*n]; is the input.
double alpha = 0.5;
double *pj = new double[d];
double *x_abs = new double[d];
double *x_2 = new double[d];
double *v = new double[n]();
double *w = new double[n]();
for (unsigned long j=0; j<n; ++j) {
    unsigned long jd = j*d;  // column j starts at X[jd]
    for (unsigned long i=0; i<d; ++i) {
        x_abs[i] = std::abs(X[i+jd]);  // std::abs from <cmath>; C's abs() takes an int
        v[j] += x_abs[i];
        x_2[i] = x_abs[i]*x_abs[i];
        w[j] += x_2[i];
    }
    for (unsigned long i=0; i<d; ++i){
        pj[i] = alpha*x_abs[i]/v[j]+(1-alpha)*x_2[i]/w[j];
    }

    // functionA(pj){ ... ...} for subsequent applications
}
// functionB(v, w){ ... ...} for subsequent applications

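My algorithm above has O(nd) flop/time complexity. Can anyone help me speed it up, either with built-in functions or with a new implementation in C++? Reducing the constant factor in O(nd) would also be very helpful.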
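One reading of "built-in functions" is the standard library's numeric algorithms. The sketch below, assuming the same column-contiguous layout as the question, computes v[j] with std::accumulate and w[j] with std::inner_product. The name `column_norms` is hypothetical. These calls do not change the O(nd) cost and are not guaranteed to be faster than the hand-written loop; they mainly make the intent explicit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>

// Hypothetical helper: l1 norm v[j] and squared l2 norm w[j] of every column.
// X holds n columns of d doubles each; column j starts at X + j*d.
void column_norms(const double* X, std::size_t d, std::size_t n,
                  double* v, double* w) {
    assert(v != nullptr && w != nullptr);
    for (std::size_t j = 0; j < n; ++j) {
        const double* first = X + j * d;
        const double* last = first + d;
        // l1 norm: running sum of absolute values (note the 0.0 start value).
        v[j] = std::accumulate(first, last, 0.0,
                               [](double acc, double x) { return acc + std::abs(x); });
        // Squared l2 norm: dot product of the column with itself.
        w[j] = std::inner_product(first, last, first, 0.0);
    }
}
```

The initial value must be written as 0.0: with a plain 0 the accumulator type would be int and every partial sum would be truncated. Since C++17, std::transform_reduce is an alternative that, unlike std::accumulate, is allowed to reorder the additions.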

Best answer

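Let me guess: since you are asking a performance question, the dimension of your vectors is very large.
If that is the case, it is worth thinking about "CPU cache locality"; there is some interesting information on this in a CppCon 2014 presentation.
If the data is not already in the CPU cache, the time spent taking its abs or squaring it once it arrives is negligible compared with the time the CPU spends waiting for it.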
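To illustrate the locality point: in the question's layout, element i of column j lives at X[i + j*d], so a loop whose inner index is i walks consecutive addresses. The sketch below (an illustration, not a benchmark) computes the same l1 norms in two traversal orders; the function names are hypothetical. The strided version jumps d doubles between consecutive accesses and, once the matrix no longer fits in cache, can miss the cache far more often.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration of access order (not a benchmark).
// X stores n columns of d doubles; element i of column j is X[i + j*d].

// Cache-friendly: the inner loop walks one column, i.e. consecutive addresses.
std::vector<double> l1_contiguous(const std::vector<double>& X,
                                  std::size_t d, std::size_t n) {
    assert(X.size() == d * n);
    std::vector<double> v(n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < d; ++i)
            v[j] += std::abs(X[i + j * d]);
    return v;
}

// Cache-unfriendly: the inner loop jumps d doubles between accesses.
std::vector<double> l1_strided(const std::vector<double>& X,
                               std::size_t d, std::size_t n) {
    assert(X.size() == d * n);
    std::vector<double> v(n, 0.0);
    for (std::size_t i = 0; i < d; ++i)
        for (std::size_t j = 0; j < n; ++j)
            v[j] += std::abs(X[i + j * d]);
    return v;
}
```

Both functions add the terms of each v[j] in the same order (i = 0, 1, ..., d-1), so they return bit-identical results; only the memory access pattern differs. Timing them on large d and n with your own compiler flags is the reliable way to see how much this matters on a given machine.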

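With this in mind, you may want to try the following solution (there is no guarantee it will improve performance; the compiler may already apply these techniques itself when optimizing the code):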

for (unsigned long j=0; j<n; ++j) {
    // Use pointer arithmetic (at anything above -O0 the compiler does this anyway)
    double *start=X+j*d, *end=X+(j+1)*d;

    // This part avoids, as much as possible, competition
    // between X and v/w for the CPU caches.
    // Don't store the norms in v/w yet; keep them in registers.
    double l1norm=0, l2norm=0;
    for(double *src=start; src!=end; src++) {
        double val=*src;
        l1norm+=std::abs(val);  // std::abs from <cmath>
        l2norm+=val*val;
    }
    double pl1=alpha/l1norm, pl2=(1-alpha)/l2norm;
    for(double *src=start, *dst=pj; src!=end; src++, dst++) {
        // Yes, recomputing abs/sqr may actually save time by not
        // creating competition on CPU caches with x_abs and x_2
        double val=*src;
        *dst = pl1*std::abs(val) + pl2*val*val;
    }
    // functionA(pj){ ... ...} for subsequent applications

    // Think carefully about whether you really need v/w. If you do,
    // at least there are only two values per vector to send to memory,
    // and meanwhile the CPU can load the next vector into cache.
    v[j]=l1norm; w[j]=l2norm;
}
// functionB(v, w){ ... ...} for subsequent applications
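For reference, here is a self-contained sketch of the answer's approach (one pass for both norms kept in locals, then a second pass that multiplies by precomputed factors), wrapped in a function so it can be checked against the question's formula. The name `normalized_weights`, the template parameter, and the `on_column` callback standing in for the question's functionA are assumptions made for illustration; functionA and functionB themselves are not reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the answer's loop. X holds n columns of d doubles
// (column j starts at X + j*d). For each column it computes the l1 norm v[j] and
// the squared l2 norm w[j] in one pass, fills pj in a second pass, and hands pj
// to on_column (a stand-in for the question's functionA).
template <class OnColumn>
void normalized_weights(const double* X, std::size_t d, std::size_t n,
                        double alpha, double* v, double* w,
                        OnColumn on_column) {
    std::vector<double> pj(d);
    for (std::size_t j = 0; j < n; ++j) {
        const double* start = X + j * d;
        const double* end = start + d;

        // Pass 1: accumulate both norms in local variables.
        double l1norm = 0.0, l2norm = 0.0;
        for (const double* src = start; src != end; ++src) {
            double val = *src;
            l1norm += std::abs(val);
            l2norm += val * val;
        }
        assert(l1norm > 0.0 && "zero column: weights are undefined");

        // Pass 2: recompute abs/square and multiply by precomputed factors
        // instead of dividing per element.
        const double pl1 = alpha / l1norm;
        const double pl2 = (1.0 - alpha) / l2norm;
        double* dst = pj.data();
        for (const double* src = start; src != end; ++src, ++dst) {
            double val = *src;
            *dst = pl1 * std::abs(val) + pl2 * val * val;
        }
        on_column(j, pj.data());

        v[j] = l1norm;
        w[j] = l2norm;
    }
}
```

Whether recomputing abs and the square in the second pass beats caching them in x_abs and x_2 depends on d, the cache sizes, and what the compiler already does, so it is worth timing both variants on real data.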

Regarding "c++ - How to quickly compute the normalized l1 and l2 norms of a vector in C++?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41392730/
