
c++ - Eigen code for matrix multiplication running slower than loop-based multiplication using std::vector

Reposted · Author: 行者123 · Updated: 2023-12-01 14:24:55

I'm learning C++, as well as machine learning, so I decided to use the Eigen library for matrix multiplication. I'm training a perceptron to recognize digits from the MNIST database. For the training phase, I set the number of training cycles (or epochs) to T = 100.

The "training matrix" is a 10000 × 785 matrix. The zeroth element of each row contains the "label" identifying the digit to which the input data (the remaining 784 elements of the row) maps.

There is also a 784 × 1 "weights" vector containing a weight for each of the 784 features. The weights vector is multiplied with each input vector (a row of the training matrix, excluding the zeroth element) and is updated on every iteration; this happens T times for each of the 10000 inputs.
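The training step described above is the standard perceptron rule, w ← w − η·(y − t)·x. As a minimal sketch (the function names, and the use of a zero threshold for the step function, are illustrative assumptions, not taken from the code below):

```cpp
#include <cstddef>
#include <vector>

// Predict 1 if w·x >= 0, else 0 (illustrative step function).
int predict(const std::vector<double>& w, const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t k = 0; k < w.size(); ++k) s += w[k] * x[k];
    return s >= 0.0 ? 1 : 0;
}

// One perceptron update: w <- w - eta * (y - t) * x,
// applied only when prediction y and target t disagree.
void update(std::vector<double>& w, const std::vector<double>& x,
            int t, double eta) {
    int err = predict(w, x) - t;
    if (err != 0)
        for (std::size_t k = 0; k < w.size(); ++k)
            w[k] -= eta * err * x[k];
}
```

In the real program this update runs once per sample, T times over the whole training set.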

I wrote the following program (which captures the essence of what I'm doing), in which I compare the "vanilla" approach of multiplying the matrix rows with the weights vector (using std::vector and loops) against what I felt was the best I could do with the Eigen approach. It isn't really a matrix-times-vector multiplication; I actually slice out a row of the training matrix and multiply it with the weights vector.

The training cycle takes 160.662 ms with the std::vector approach, while the Eigen approach usually takes over 10,000 ms.

I compile the program with the following command:

clang++ -Wall -Wextra -pedantic -O3 -march=native -Xpreprocessor -fopenmp permute.cc -o perm -std=c++17

I'm on a "Mid" 2012 MacBook Pro with a 2.5 GHz dual-core i5, running macOS Catalina.

#include <iostream>
#include <algorithm>
#include <random>
#include <vector>
#include <cmath>
#include <ctime>
#include <chrono>
#include <Eigen/Dense>
using namespace Eigen;

int main() {
    Matrix<uint8_t, Dynamic, Dynamic> m = Matrix<uint8_t, Dynamic, Dynamic>::Random(10000, 785);
    Matrix<double, 784, 1> weights_m = Matrix<double, 784, 1>::Random(784, 1);
    Matrix<uint8_t, 10000, 1> y_m, t_m;

    std::minstd_rand rng;
    rng.seed(time(NULL));
    std::uniform_int_distribution<> dist(0, 1); //random integers between 0 and 1
    for (int i = 0; i < y_m.rows(); i++) {
        y_m(i) = dist(rng);
        t_m(i) = dist(rng);
    }

    int T = 100;
    int err;
    double eta;
    eta = 0.25; //learning rate
    Matrix<double, 1, 1> sum_wx_m;

    auto start1 = std::chrono::steady_clock::now(); //start of Eigen Matrix loop

    for (int iter = 0; iter < T; iter++) {
        for (int i = 0; i < m.rows(); i++) {
            sum_wx_m = m.block(i, 1, 1, 784).cast<double>() * weights_m;

            //some code to update y_m(i) based on the value of sum_wx_m which I left out

            err = y_m(i) - t_m(i);
            if (fabs(err) > 0) { //update the weights_m matrix if there's a difference between target and predicted
                weights_m = weights_m - eta * err * m.block(i, 1, 1, 784).transpose().cast<double>();
            }
        }
    }

    auto end1 = std::chrono::steady_clock::now();
    auto diff1 = end1 - start1;
    std::cout << "Eigen matrix time is " << std::chrono::duration<double, std::milli>(diff1).count() << " ms" << std::endl;

    //checking how std::vector form performs;

    std::vector<std::vector<uint8_t>> v(10000);
    std::vector<double> weights_v(784);
    std::vector<uint8_t> y_v(10000), t_v(10000);

    for (unsigned long i = 0; i < v.size(); i++) {
        for (int j = 0; j < m.cols(); j++) {
            v[i].push_back(m(i, j));
        }
    }

    for (unsigned long i = 0; i < weights_v.size(); i++) {
        weights_v[i] = weights_m(i);
    }

    for (unsigned long i = 0; i < y_v.size(); i++) {
        y_v[i] = dist(rng);
        t_v[i] = dist(rng);
    }

    double sum_wx_v;

    auto start2 = std::chrono::steady_clock::now(); //start of vector loop

    for (int iter = 0; iter < T; iter++) {
        for (unsigned long j = 0; j < v.size(); j++) {
            sum_wx_v = 0.0;
            for (unsigned long k = 1; k < v[0].size(); k++) {
                sum_wx_v += weights_v[k - 1] * v[j][k];
            }

            //some code to update y_v[i] based on the value of sum_wx_v which I left out

            err = y_v[j] - t_v[j];
            if (fabs(err) > 0) { //update the weights_v matrix if there's a difference between target and predicted
                for (unsigned long k = 1; k < v[0].size(); k++) {
                    weights_v[k - 1] -= eta * err * v[j][k];
                }
            }
        }
    }

    auto end2 = std::chrono::steady_clock::now();
    auto diff2 = end2 - start2;
    std::cout << "std::vector time is " << std::chrono::duration<double, std::milli>(diff2).count() << " ms" << std::endl;
}

What changes should I make to get better run times?

Best Answer

Possibly not the best solution, but you can try:

  • Since Eigen's default data order is column-major, you can make the training matrix 785 × 10000, so that each training label/data pair is contiguous in memory (also change the line that computes sum_wx_m).
  • Use the fixed-size version of the block operation, i.e., you can replace m.block(i, 1, 1, 784) with m.block<1, 784>(i, 1) (with the sizes swapped if you have already switched the training-matrix layout); or you can simply map the data part of the training matrix and reference its columns with .col() [see the example below].
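To illustrate why the first bullet matters, here is a tiny sketch of column-major addressing (the `idx` helper is purely illustrative, not Eigen API):

```cpp
#include <cstddef>

// Linear offset of element (r, c) in a column-major rows-by-cols matrix,
// which is how Eigen stores matrices by default.
std::size_t idx(std::size_t r, std::size_t c, std::size_t rows) {
    return c * rows + r;
}
```

With a 10000 × 785 matrix, a sample is a row, so its consecutive features (i, k) and (i, k+1) sit 10000 elements apart in memory; with a 785 × 10000 matrix, a sample is a column, so its features are adjacent, which is far friendlier to the cache.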

Here is your code, modified according to these ideas:

#include <iostream>
#include <algorithm>
#include <random>
#include <vector>
#include <cmath>
#include <ctime>
#include <chrono>
#include <Eigen/Dense>
using namespace Eigen;

int main() {
    Matrix<uint8_t, Dynamic, Dynamic> m = Matrix<uint8_t, Dynamic, Dynamic>::Random(785, 10000);
    // NB: this Map views the buffer as one contiguous 784 x 10000 block, so it
    // drifts across the label rows; that is harmless for this random-data
    // benchmark, but real data would need a Map with an OuterStride of 785.
    Map<Matrix<uint8_t, Dynamic, Dynamic>> m_data(m.data() + 785, 784, 10000);

    Matrix<double, 784, 1> weights_m = Matrix<double, 784, 1>::Random(784, 1);
    Matrix<uint8_t, 10000, 1> y_m, t_m;

    std::minstd_rand rng;
    rng.seed(time(NULL));
    std::uniform_int_distribution<> dist(0, 1); //random integers between 0 and 1
    for (int i = 0; i < y_m.rows(); i++) {
        y_m(i) = dist(rng);
        t_m(i) = dist(rng);
    }

    int T = 100;
    int err;
    double eta;
    eta = 0.25; //learning rate
    Matrix<double, 1, 1> sum_wx_m;

    auto start1 = std::chrono::steady_clock::now(); //start of Eigen Matrix loop

    for (int iter = 0; iter < T; iter++) {
        for (int i = 0; i < m.cols(); i++) {
            sum_wx_m = weights_m.transpose() * m_data.col(i).cast<double>();

            //some code to update y_m(i) based on the value of sum_wx_m which I left out

            err = y_m(i) - t_m(i);
            if (fabs(err) > 0) { //update the weights_m matrix if there's a difference between target and predicted
                weights_m = weights_m - eta * err * m_data.col(i).cast<double>();
            }
        }
    }

    auto end1 = std::chrono::steady_clock::now();
    auto diff1 = end1 - start1;
    std::cout << "Eigen matrix time is " << std::chrono::duration<double, std::milli>(diff1).count() << " ms" << std::endl;

    //checking how std::vector form performs;

    std::vector<std::vector<uint8_t>> v(10000);
    std::vector<double> weights_v(784);
    std::vector<uint8_t> y_v(10000), t_v(10000);

    for (unsigned long i = 0; i < v.size(); i++) {
        for (int j = 0; j < m.rows(); j++) {
            v[i].push_back(m(j, i));
        }
    }

    for (unsigned long i = 0; i < weights_v.size(); i++) {
        weights_v[i] = weights_m(i);
    }

    for (unsigned long i = 0; i < y_v.size(); i++) {
        y_v[i] = dist(rng);
        t_v[i] = dist(rng);
    }

    double sum_wx_v;

    auto start2 = std::chrono::steady_clock::now(); //start of vector loop

    for (int iter = 0; iter < T; iter++) {
        for (unsigned long j = 0; j < v.size(); j++) {
            sum_wx_v = 0.0;
            for (unsigned long k = 1; k < v[0].size(); k++) {
                sum_wx_v += weights_v[k - 1] * v[j][k];
            }

            //some code to update y_v[i] based on the value of sum_wx_v which I left out

            err = y_v[j] - t_v[j];
            if (fabs(err) > 0) { //update the weights_v matrix if there's a difference between target and predicted
                for (unsigned long k = 1; k < v[0].size(); k++) {
                    weights_v[k - 1] -= eta * err * v[j][k];
                }
            }
        }
    }

    auto end2 = std::chrono::steady_clock::now();
    auto diff2 = end2 - start2;
    std::cout << "std::vector time is " << std::chrono::duration<double, std::milli>(diff2).count() << " ms" << std::endl;
}

I compiled this code on my Ubuntu desktop with an i7-9700K:

g++ -Wall -Wextra -O3 -std=c++17
====================================
Eigen matrix time is 110.523 ms
std::vector time is 117.826 ms


g++ -Wall -Wextra -O3 -march=native -std=c++17
=============================================
Eigen matrix time is 66.3044 ms
std::vector time is 71.2296 ms
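A further tweak worth trying (a sketch only, not benchmarked here; the helper name is illustrative): both loops above re-convert the uint8_t features to double in every epoch, the Eigen version explicitly via .cast<double>() and the std::vector version implicitly in weights_v[k - 1] * v[j][k]. Converting the whole training set once, before the T epochs, costs 8× the memory but removes T − 1 redundant conversions:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert the uint8_t training samples to double once, up front,
// instead of casting on every multiply inside the epoch loop.
std::vector<std::vector<double>>
to_double_once(const std::vector<std::vector<uint8_t>>& v) {
    std::vector<std::vector<double>> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i].assign(v[i].begin(), v[i].end()); // element-wise int -> double
    return out;
}
```

The training loops then read from the pre-converted copy, so each inner product is a pure double dot product.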

Regarding "c++ - Eigen code for matrix multiplication running slower than loop-based multiplication using std::vector", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63423668/
