c++ - ASCII data import: how can I match Fortran's bulk read performance in C++?


Reposted by: IT老高 · Updated: 2023-10-28 22:15:09

Setup

Hello, I have Fortran code for reading ASCII double data (an example of the data file is at the bottom of the question):

program ReadData
    integer :: mx,my,mz
    doubleprecision, allocatable, dimension(:,:,:) :: charge

    ! Open the file 'CHGCAR'
    open(11,file='CHGCAR',status='old')

    ! Get the extent of the 3D system and allocate the 3D array
    read(11,*)mx,my,mz
    allocate(charge(mx,my,mz) )

    ! Bulk read the entire block of ASCII data for the system
    read(11,*) charge
end program ReadData

And the "equivalent" C++ code:

#include <fstream>
#include <vector>

using std::ifstream;
using std::vector;
using std::ios;

int main(){
    int mx, my, mz;

    // Open the file 'CHGCAR'
    ifstream InFile("CHGCAR", ios::in);

    // Get the extent of the 3D system and allocate the 3D array
    InFile >> mx >> my >> mz;
    vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));

    // Method 1: std::ifstream extraction operator to double
    for (int i = 0; i < mx; ++i)
        for (int j = 0; j < my; ++j)
            for (int k = 0; k < mz; ++k)
                InFile >> charge[i][j][k];

    return 0;
}

Fortran kicks @$$ and takes names

Note that the line

read(11,*) charge

performs the same task as the C++ code:

for (int i = 0; i < mx; ++i)
    for (int j = 0; j < my; ++j)
        for (int k = 0; k < mz; ++k)
            InFile >> charge[i][j][k];

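where InFile is an ifstream object (note that although the iterators in the Fortran code start at 1 rather than 0, the ranges are the same).

However, the Fortran code runs way faster than the C++ code. I think this is because Fortran can do the whole operation in one go: read and parse the file according to the range and shape (the values of mx, my, mz), and then simply point charge at the memory the data was read into. The C++ code, by comparison, has to go back and forth between InFile and the (typically large) charge array on every iteration, resulting in (I believe) many more I/O and memory operations.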
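The closest C++ analogue of a Fortran array that is "pointed at" one block of memory is a single contiguous buffer rather than nested vectors. The following is a minimal sketch under that assumption; `Grid3D`, `at`, and `read_grid` are hypothetical names that do not appear in the original code, and the element order follows Fortran's convention (first index varies fastest). It still parses with `operator>>`, so by itself it mainly removes the per-row allocations and pointer chasing, not the parsing cost.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for the in-memory usage example below
#include <vector>

// Hypothetical container: one contiguous block, like a Fortran array.
struct Grid3D {
    std::size_t mx = 0, my = 0, mz = 0;
    std::vector<double> data;  // holds mx * my * mz values

    // Fortran-style (column-major) indexing: i varies fastest.
    double& at(std::size_t i, std::size_t j, std::size_t k) {
        assert(i < mx && j < my && k < mz);
        return data[i + mx * (j + my * k)];
    }
};

// Reads the three extents followed by mx*my*mz values from any istream.
// Returns false if the stream runs out of valid numbers.
bool read_grid(std::istream& in, Grid3D& g) {
    if (!(in >> g.mx >> g.my >> g.mz)) return false;
    g.data.resize(g.mx * g.my * g.mz);
    for (double& v : g.data)
        if (!(in >> v)) return false;
    return true;
}
```

For a quick check without a file, `std::istringstream in(" 2 1 2\n 0.1E+01 0.2E+01\n 0.3E+01 0.4E+01\n"); Grid3D g; read_grid(in, g);` fills four values, and `g.at(1, 0, 0)` then refers to the second value read. Passing a `std::ifstream` opened on CHGCAR works the same way.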

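I am reading billions of values (several gigabytes), so I really want to maximize performance.

My question:

How can I achieve the performance of the Fortran code in C++?

Moving on...

Here is a much faster C++ implementation than the one above. The file is read into a char array in one go, and the charge array is then populated as the char array is parsed: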

#include <fstream>
#include <vector>
#include <cstdlib>  // atof
#include <cstring>  // strtok

using std::ifstream;
using std::vector;
using std::ios;

int main(){
    int mx, my, mz;

    // Open the file 'CHGCAR'
    ifstream InFile("CHGCAR", ios::in);

    // Get the extent of the 3D system and allocate the 3D array
    InFile >> mx >> my >> mz;
    vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));

    // Method 2: big char array with strtok() and atof()

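    //  Get file size (std::streamoff avoids overflowing int on multi-GB files)
    InFile.seekg(0, InFile.end);
    std::streamoff FileSize = InFile.tellg();
    InFile.seekg(0, InFile.beg);

    //  Read in entire file to FileData; the extra zero byte terminates the
    //  buffer so that strtok() stops at the end of the data
    vector<char> FileData(static_cast<std::size_t>(FileSize) + 1, '\0');
    InFile.read(FileData.data(), FileSize);
    InFile.close();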

    /*
     *  Now simply parse through the char array, saving each
     *  value to its place in the array of charge density
     */
    char* TmpCStr = strtok(FileData.data(), " \n");

    // Gets TmpCStr to the first data value
    for (int i = 0; i < 3 && TmpCStr != NULL; ++i)
        TmpCStr = strtok(NULL, " \n");

    for (int i = 0; i < mx; ++i)
        for (int j = 0; j < my; ++j)
            for (int k = 0; k < mz && TmpCStr != NULL; ++k){
                charge[i][j][k] = atof(TmpCStr);
                TmpCStr = strtok(NULL, " \n");
            }

    return 0;
}

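Again, this is much faster than the simple >> operator-based method, but still considerably slower than the Fortran version, not to mention that it is much more code.

How to get better performance?

I'm sure that Method 2 is the way to go if I am to implement this myself, but I'm curious how I can increase performance to match the Fortran code. The kinds of things I am considering and currently researching are:

C++11 and C++14 features

C or C++ libraries optimized for doing this type of thing

Improvements to the individual methods used in Method 2:

A tokenization library, such as the one in the C++ String Toolkit Library, instead of strtok()

A more efficient char-to-double conversion than atof()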
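As one possible direction for the last two items, `std::strtod` reports where each number ended, so tokenizing and converting can happen in a single pass without `strtok()`. This is only a sketch: `parse_doubles` is a hypothetical helper name, the buffer is assumed to be null-terminated, and whether this is actually faster than the `strtok()`/`atof()` pair has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::strtod
#include <vector>

// Parses up to `count` whitespace-separated doubles from a null-terminated
// buffer. Returns the values that were successfully converted.
std::vector<double> parse_doubles(const char* text, std::size_t count) {
    assert(text != nullptr);
    std::vector<double> out;
    out.reserve(count);
    const char* p = text;
    while (out.size() < count) {
        char* end = nullptr;
        const double v = std::strtod(p, &end);  // skips leading whitespace itself
        if (end == p) break;                    // no conversion: end of data or bad token
        out.push_back(v);
        p = end;                                // continue right after the parsed number
    }
    return out;
}
```

With the file already loaded into a null-terminated `FileData` buffer, a call such as `parse_doubles(FileData.data(), n)` would replace both the tokenizing loop and the `atof()` calls; the three header integers would need to be skipped or parsed first.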

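C++ String Toolkit

In particular, the C++ String Toolkit Library would take FileData and the delimiters " \n" and give me a string token object (call it FileTokens), so that the triple for loop would look like: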

for (int i = 0; i < mx; ++i)
    for (int j = 0; j < my; ++j)
        for (int k = 0; k < mz; ++k)
            charge[i][j][k] = FileTokens.nextFloatToken();

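This would simplify the code a bit, but there is extra work in essentially copying the contents of FileData into FileTokens, which might cancel out any performance gained from using the nextFloatToken() method (presumably more efficient than the strtok()/atof() combination).

There is an example on the C++ String Toolkit (StrTk) Tokenizer tutorial page (included at the bottom of the question) that uses StrTk's for_each_line() processor and looks similar to my desired application. A difference between the two cases, however, is that I cannot assume how many values will appear on each line of the input file, and I do not know enough about StrTk to say whether this is a viable solution.

Not a duplicate

The topic of quickly reading ASCII data into an array or struct has come up before, but I have reviewed the following posts and their solutions were not sufficient: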

Fastest way to read data from a lot of ASCII files

How to read numbers from an ASCII file (C++)

Read Numeric Data from a Text File in C++

Reading a file and storing the contents in an array

C/C++ Fast reading large ASCII data file to array or struct

Read ASCII file into matrix in C++

How can I read ASCII data file in C++

Reading a file and storing the contents in an array

Reading in data in columns from a file (C++)

The Fastest way to read a .txt File

How does fast input/ output work in C/C++, by using registers, hexadecimal number and the likes?

reading file into struct array

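Example data

Here is an example of the data file I am importing. The ASCII data is delimited by spaces and line breaks, as in the example below: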

 5 3 3
 0.23080516813E+04 0.22712439791E+04 0.21616898980E+04 0.19829996749E+04 0.17438686650E+04
 0.14601734127E+04 0.11551623512E+04 0.85678544224E+03 0.59238325489E+03 0.38232265554E+03
 0.23514479113E+03 0.14651943589E+03 0.10252743482E+03 0.85927499703E+02 0.86525872161E+02
 0.10141182750E+03 0.13113419142E+03 0.18057147781E+03 0.25973252462E+03 0.38303754418E+03
 0.57142097675E+03 0.85963728360E+03 0.12548019843E+04 0.17106124085E+04 0.21415379433E+04
 0.24687336309E+04 0.26588012477E+04 0.27189091499E+04 0.26588012477E+04 0.24687336309E+04
 0.21415379433E+04 0.17106124085E+04 0.12548019843E+04 0.85963728360E+03 0.57142097675E+03
 0.38303754418E+03 0.25973252462E+03 0.18057147781E+03 0.13113419142E+03 0.10141182750E+03
 0.86525872161E+02 0.85927499703E+02 0.10252743482E+03 0.14651943589E+03 0.23514479113E+03

StrTk example

Here is the StrTk example mentioned above. The scenario is parsing a data file that contains the information for a 3D mesh:

Input data:

5
+1.0,+1.0,+1.0
-1.0,+1.0,-1.0
-1.0,-1.0,+1.0
+1.0,-1.0,-1.0
+0.0,+0.0,+0.0
4
0,1,4
1,2,4
2,3,4
3,1,4

Code:

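#include <cstddef>
#include <deque>
#include <fstream>
#include <string>
#include "strtk.hpp"  // StrTk is distributed as a single header

struct point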
{
   double x,y,z;
};

struct triangle
{
   std::size_t i0,i1,i2;
};

int main()
{
   std::string mesh_file = "mesh.txt";
   std::ifstream stream(mesh_file.c_str());
   std::string s;
   // Process points section
   std::deque<point> points;
   point p;
   std::size_t point_count = 0;
   strtk::parse_line(stream," ",point_count);
   strtk::for_each_line_n(stream,
                          point_count,
                          [&points,&p](const std::string& line)
                          {
                             if (strtk::parse(line,",",p.x,p.y,p.z))
                                points.push_back(p);
                          });

   // Process triangles section
   std::deque<triangle> triangles;
   triangle t;
   std::size_t triangle_count = 0;
   strtk::parse_line(stream," ",triangle_count);
   strtk::for_each_line_n(stream,
                          triangle_count,
                          [&triangles,&t](const std::string& line)
                          {
                             if (strtk::parse(line,",",t.i0,t.i1,t.i2))
                                triangles.push_back(t);
                          });
   return 0;
}

Best answer

This...

vector<vector<vector<double> > > charge(mx, vector<vector<double> >(my, vector<double>(mz)));

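...creates a temporary vector<double>(mz) with all 0.0 values, then copies it my times (or perhaps moves it and then copies it my-1 times with a C++11 compiler, but that makes little difference...) to create a temporary vector<vector<double>>(my, ...), which is in turn copied mx times (...as above...) to initialise all the data. You are reading data into these elements anyway, so there is no need to spend time initialising them here. Instead, create an empty charge and use nested loops to reserve() enough memory for the elements without populating them yet.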
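A minimal sketch of that reserve-only setup might look as follows. `Charge3D` and `make_reserved` are hypothetical names introduced here for illustration. The two outer levels are created as empty vectors so that `charge[i][j]` is valid afterwards, while the innermost vectors only have capacity reserved and no zero-filled elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Charge3D = std::vector<std::vector<std::vector<double>>>;

// Builds the mx-by-my nested structure and reserves room for mz doubles in
// each innermost vector without zero-filling them.
Charge3D make_reserved(std::size_t mx, std::size_t my, std::size_t mz) {
    Charge3D charge;
    charge.reserve(mx);
    for (std::size_t i = 0; i < mx; ++i) {
        charge.emplace_back();                 // empty vector<vector<double>>
        charge.back().reserve(my);
        for (std::size_t j = 0; j < my; ++j) {
            charge.back().emplace_back();      // empty vector<double>
            charge.back().back().reserve(mz);  // capacity only, size stays 0
        }
    }
    assert(charge.size() == mx);
    return charge;
}
```

After `Charge3D charge = make_reserved(mx, my, mz);`, each `charge[i][j]` is empty with capacity for at least mz values, so values have to be appended (for example with `emplace_back`) rather than assigned through `charge[i][j][k]`.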

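Next, check that you are compiling with optimisation enabled. If you are and you are still slower than Fortran, then in the nested loops that populate the data, try creating a reference to the vector you are about to .emplace_back elements onto: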

for (int i = 0; i < mx; ++i)
    for (int j = 0; j < my; ++j)
    {
        std::vector<double>& v = charge[i][j];
        for (int k = 0; k < mz; ++k)
        {
            double d;
            InFile >> d;
            v.emplace_back(d);
        }
    }

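That shouldn't help if your optimiser is doing a good job, but it is worth trying as a sanity check.

If you're still slower, or just want to try to be even faster, you could try optimising your number parsing. You say your data is all formatted like 0.23080516813E+04. With fixed sizes like that, you can easily calculate how many bytes to read into a buffer to get a decent number of values from memory. Then, for each value, you could start an atol after the . to extract 23080516813 (use atoll where long is only 32 bits) and multiply it by 10 to the power of minus (11 (your number of digits) minus 04). For speed, keep a table of those powers of ten and index into it using the extracted exponent (i.e. 4). (Note that multiplying by e.g. 1E-7 can be faster than dividing by 1E7 on a lot of common hardware.)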
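The fixed-format idea can be sketched as below, assuming every field has exactly the layout of the sample data ("0." followed by 11 digits, 'E', a sign, and two exponent digits). `parse_fixed`, `kDigits`, and `kPow10` are hypothetical names. Note one deliberate difference from the text: this sketch divides by an exactly representable power of ten, which keeps the result correctly rounded; a table of reciprocals (multiplying by 1E-7 and so on) may be faster but can differ in the last bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed layout (from the sample data): "0." + 11 digits + 'E' + sign + 2 digits,
// e.g. "0.23080516813E+04". Anything else needs a general parser.
constexpr int kDigits = 11;

// Powers of ten that are exactly representable as double.
static const double kPow10[] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};

// Parses one fixed-width field starting at p (17 characters are read).
double parse_fixed(const char* p) {
    assert(p[0] == '0' && p[1] == '.' && p[2 + kDigits] == 'E');
    std::int64_t mantissa = 0;
    for (int i = 0; i < kDigits; ++i)
        mantissa = mantissa * 10 + (p[2 + i] - '0');

    const char* e = p + 2 + kDigits + 1;  // points at the exponent sign
    int exponent = (e[1] - '0') * 10 + (e[2] - '0');
    if (e[0] == '-') exponent = -exponent;

    // value = mantissa * 10^(exponent - kDigits)
    const int scale = exponent - kDigits;
    const double m = static_cast<double>(mantissa);
    if (scale >= 0 && scale <= 22) return m * kPow10[scale];
    if (scale < 0 && scale >= -22) return m / kPow10[-scale];
    return m * std::pow(10.0, scale);     // rare fallback outside the exact table
}
```

For the sample value, `parse_fixed("0.23080516813E+04")` computes 23080516813 / 1e7. The caller is responsible for skipping the whitespace between fields and for making sure 17 characters are available at `p`.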

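And if you want to blitz this thing, switch to memory-mapped file access. It is worth considering boost::mapped_file_source, as it is probably easier to use than even the POSIX API (let alone Windows) and is portable, but programming directly against an OS API shouldn't be much of a struggle either.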
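For comparison, programming directly against the POSIX API could look roughly like this. It is a sketch for POSIX systems only; `MappedFile`, `map_file`, and `unmap_file` are hypothetical helper names, and error handling is reduced to a boolean result.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Read-only view of a whole file (hypothetical helper type).
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;
};

// Maps `path` into memory; returns false on any failure or an empty file.
bool map_file(const char* path, MappedFile& out) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return false;
    }
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;
    out.data = static_cast<const char*>(p);
    out.size = static_cast<std::size_t>(st.st_size);
    return true;
}

// Releases a mapping created by map_file and resets the handle.
void unmap_file(MappedFile& f) {
    if (f.data != nullptr) munmap(const_cast<char*>(f.data), f.size);
    f = MappedFile{};
}
```

The mapped bytes are not null-terminated, so any parser run over `data` has to be bounded by `size` rather than relying on a terminating zero.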

UPDATE - response to the first and second comments

Example using Boost memory mapping:

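#include <boost/iostreams/device/mapped_file.hpp>
#include <cstring>   // memchr
#include <sstream>
#include <string>

// ASSERT stands for whatever always-evaluated check macro you use

// The constructor maps the file, so no separate open() call is needed
boost::iostreams::mapped_file_params params("dbldat.in");
boost::iostreams::mapped_file_source file(params);
ASSERT(file.is_open());
const char* p = file.data();
// The mapping is not null-terminated, so bound the search by the file size
const char* nl = static_cast<const char*>(std::memchr(p, '\n', file.size()));
ASSERT(nl != nullptr);
std::istringstream iss(std::string(p, nl - p));
size_t x, y, z;
ASSERT(iss >> x >> y >> z);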

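The code above maps the file into memory at address p, then parses the dimensions from the first line. Continue parsing the actual double representations from ++nl onwards. I mentioned one approach to that above, and you are concerned about the data format changing: you could add a version number to the file, so that you can use the optimised parsing until the version number changes and then fall back on something generic for "unknown" file formats. As far as something generic goes, for in-memory representations, using int chars_to_skip; double my_double; ASSERT(sscanf(ptr, "%lf%n", &my_double, &chars_to_skip) == 1); is reasonable (note %lf, not %f, for a double): see the sscanf documentation. You can then advance the pointer through the data by chars_to_skip.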
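The generic sscanf fallback could be wrapped in a loop like the one below. `scan_doubles` is a hypothetical helper name, and the buffer is assumed to be null-terminated, which a raw memory mapping is not. Some C library implementations also scan the remaining input on every sscanf call, so this is worth timing on large buffers before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // std::sscanf
#include <vector>

// Generic fallback: reads up to `count` doubles from a null-terminated buffer.
std::vector<double> scan_doubles(const char* ptr, std::size_t count) {
    assert(ptr != nullptr);
    std::vector<double> values;
    values.reserve(count);
    while (values.size() < count) {
        int chars_to_skip = 0;
        double my_double = 0.0;
        // %lf converts to double; %n stores the number of characters consumed
        // so far and is not counted in sscanf's return value.
        if (std::sscanf(ptr, "%lf%n", &my_double, &chars_to_skip) != 1) break;
        values.push_back(my_double);
        ptr += chars_to_skip;  // advance past the number just parsed
    }
    return values;
}
```

The loop stops at the first position where no number can be converted, so a short result signals truncated or malformed input.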

Next, are you suggesting to combine the reserve() solution with the reference creation solution?

Yes.

And (pardon my ignorance) why would using a reference to charge[i][j] and v.emplace_back() be better than charge[i][j].emplace_back()?

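The suggestion was meant to ensure the compiler does not re-evaluate charge[i][j] for every element being emplaced. Hopefully it makes no difference to performance and you can go back to charge[i][j].emplace_back(), but IMHO it is worth a quick check.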

Lastly, I'm skeptical about using an empty vector and reserve()ing at the tops of each loop. I have another program that came to a grinding halt using that method, and replacing the reserve()s with a preallocated multidimensional vector sped it up a lot.

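That's possible, but not necessarily true in general or applicable here - a lot depends on the compiler/optimiser (particularly loop unrolling) and so on. With an unoptimised emplace_back you have to check the vector's size() against its capacity() repeatedly, but if the optimiser does a good job that should be reduced to insignificance. As with a lot of performance tuning, you often can't reason about things perfectly and conclude what will be fastest; you have to try the alternatives and measure them with your actual compiler, program, data and so on.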
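For that kind of measurement, a small wall-clock timer around each candidate is often enough to start with. The sketch below uses only the standard library; `time_seconds` is a hypothetical helper name, and a single run like this is noisy, so repeated runs on the real data file are advisable.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double>(stop - start).count();
}
```

Each variant (preallocated vectors, reserve() plus emplace_back, a custom parser) can be wrapped in a lambda and passed to `time_seconds`, with the results compared for the same input file and compiler flags.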

Regarding "c++ - ASCII data import: how can I match Fortran's bulk read performance in C++?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28082794/
