c - Loop splitting makes code slower

Reposted by: 太空狗. Updated: 2023-10-29 15:37:50

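I am optimizing a loop (as a homework assignment) that sums the 10,000 elements of an array 600,000 times. Unoptimized, it runs in about 23.34 s; the target is under 7 seconds for a B and under 5 seconds for an A.

I started my optimization by unrolling the loop like this: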

        int     j;

        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7];
        }

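This reduces the run time to about 6.4 seconds (about 6 seconds if I unroll further).

Then I thought I would accumulate into several sub-sums and add them together at the end, to save time on the read-write dependency on sum. I came up with the following code: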

        int     j;

        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum0 += array[j] + array[j+1];
            sum1 += array[j+2] + array[j+3];
            sum2 += array[j+4] + array[j+5];
            sum3 += array[j+6] + array[j+7];
        }

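However, this increased the run time to about 6.8 seconds.

I tried a similar technique using pointers, and the best I could do was about 15 seconds.

All I know about the machine I am running this on (it is a service purchased by the school) is that it is a 32-bit, remote, Intel-based Linux virtual server, which I believe is running Red Hat.

I have tried every technique I can think of to speed up the code, but they all seem to have the opposite effect. Could someone explain what I am doing wrong, or suggest another technique I could use to lower the run time? The best the teacher could do was about 4.8 seconds.

As an additional constraint, the finished project cannot have more than 50 lines of code, so doing something complex is probably not possible.

Here are full copies of both sources: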

#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.

// The code you submit must have these two values.
#define N_TIMES     600000
#define ARRAY_SIZE   10000

int main(void)
{
    double  *array = calloc(ARRAY_SIZE, sizeof(double));
    double  sum = 0;
    int     i;

    // You can add variables between this comment ...

//  double sum0 = 0;
//  double sum1 = 0;
//  double sum2 = 0;
//  double sum3 = 0;

    // ... and this one.

    // Please change 'your name' to your actual name.
    printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

    for (i = 0; i < N_TIMES; i++) {

        // You can change anything between this comment ...

        int     j;

        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] +  array[j+6] + array[j+7];
        }

        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.

    }

    // You can add some final code between this comment ...
//  sum = sum0 + sum1 + sum2 + sum3;
    // ... and this one.

    return 0;
}

The split-up code:

#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.

// The code you submit must have these two values.
#define N_TIMES     600000
#define ARRAY_SIZE   10000

int main(void)
{
    double  *array = calloc(ARRAY_SIZE, sizeof(double));
    double  sum = 0;
    int     i;

    // You can add variables between this comment ...

    double sum0 = 0;
    double sum1 = 0;
    double sum2 = 0;
    double sum3 = 0;

    // ... and this one.

    // Please change 'your name' to your actual name.
    printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

    for (i = 0; i < N_TIMES; i++) {

        // You can change anything between this comment ...

        int     j;

        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum0 += array[j] + array[j+1]; 
            sum1 += array[j+2] + array[j+3];
            sum2 += array[j+4] + array[j+5]; 
            sum3 += array[j+6] + array[j+7];
        }

        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.

    }

    // You can add some final code between this comment ...
    sum = sum0 + sum1 + sum2 + sum3;
    // ... and this one.

    return 0;
}

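Answer

The "time" application we use to judge the grade is a little bit off. The best I could do was about 4.9 s, by unrolling the loop 50 times and grouping the additions as shown below, following TomKarzes's basic format.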

int     j;
        for (j = 0; j < ARRAY_SIZE; j += 50) {
            sum +=(((((((array[j] + array[j+1]) + (array[j+2] + array[j+3])) +
                    ((array[j+4] + array[j+5]) + (array[j+6] + array[j+7]))) + 
                    (((array[j+8] + array[j+9]) + (array[j+10] + array[j+11])) +
                    ((array[j+12] + array[j+13]) + (array[j+14] + array[j+15])))) +
                    ((((array[j+16] + array[j+17]) + (array[j+18] + array[j+19]))))) +
                    (((((array[j+20] + array[j+21]) + (array[j+22] + array[j+23])) +
                    ((array[j+24] + array[j+25]) + (array[j+26] + array[j+27]))) + 
                    (((array[j+28] + array[j+29]) + (array[j+30] + array[j+31])) +
                    ((array[j+32] + array[j+33]) + (array[j+34] + array[j+35])))) +
                    ((((array[j+36] + array[j+37]) + (array[j+38] + array[j+39])))))) + 
                    ((((array[j+40] + array[j+41]) + (array[j+42] + array[j+43])) +
                    ((array[j+44] + array[j+45]) + (array[j+46] + array[j+47]))) + 
                    (array[j+48] + array[j+49])));
        }

Best answer

I experimented with the grouping a bit. On my machine, with gcc, I found that the following worked best:

    for (j = 0; j < ARRAY_SIZE; j += 16) {
        sum = sum +
              (array[j   ] + array[j+ 1]) +
              (array[j+ 2] + array[j+ 3]) +
              (array[j+ 4] + array[j+ 5]) +
              (array[j+ 6] + array[j+ 7]) +
              (array[j+ 8] + array[j+ 9]) +
              (array[j+10] + array[j+11]) +
              (array[j+12] + array[j+13]) +
              (array[j+14] + array[j+15]);
    }

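In other words, it unrolls 16 times, groups the additions into pairs, and then adds those pairs linearly. I also removed the += operator, which affects when sum is first used in the chain of additions.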

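I found that the measured times varied a lot from one run to the next, even when nothing had changed, so I suggest timing each version several times before drawing any conclusion about whether it got faster or slower.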

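I would be curious to know what numbers you get on your machine with this version of the inner loop.

Update: this is my fastest version so far (on my machine, with my compiler):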

    int     j1, j2;

    j1 = 0;
    do {
        j2 = j1 + 20;
        sum = sum +
              (array[j1   ] + array[j1+ 1]) +
              (array[j1+ 2] + array[j1+ 3]) +
              (array[j1+ 4] + array[j1+ 5]) +
              (array[j1+ 6] + array[j1+ 7]) +
              (array[j1+ 8] + array[j1+ 9]) +
              (array[j1+10] + array[j1+11]) +
              (array[j1+12] + array[j1+13]) +
              (array[j1+14] + array[j1+15]) +
              (array[j1+16] + array[j1+17]) +
              (array[j1+18] + array[j1+19]);
        j1 = j2 + 20;
        sum = sum +
              (array[j2   ] + array[j2+ 1]) +
              (array[j2+ 2] + array[j2+ 3]) +
              (array[j2+ 4] + array[j2+ 5]) +
              (array[j2+ 6] + array[j2+ 7]) +
              (array[j2+ 8] + array[j2+ 9]) +
              (array[j2+10] + array[j2+11]) +
              (array[j2+12] + array[j2+13]) +
              (array[j2+14] + array[j2+15]) +
              (array[j2+16] + array[j2+17]) +
              (array[j2+18] + array[j2+19]);
    }
    while (j1 < ARRAY_SIZE);

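This uses a total unroll of 40, split into two groups of 20, with alternating induction variables that are pre-incremented to break dependencies, and a post-tested loop. Again, you can experiment with the parenthesized grouping to fine-tune it for your compiler and platform.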

Regarding "c - Loop splitting makes code slower", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37534691/
