c++ - Memory alignment for SSE in C++, _aligned_malloc equivalent?

Reposted by: 塔克拉玛干, updated: 2023-11-03 01:53:32

I'd like to know how to convert this C code to C++ so that the memory is aligned:

float *pResult = (float*) _aligned_malloc(length * sizeof(float), 16);

I looked here and then tried this:

float *pResult = (float*) __attribute__((aligned(16)));

And also this:

float *pResult = __attribute__((aligned(16)));

But both give similar errors:

error: expected primary-expression before '__attribute__'|
error: expected ',' or ';' before '__attribute__'|

Full code:

#include "stdafx.h"
#include <xmmintrin.h>  // Need this for SSE compiler intrinsics
#include <math.h>       // Needed for sqrt in CPU-only version
#include "stdio.h"

int main(int argc, char* argv[])
{
    printf("Starting calculation...\n");

    const int length = 64000;

    // We will be calculating Y = Sin(x) / x, for x = 1->64000

    // If you do not properly align your data for SSE instructions, you may take a huge performance hit.
    float *pResult = (float*) __attribute__((aligned(16))); // align to 16-byte for SSE
    __m128 x;
    __m128 xDelta = _mm_set1_ps(4.0f);      // Set the xDelta to (4,4,4,4)
    __m128 *pResultSSE = (__m128*) pResult;


    const int SSELength = length / 4;

    for (int stress = 0; stress < 100000; stress++) // lots of stress loops so we can easily use a stopwatch
    {
#define TIME_SSE    // Define this if you want to run with SSE
#ifdef TIME_SSE
        x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); // Set the initial values of x to (4,3,2,1)
        for (int i=0; i < SSELength; i++)
        {
            __m128 xSqrt = _mm_sqrt_ps(x);
            // Note! Division is slow. It's actually faster to take the reciprocal of a number and multiply
            // Also note that Division is more accurate than taking the reciprocal and multiplying

#define USE_DIVISION_METHOD
#ifdef USE_FAST_METHOD
            __m128 xRecip = _mm_rcp_ps(x);
            pResultSSE[i] = _mm_mul_ps(xRecip, xSqrt);
#endif //USE_FAST_METHOD
#ifdef USE_DIVISION_METHOD
            pResultSSE[i] = _mm_div_ps(xSqrt, x);
#endif  // USE_DIVISION_METHOD

            // NOTE! Sometimes, the order in which things are done in SSE may seem reversed.
            // When the command above executes, the four floating elements are actually flipped around
            // We have already compensated for that flipping by setting the initial x vector to (4,3,2,1) instead of (1,2,3,4)

            x = _mm_add_ps(x, xDelta);  // Advance x to the next set of numbers
        }
#endif  // TIME_SSE
#ifndef TIME_SSE
        float xFloat = 1.0f;
        for (int i=0 ; i < length; i++)
        {
            pResult[i] = sqrt(xFloat) / xFloat; // Even though division is slow, there are no intrinsic functions like there are in SSE
            xFloat += 1.0f;
        }
#endif  // !TIME_SSE
    }

    // To prove that the program actually worked
    for (int i=0; i < 20; i++)
    {
        printf("Result[%d] = %f\n", i, pResult[i]);
    }

    // Results for my particular system
    // 23.75 seconds for SSE with reciprocal/multiplication method
    // 38.5 seconds for SSE with division method
    // 301.5 seconds for CPU

    return 0;
}

Best answer

With C++11, you can use something like this:

#include <vector>

struct aligned_float
{
    alignas(16) float f[4];
};

static_assert(sizeof(aligned_float) == 4 * sizeof(float), "padding issue");

int main()
{
    const int length = 64000;
    // Each aligned_float holds 4 floats, so 64000 floats need 64000 / 4 elements.
    std::vector<aligned_float> pResult(length / 4);

    return 0;
}

Regarding "c++ - Memory alignment for SSE in C++, _aligned_malloc equivalent?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23183628/
