
c++ - Fastest way to transpose a 16-word array

Reposted. Author: 行者123. Last updated: 2023-11-30 03:39:37

I have the following code:

#include <cstring>  // memcpy

void shuffle_words(WORD_TYPE* _state)
{
    WORD_TYPE temp[DATA_SIZE];

    // Scatter each word to its permuted position in a temporary array.
    temp[7] = _state[0];
    temp[12] = _state[1];
    temp[14] = _state[2];
    temp[9] = _state[3];
    temp[2] = _state[4];
    temp[1] = _state[5];
    temp[5] = _state[6];
    temp[15] = _state[7];
    temp[11] = _state[8];
    temp[6] = _state[9];
    temp[13] = _state[10];
    temp[0] = _state[11];
    temp[4] = _state[12];
    temp[8] = _state[13];
    temp[10] = _state[14];
    temp[3] = _state[15];

    // Copy the permuted words back over the original state.
    memcpy(_state, temp, sizeof(temp));
}


int prp(WORD_TYPE* data, WORD_TYPE key)
{
    shuffle_words(data);
    key = round_function<14, 15>(data, key);
    key = round_function<13, 14>(data, key);
    key = round_function<12, 13>(data, key);
    key = round_function<11, 12>(data, key);
    key = round_function<10, 11>(data, key);
    key = round_function<9, 10>(data, key);
    key = round_function<8, 9>(data, key);
    key = round_function<7, 8>(data, key);
    key = round_function<6, 7>(data, key);
    key = round_function<5, 6>(data, key);
    key = round_function<4, 5>(data, key);
    key = round_function<3, 4>(data, key);
    key = round_function<2, 3>(data, key);
    key = round_function<1, 2>(data, key);
    key = round_function<0, 1>(data, key);
    key = round_function<15, 0>(data, key);
    return key;
}

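I'm wondering whether there is a faster way to perform the shuffle_words operation. I've seen questions about matrix transposition, but they seem to focus on cases where the matrix is large or multidimensional.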

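My array is always 16 words, and the prp function will be applied to the same array many times, one application after another. This leads me to believe that simply accessing the elements in transposed order, without actually transposing them, is an option.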
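To make the "access in transposed order" idea concrete, here is a minimal sketch that tracks an index map instead of moving any words. The names `kSource` and `IndexMap` are hypothetical and not part of the question; `kSource[i]` is simply read off the `temp[i] = _state[...]` assignments above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using WORD_TYPE = std::uint64_t;
constexpr std::size_t DATA_SIZE = 16;

// kSource[i] is the index whose old value shuffle_words places at position i.
constexpr std::array<std::size_t, DATA_SIZE> kSource = {
    11, 5, 4, 15, 12, 6, 9, 0, 13, 3, 14, 8, 1, 10, 2, 7};

// Hypothetical helper: records where each logical position currently lives
// in the untouched physical array, instead of moving the words themselves.
struct IndexMap {
    std::array<std::size_t, DATA_SIZE> slot{};

    IndexMap() {
        for (std::size_t i = 0; i < DATA_SIZE; ++i) slot[i] = i;
    }

    // Logical equivalent of one shuffle_words call: only indices change.
    void shuffle() {
        std::array<std::size_t, DATA_SIZE> next{};
        for (std::size_t i = 0; i < DATA_SIZE; ++i) next[i] = slot[kSource[i]];
        slot = next;
    }

    // Word that a real shuffle would have moved to position `logical`.
    WORD_TYPE& at(WORD_TYPE* state, std::size_t logical) const {
        return state[slot[logical]];
    }
};
```

Because this permutation is a single cycle of length 16, the map returns to the identity after 16 calls to `shuffle()`. Note that the lookups here happen at run time, whereas round_function takes its indices as template parameters, so whether this beats 16 plain moves would have to be measured.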

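round_function already writes data into the array, so if it would be more efficient to move the shuffle into round_function, that is acceptable too. Here is the relevant code, just in case: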

template <int left_index, int right_index>
WORD_TYPE round_function(WORD_TYPE* state, WORD_TYPE key)
{
    WORD_TYPE left, right;
    left = state[left_index];
    right = state[right_index];

    key ^= right;
    right = rotate_left<ROTATION_AMOUNT>(right + key + left_index);
    key ^= right;

    key ^= left;
    left += right >> (BIT_WIDTH / 2);
    left ^= rotate_left<(left_index % BIT_WIDTH) ^ ROTATION_AMOUNT>(right);
    key ^= left;

    state[left_index] = left;
    state[right_index] = right;
    return key;
}

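I thought about giving round_function destination indices, but doing so would overwrite words that have not been processed yet, destroying the data at the destination index.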
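One way around the overwrite problem, sketched here under assumptions, is to read from the unshuffled input and write to a separate output buffer, so nothing that is still needed gets clobbered. The names `kSource`, `mix`, `fused_round` and `prp_fused` are hypothetical, the body of `rotate_left` is an assumed implementation (the question does not show its own), and the sketch returns `WORD_TYPE` rather than `int`.

```cpp
#include <cstdint>

using WORD_TYPE = std::uint64_t;
constexpr int ROTATION_AMOUNT = 41;
constexpr int BIT_WIDTH = 64;
constexpr int DATA_SIZE = 16;

// kSource[i] is the index whose old value shuffle_words places at position i.
constexpr int kSource[DATA_SIZE] = {11, 5, 4, 15, 12, 6, 9, 0,
                                    13, 3, 14, 8, 1, 10, 2, 7};

// Assumed rotate helper; only valid for 0 < amount < BIT_WIDTH.
template <int amount>
WORD_TYPE rotate_left(WORD_TYPE w) {
    static_assert(amount > 0 && amount < BIT_WIDTH, "rotation out of range");
    return (w << amount) | (w >> (BIT_WIDTH - amount));
}

// The arithmetic of round_function, applied to two words passed by
// reference instead of two array slots.
template <int left_index>
WORD_TYPE mix(WORD_TYPE& left, WORD_TYPE& right, WORD_TYPE key) {
    key ^= right;
    right = rotate_left<ROTATION_AMOUNT>(right + key + left_index);
    key ^= right;
    key ^= left;
    left += right >> (BIT_WIDTH / 2);
    left ^= rotate_left<(left_index % BIT_WIDTH) ^ ROTATION_AMOUNT>(right);
    key ^= left;
    return key;
}

// One fused round: `left` is fetched from the unshuffled input through
// kSource, while `right` was already written to `out` by an earlier step.
template <int left_index, int right_index>
WORD_TYPE fused_round(const WORD_TYPE* in, WORD_TYPE* out, WORD_TYPE key) {
    WORD_TYPE left = in[kSource[left_index]];
    key = mix<left_index>(left, out[right_index], key);
    out[left_index] = left;
    return key;
}

// Shuffle and rounds in one pass; `in` and `out` must not overlap.
WORD_TYPE prp_fused(const WORD_TYPE* in, WORD_TYPE* out, WORD_TYPE key) {
    out[15] = in[kSource[15]];  // the only word first read as `right`
    key = fused_round<14, 15>(in, out, key);
    key = fused_round<13, 14>(in, out, key);
    key = fused_round<12, 13>(in, out, key);
    key = fused_round<11, 12>(in, out, key);
    key = fused_round<10, 11>(in, out, key);
    key = fused_round<9, 10>(in, out, key);
    key = fused_round<8, 9>(in, out, key);
    key = fused_round<7, 8>(in, out, key);
    key = fused_round<6, 7>(in, out, key);
    key = fused_round<5, 6>(in, out, key);
    key = fused_round<4, 5>(in, out, key);
    key = fused_round<3, 4>(in, out, key);
    key = fused_round<2, 3>(in, out, key);
    key = fused_round<1, 2>(in, out, key);
    key = fused_round<0, 1>(in, out, key);
    return mix<15>(out[15], out[0], key);  // last round touches only `out`
}
```

This does not remove the second buffer; it only folds the shuffle into the first read of each word and drops the copy back. For repeated applications the caller could alternate between two buffers. Whether that is actually faster than an in-place shuffle would need benchmarking.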

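What is the most efficient way to perform the word transposition step? Can shuffle_words be done efficiently without temporary storage and memcpy? If I leave it as it is, will the compiler optimize it for me?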

Edit:

For an example input of 16 zeroed words, I get the following output:

5390936987981438580
7289498000187791405
11630888819098945478
4862561973623181657
11364775727483781365
1302861686580238483
10934483497681452460
376472396741801
17443576244438476890
17213444377027086447
15287741771379858051
16772715748200046576
6216997191100954620
16389751604649919423
2033403819063771136
14517213842436349075

I used these #defines:

#define ROTATION_AMOUNT 41
#define BIT_WIDTH 64
#define DATA_SIZE 16
typedef unsigned long long WORD_TYPE;

I can modify the functions slightly if that makes them more efficient.

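Best answer

Yes! The permutation is a single cycle through all 16 indices, so it can be done in place with one temporary word and no memcpy: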

void shuffle_words(WORD_TYPE* _state) {

    WORD_TYPE temp = _state[0];

    _state[0] = _state[11];
    _state[11] = _state[8];
    _state[8] = _state[13];
    _state[13] = _state[10];
    _state[10] = _state[14];
    _state[14] = _state[2];
    _state[2] = _state[4];
    _state[4] = _state[12];
    _state[12] = _state[1];
    _state[1] = _state[5];
    _state[5] = _state[6];
    _state[6] = _state[9];
    _state[9] = _state[3];
    _state[3] = _state[15];
    _state[15] = _state[7];
    _state[7] = temp;
}
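As a sanity check, the following sketch compares the in-place cycle against the copy-based version from the question. The function names and the tables `kDest` and `kCycle` are hypothetical; they are transcribed from the assignments in the two versions of shuffle_words, and the loops are a table-driven restatement rather than the original unrolled code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using WORD_TYPE = std::uint64_t;
constexpr std::size_t DATA_SIZE = 16;

// Question's approach: scatter into a temporary array, then copy back.
void shuffle_words_copy(WORD_TYPE* state) {
    // kDest[i] is where the word at index i ends up.
    static constexpr std::size_t kDest[DATA_SIZE] = {
        7, 12, 14, 9, 2, 1, 5, 15, 11, 6, 13, 0, 4, 8, 10, 3};
    WORD_TYPE temp[DATA_SIZE];
    for (std::size_t i = 0; i < DATA_SIZE; ++i) temp[kDest[i]] = state[i];
    std::memcpy(state, temp, sizeof(temp));
}

// Answer's approach: walk the single 16-element cycle with one temporary.
void shuffle_words_in_place(WORD_TYPE* state) {
    // Each slot receives the value of the slot listed after it.
    static constexpr std::size_t kCycle[DATA_SIZE] = {
        0, 11, 8, 13, 10, 14, 2, 4, 12, 1, 5, 6, 9, 3, 15, 7};
    WORD_TYPE temp = state[kCycle[0]];
    for (std::size_t i = 0; i + 1 < DATA_SIZE; ++i) {
        state[kCycle[i]] = state[kCycle[i + 1]];
    }
    state[kCycle[DATA_SIZE - 1]] = temp;
}

// Returns true when both versions produce the same arrangement.
bool shuffles_agree() {
    WORD_TYPE a[DATA_SIZE], b[DATA_SIZE];
    for (std::size_t i = 0; i < DATA_SIZE; ++i) a[i] = b[i] = 100 + i;
    shuffle_words_copy(a);
    shuffle_words_in_place(b);
    return std::memcmp(a, b, sizeof(a)) == 0;
}
```

Distinct values per slot (here 100 + i) make any misplaced word show up in the comparison.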

About "c++ - Fastest way to transpose a 16-word array": a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38645777/
