gpt4 book ai didi

c++ - 极其频繁地调用 std::nth_element() 函数

转载 作者:IT老高 更新时间:2023-10-28 22:27:09 28 4
gpt4 key购买 nike

我在任何地方都没有找到这个特定的主题...

我在 23 个整数的 std::vector 中调用 nth_element() 算法大约 400,000 次,更精确的“无符号短”值。

我想提高计算速度,而这个特定的调用需要大量的 CPU 时间。现在我注意到,与 std::sort() 一样,即使在最高优化级别和 NDEBUG 模式(Linux Clang 编译器)下,nth_element 函数在分析器中也是可见的,因此比较是内联的,而不是函数调用本身。好吧,更确切地说:不是 nth_element() 而是 std::__introselect() 是可见的。

由于数据量小,我尝试使用二次排序函数PIKSORT,当数据量小于20个元素时,往往比调用std::sort要快,可能是因为函数会内联.

template <class CONTAINER>
inline void piksort(CONTAINER& arr) // indeed this is "insertion sort"
{
typename CONTAINER::value_type a;

const int n = (int)arr.size();
for (int j = 1; j<n; ++j) {
a = arr[j];
int i = j;
while (i > 0 && a < arr[i - 1]) {
arr[i] = arr[i - 1];
i--;
}
arr[i] = a;
}
}

但是在这种情况下,这比使用 nth_element 慢。

另外,使用统计方法也不合适,Something faster than std::nth_element

最后,由于值在 0 到大约 20000 的范围内,因此直方图方法看起来不合适。

我的问题:有人知道一个简单的解决方案吗?我想我可能不是唯一一个必须经常调用 std::sort 或 nth_element 的人。

最佳答案

您提到数组的大小总是已知为 23 .此外,使用的类型是unsigned short .在这种情况下,您可以尝试使用大小为 23 的排序网络。 ;因为你的类型是 unsigned short ,使用排序网络对整个数组进行排序可能比使用 std::nth_element 部分排序更快。 .这是大小为 23 的排序网络的一个非常简单的 C++14 实现。与 118比较交换单元,如 Using Symmetry and Evolutionary Search to Minimize Sorting Networks 所述:

template<typename RandomIt, typename Compare = std::less<>>
void network_sort23(RandomIt first, Compare compare={})
{
swap_if(first[1u], first[20u], compare);
swap_if(first[2u], first[21u], compare);
swap_if(first[5u], first[13u], compare);
swap_if(first[9u], first[17u], compare);
swap_if(first[0u], first[7u], compare);
swap_if(first[15u], first[22u], compare);
swap_if(first[4u], first[11u], compare);
swap_if(first[6u], first[12u], compare);
swap_if(first[10u], first[16u], compare);
swap_if(first[8u], first[18u], compare);
swap_if(first[14u], first[19u], compare);
swap_if(first[3u], first[8u], compare);
swap_if(first[4u], first[14u], compare);
swap_if(first[11u], first[18u], compare);
swap_if(first[2u], first[6u], compare);
swap_if(first[16u], first[20u], compare);
swap_if(first[0u], first[9u], compare);
swap_if(first[13u], first[22u], compare);
swap_if(first[5u], first[15u], compare);
swap_if(first[7u], first[17u], compare);
swap_if(first[1u], first[10u], compare);
swap_if(first[12u], first[21u], compare);
swap_if(first[8u], first[19u], compare);
swap_if(first[17u], first[22u], compare);
swap_if(first[0u], first[5u], compare);
swap_if(first[20u], first[21u], compare);
swap_if(first[1u], first[2u], compare);
swap_if(first[18u], first[19u], compare);
swap_if(first[3u], first[4u], compare);
swap_if(first[21u], first[22u], compare);
swap_if(first[0u], first[1u], compare);
swap_if(first[19u], first[22u], compare);
swap_if(first[0u], first[3u], compare);
swap_if(first[12u], first[13u], compare);
swap_if(first[9u], first[10u], compare);
swap_if(first[6u], first[15u], compare);
swap_if(first[7u], first[16u], compare);
swap_if(first[8u], first[11u], compare);
swap_if(first[11u], first[14u], compare);
swap_if(first[4u], first[11u], compare);
swap_if(first[6u], first[8u], compare);
swap_if(first[14u], first[16u], compare);
swap_if(first[17u], first[20u], compare);
swap_if(first[2u], first[5u], compare);
swap_if(first[9u], first[12u], compare);
swap_if(first[10u], first[13u], compare);
swap_if(first[15u], first[18u], compare);
swap_if(first[10u], first[11u], compare);
swap_if(first[4u], first[7u], compare);
swap_if(first[20u], first[21u], compare);
swap_if(first[1u], first[2u], compare);
swap_if(first[7u], first[15u], compare);
swap_if(first[3u], first[9u], compare);
swap_if(first[13u], first[19u], compare);
swap_if(first[16u], first[18u], compare);
swap_if(first[8u], first[14u], compare);
swap_if(first[4u], first[6u], compare);
swap_if(first[18u], first[21u], compare);
swap_if(first[1u], first[4u], compare);
swap_if(first[19u], first[21u], compare);
swap_if(first[1u], first[3u], compare);
swap_if(first[9u], first[10u], compare);
swap_if(first[11u], first[13u], compare);
swap_if(first[2u], first[6u], compare);
swap_if(first[16u], first[20u], compare);
swap_if(first[4u], first[9u], compare);
swap_if(first[13u], first[18u], compare);
swap_if(first[19u], first[20u], compare);
swap_if(first[2u], first[3u], compare);
swap_if(first[18u], first[20u], compare);
swap_if(first[2u], first[4u], compare);
swap_if(first[5u], first[17u], compare);
swap_if(first[12u], first[14u], compare);
swap_if(first[8u], first[12u], compare);
swap_if(first[5u], first[7u], compare);
swap_if(first[15u], first[17u], compare);
swap_if(first[5u], first[8u], compare);
swap_if(first[14u], first[17u], compare);
swap_if(first[3u], first[5u], compare);
swap_if(first[17u], first[19u], compare);
swap_if(first[3u], first[4u], compare);
swap_if(first[18u], first[19u], compare);
swap_if(first[6u], first[10u], compare);
swap_if(first[11u], first[16u], compare);
swap_if(first[13u], first[16u], compare);
swap_if(first[6u], first[9u], compare);
swap_if(first[16u], first[17u], compare);
swap_if(first[5u], first[6u], compare);
swap_if(first[4u], first[5u], compare);
swap_if(first[7u], first[9u], compare);
swap_if(first[17u], first[18u], compare);
swap_if(first[12u], first[15u], compare);
swap_if(first[14u], first[15u], compare);
swap_if(first[8u], first[12u], compare);
swap_if(first[7u], first[8u], compare);
swap_if(first[13u], first[15u], compare);
swap_if(first[15u], first[17u], compare);
swap_if(first[5u], first[7u], compare);
swap_if(first[9u], first[10u], compare);
swap_if(first[10u], first[14u], compare);
swap_if(first[6u], first[11u], compare);
swap_if(first[14u], first[16u], compare);
swap_if(first[15u], first[16u], compare);
swap_if(first[6u], first[7u], compare);
swap_if(first[10u], first[11u], compare);
swap_if(first[9u], first[12u], compare);
swap_if(first[11u], first[13u], compare);
swap_if(first[13u], first[14u], compare);
swap_if(first[8u], first[9u], compare);
swap_if(first[7u], first[8u], compare);
swap_if(first[14u], first[15u], compare);
swap_if(first[9u], first[10u], compare);
swap_if(first[8u], first[9u], compare);
swap_if(first[12u], first[14u], compare);
swap_if(first[11u], first[12u], compare);
swap_if(first[12u], first[13u], compare);
swap_if(first[10u], first[11u], compare);
swap_if(first[11u], first[12u], compare);
}

swap_if实用函数比较两个参数 xy带有谓词 compare如果 compare(y, x) 则交换它们.我的示例使用通用 swap_if函数,但如果您知道要比较 unsigned short,则可以使用优化版本operator< 的值无论如何(如果您的编译器识别并优化比较交换,您可能不需要这样的函数,但不幸的是,并非所有编译器都这样做 - 我正在使用带有 -O3 的 g++5.2,我仍然需要以下函数来提高性能) :

void swap_if(unsigned short& x, unsigned short& y)
{
unsigned short dx = x;
unsigned short dy = y;
unsigned short tmp = x = std::min(dx, dy);
y ^= dx ^ tmp;
}

现在,为了确保它确实更快,我决定计时 std::nth_element当需要对 仅前 10 个元素进行部分排序 与使用排序网络对整个 23 个元素进行排序时(使用不同的洗牌数组进行 1000000 次)。这是我得到的:

std::nth_element    1158ms
network_sort23 487ms

也就是说,我的电脑已经运行了一段时间,速度有点慢,但性能差异很明显。我相信当我重新启动计算机时,这种差异将保持不变。稍后我可能会尝试并通知您。

关于这些时间是如何产生的,我使用了 this benchmark 的修改版本。来 self 的cpp-sort library .原始排序网络和swap_if函数也来自那里,因此您可以确定它们已经过多次测试:)

编辑:这是我重新启动计算机后的结果。 network_sort23版本仍然比 std::nth_element 快两倍:

std::nth_element    369ms
network_sort23 154ms

EDIT²:如果您只需要中位数,您可以简单地删除计算将位于第 11 位的最终值不需要的比较交换单元。生成的大小为 23 的中值查找网络使用与前一个不同的大小为 23 的排序网络,它产生的结果略好:

swap_if(first[0u], first[1u], compare);
swap_if(first[2u], first[3u], compare);
swap_if(first[4u], first[5u], compare);
swap_if(first[6u], first[7u], compare);
swap_if(first[8u], first[9u], compare);
swap_if(first[10u], first[11u], compare);
swap_if(first[1u], first[3u], compare);
swap_if(first[5u], first[7u], compare);
swap_if(first[9u], first[11u], compare);
swap_if(first[0u], first[2u], compare);
swap_if(first[4u], first[6u], compare);
swap_if(first[8u], first[10u], compare);
swap_if(first[1u], first[2u], compare);
swap_if(first[5u], first[6u], compare);
swap_if(first[9u], first[10u], compare);
swap_if(first[1u], first[5u], compare);
swap_if(first[6u], first[10u], compare);
swap_if(first[5u], first[9u], compare);
swap_if(first[2u], first[6u], compare);
swap_if(first[1u], first[5u], compare);
swap_if(first[6u], first[10u], compare);
swap_if(first[0u], first[4u], compare);
swap_if(first[7u], first[11u], compare);
swap_if(first[3u], first[7u], compare);
swap_if(first[4u], first[8u], compare);
swap_if(first[0u], first[4u], compare);
swap_if(first[7u], first[11u], compare);
swap_if(first[1u], first[4u], compare);
swap_if(first[7u], first[10u], compare);
swap_if(first[3u], first[8u], compare);
swap_if(first[2u], first[3u], compare);
swap_if(first[8u], first[9u], compare);
swap_if(first[2u], first[4u], compare);
swap_if(first[7u], first[9u], compare);
swap_if(first[3u], first[5u], compare);
swap_if(first[6u], first[8u], compare);
swap_if(first[3u], first[4u], compare);
swap_if(first[5u], first[6u], compare);
swap_if(first[7u], first[8u], compare);
swap_if(first[12u], first[13u], compare);
swap_if(first[14u], first[15u], compare);
swap_if(first[16u], first[17u], compare);
swap_if(first[18u], first[19u], compare);
swap_if(first[20u], first[21u], compare);
swap_if(first[13u], first[15u], compare);
swap_if(first[17u], first[19u], compare);
swap_if(first[12u], first[14u], compare);
swap_if(first[16u], first[18u], compare);
swap_if(first[20u], first[22u], compare);
swap_if(first[13u], first[14u], compare);
swap_if(first[17u], first[18u], compare);
swap_if(first[21u], first[22u], compare);
swap_if(first[13u], first[17u], compare);
swap_if(first[18u], first[22u], compare);
swap_if(first[17u], first[21u], compare);
swap_if(first[14u], first[18u], compare);
swap_if(first[13u], first[17u], compare);
swap_if(first[18u], first[22u], compare);
swap_if(first[12u], first[16u], compare);
swap_if(first[15u], first[19u], compare);
swap_if(first[16u], first[20u], compare);
swap_if(first[12u], first[16u], compare);
swap_if(first[13u], first[16u], compare);
swap_if(first[19u], first[22u], compare);
swap_if(first[15u], first[20u], compare);
swap_if(first[14u], first[15u], compare);
swap_if(first[20u], first[21u], compare);
swap_if(first[14u], first[16u], compare);
swap_if(first[19u], first[21u], compare);
swap_if(first[15u], first[17u], compare);
swap_if(first[18u], first[20u], compare);
swap_if(first[15u], first[16u], compare);
swap_if(first[17u], first[18u], compare);
swap_if(first[19u], first[20u], compare);
swap_if(first[0u], first[12u], compare);
swap_if(first[2u], first[14u], compare);
swap_if(first[4u], first[16u], compare);
swap_if(first[6u], first[18u], compare);
swap_if(first[8u], first[20u], compare);
swap_if(first[10u], first[22u], compare);
swap_if(first[2u], first[12u], compare);
swap_if(first[10u], first[20u], compare);
swap_if(first[4u], first[12u], compare);
swap_if(first[6u], first[14u], compare);
swap_if(first[8u], first[16u], compare);
swap_if(first[10u], first[18u], compare);
swap_if(first[8u], first[12u], compare);
swap_if(first[10u], first[14u], compare);
swap_if(first[10u], first[12u], compare);
swap_if(first[1u], first[13u], compare);
swap_if(first[3u], first[15u], compare);
swap_if(first[5u], first[17u], compare);
swap_if(first[7u], first[19u], compare);
swap_if(first[9u], first[21u], compare);
swap_if(first[3u], first[13u], compare);
swap_if(first[11u], first[21u], compare);
swap_if(first[5u], first[13u], compare);
swap_if(first[7u], first[15u], compare);
swap_if(first[9u], first[17u], compare);
swap_if(first[11u], first[19u], compare);
swap_if(first[9u], first[13u], compare);
swap_if(first[11u], first[15u], compare);
swap_if(first[11u], first[13u], compare);
swap_if(first[11u], first[12u], compare);

可能有更聪明的方法来生成中值查找网络,但我认为尚未对这个主题进行过广泛的研究。因此,这可能是您目前可以使用的最佳方法。结果并不好,但它仍然使用 104 个比较交换单元而不是 118 个。

关于c++ - 极其频繁地调用 std::nth_element() 函数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33307957/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com