sorting - 并行基数排序，这个实现实际上是如何工作的？有一些启发式方法吗？-6ren

sorting - 并行基数排序，这个实现实际上是如何工作的？有一些启发式方法吗？

转载作者：行者123 更新时间：2023-12-04 04:46:16

我正在为他们的并行编程类(class)进行 Udacity 测验。我很纠结于我应该如何开始作业，因为我不确定我是否理解正确。

对于赋值(在代码中)，我们给出了两个数组和值数组和一个位置数组。我们应该使用并行基数排序对值数组进行排序，同时正确设置位置。

我完全理解基数排序及其工作原理。我不明白的是他们希望我们如何实现它。这是开始分配的模板

//Udacity HW 4
//Radix Sorting

#include "reference_calc.cpp"
#include "utils.h"

/* Red Eye Removal
   ===============

   For this assignment we are implementing red eye removal.  This is
   accomplished by first creating a score for every pixel that tells us how
   likely it is to be a red eye pixel.  We have already done this for you - you
   are receiving the scores and need to sort them in ascending order so that we
   know which pixels to alter to remove the red eye.

   Note: ascending order == smallest to largest

   Each score is associated with a position, when you sort the scores, you must
   also move the positions accordingly.

   Implementing Parallel Radix Sort with CUDA
   ==========================================

   The basic idea is to construct a histogram on each pass of how many of each
   "digit" there are.   Then we scan this histogram so that we know where to put
   the output of each digit.  For example, the first 1 must come after all the
   0s so we have to know how many 0s there are to be able to start moving 1s
   into the correct position.

   1) Histogram of the number of occurrences of each digit
   2) Exclusive Prefix Sum of Histogram
   3) Determine relative offset of each digit
        For example [0 0 1 1 0 0 1]
                ->  [0 1 0 1 2 3 2]
   4) Combine the results of steps 2 & 3 to determine the final
      output location for each element and move it there

   LSB Radix sort is an out-of-place sort and you will need to ping-pong values
   between the input and output buffers we have provided.  Make sure the final
   sorted results end up in the output buffer!  Hint: You may need to do a copy
   at the end.

 */


void your_sort(unsigned int* const d_inputVals,
               unsigned int* const d_inputPos,
               unsigned int* const d_outputVals,
               unsigned int* const d_outputPos,
               const size_t numElems)
{

}

我特别不明白这 4 个步骤如何最终对数组进行排序。

所以第一步，我应该创建一个“数字”的直方图(为什么用引号......？)。因此，给定输入值 n 我需要将 0 和 1 的计数转换为直方图。那么，步骤 1 是否应该创建一个直方图数组，每个输入值一个？

好吧，对于其余的步骤，它很快就会崩溃。有人可以告诉我这些步骤应该如何实现基数排序吗？

最佳答案

基数排序背后的基本思想是，我们将考虑将每个元素逐位排序，从最不重要到最重要。对于每个数字，我们将移动元素，使这些数字按递增顺序排列。

让我们举一个非常简单的例子。让我们对四个数量进行排序，每个数量都有 4 个二进制数字。让我们选择 1、4、7 和 14。我们将它们混合在一起，并可视化二进制表示:

Element #    1       2       3       4
Value:       7       14      4       1
Binary:      0111    1110    0100    0001

首先我们将考虑位 0:

Element #    1       2       3       4
Value:       7       14      4       1
Binary:      0111    1110    0100    0001
bit 0:       1       0       0       1

现在基数排序算法说我们必须以这样的方式移动元素(仅考虑位 0)所有零都在左侧，所有零都在右侧。让我们这样做，同时使用零位保留元素的顺序并使用一位保留元素的顺序。我们可以这样做:

Element #    2       3       1       4
Value:       14      4       7       1
Binary:      1110    0100    0111    0001
bit 0:       0       0       1       1

我们的基数排序的第一步已经完成。下一步是考虑下一个(二进制)数字:

Element #    3       2       1       4
Value:       4       14      7       1
Binary:      0100    1110    0111    0001
bit 1:       0       1       1       0

再一次，我们必须移动元素，以便相关数字(位 1)按升序排列:

Element #    3       4       2       1
Value:       4       1       14      7
Binary:      0100    0001    1110    0111
bit 1:       0       0       1       1

现在我们必须移动到下一个更高的数字:

Element #    3       4       2       1
Value:       4       1       14      7
Binary:      0100    0001    1110    0111
bit 2:       1       0       1       1

并再次移动它们:

Element #    4       3       2       1
Value:       1       4       14      7
Binary:      0001    0100    1110    0111
bit 2:       0       1       1       1

现在我们移到最后(最高位)数字:

Element #    4       3       2       1
Value:       1       4       14      7
Binary:      0001    0100    1110    0111
bit 3:       0       0       1       0

并做我们的最后一步:

Element #    4       3       1       2
Value:       1       4       7       14
Binary:      0001    0100    0111    1110
bit 3:       0       0       0       1

现在对值进行排序。希望这看起来很清楚，但在到目前为止的描述中，我们已经掩盖了诸如“我们如何知道要移动哪些元素？”之类的细节。以及“我们怎么知道把它们放在哪里？”因此，让我们重复我们的示例，但我们将使用提示中建议的具体方法和顺序来回答这些问题。从位 0 开始:

Element #    1       2       3       4
Value:       7       14      4       1
Binary:      0111    1110    0100    0001
bit 0:       1       0       0       1

首先让我们构建一个第 0 位位置的零位数和第 0 位位置的 1 位数的直方图:

bit 0:       1       0       0       1

              zero bits       one bits
              ---------       --------
1)histogram:         2              2

现在让我们对这些直方图值进行独占前缀和:

              zero bits       one bits
              ---------       --------
1)histogram:         2              2
2)prefix sum:        0              2

独占前缀和只是所有先前值的总和。第一个位置没有前面的值，第二个位置的前面的值是 2(位 0 位置中具有 0 位的元素数)。现在，作为一个独立的操作，让我们确定所有零位中每个 0 位的相对偏移量，以及所有 1 位中的每个位的相对偏移量:

bit 0:       1       0       0       1
3)offset:    0       0       1       1

这实际上可以再次使用独占前缀和以编程方式完成，分别考虑 0-group 和 1-group，并将每个位置视为它的值为 1:

0 bit 0:             1       1       
3)ex. psum:          0       1    

1 bit 0:     1                        1      
3)ex. psum:  0                        1

现在，给定算法的第 4 步说:

4) Combine the results of steps 2 & 3 to determine the final output location for each element and move it there

这意味着，对于每个元素，我们将选择与其位值(0 或 1)相对应的 histogram-bin 前缀总和值，并加上与其位置相关的偏移量，以确定将该元素移动到的位置:

Element #    1       2       3       4
Value:       7       14      4       1
Binary:      0111    1110    0100    0001
bit 0:       1       0       0       1
hist psum:   2       0       0       2
offset:      0       0       1       1
new index:   2       0       1       3

将每个元素移动到它的“新索引”位置，我们有:

Element #    2       3       1       4
Value:       14      4       7       1
Binary:      0111    1110    0111    0001

根据之前的演练，这正是我们期望完成第一个数字移动的结果。这已经完成了第 1 步，即第一个(最低有效)数字；我们仍然有剩余的数字要处理，在每一步创建一个新的直方图和新的前缀和。

笔记:

基数排序，即使在计算机中，也不必严格基于二进制数字来完成。可以使用不同大小的数字构建类似的算法，可能由 2、3 或 4 位组成。

我们可以对基数排序执行的优化之一是仅根据实际有意义的位数进行排序。例如，如果我们以 32 位值存储数量，但我们知道存在的最大数量是 1023 (2^10-1)，我们不需要对所有 32 位进行排序。在处理前 10 位之后，我们可以停下来，期待正确的排序。

这些与 GPU 有什么关系？就上述描述而言，并不多。实际应用是考虑对直方图、前缀和和数据移动等事物使用并行算法。基数排序的这种分解允许人们定位和使用已经为这些更基本操作开发的并行算法，以构建快速并行排序。

下面是一个有效的例子。这可能有助于您理解基数排序。我认为这对您的分配没有帮助，因为此示例在扭曲级别执行 32 位基数排序，对于单个扭曲，即。为 32 数量。但从理解的角度来看，一个可能的优势是，利用各种 CUDA 内在函数，只需几条指令就可以在扭曲级别完成诸如直方图和前缀和之类的事情。对于您的作业，您将无法使用这些技术，并且您将需要提出可以在任意数据集大小上操作的全功能并行前缀和、直方图等。

#include <stdio.h>
#include <stdlib.h>
#define WSIZE 32
#define LOOPS 100000
#define UPPER_BIT 31
#define LOWER_BIT 0

__device__ unsigned int ddata[WSIZE];

// naive warp-level bitwise radix sort

__global__ void mykernel(){
  __shared__ volatile unsigned int sdata[WSIZE*2];
  // load from global into shared variable
  sdata[threadIdx.x] = ddata[threadIdx.x];
  unsigned int bitmask = 1<<LOWER_BIT;
  unsigned int offset  = 0;
  unsigned int thrmask = 0xFFFFFFFFU << threadIdx.x;
  unsigned int mypos;
  //  for each LSB to MSB
  for (int i = LOWER_BIT; i <= UPPER_BIT; i++){
    unsigned int mydata = sdata[((WSIZE-1)-threadIdx.x)+offset];
    unsigned int mybit  = mydata&bitmask;
    // get population of ones and zeroes (cc 2.0 ballot)
    unsigned int ones = __ballot(mybit); // cc 2.0
    unsigned int zeroes = ~ones;
    offset ^= WSIZE; // switch ping-pong buffers
    // do zeroes, then ones
    if (!mybit) // threads with a zero bit
      // get my position in ping-pong buffer
      mypos = __popc(zeroes&thrmask);
    else        // threads with a one bit
      // get my position in ping-pong buffer
      mypos = __popc(zeroes)+__popc(ones&thrmask);
    // move to buffer  (or use shfl for cc 3.0)
    sdata[mypos-1+offset] = mydata;
    // repeat for next bit
    bitmask <<= 1;
    }
  // save results to global
  ddata[threadIdx.x] = sdata[threadIdx.x+offset];
  }

int main(){

  unsigned int hdata[WSIZE];
  for (int lcount = 0; lcount < LOOPS; lcount++){
    unsigned int range = 1U<<UPPER_BIT;
    for (int i = 0; i < WSIZE; i++) hdata[i] = rand()%range;
    cudaMemcpyToSymbol(ddata, hdata, WSIZE*sizeof(unsigned int));
    mykernel<<<1, WSIZE>>>();
    cudaMemcpyFromSymbol(hdata, ddata, WSIZE*sizeof(unsigned int));
    for (int i = 0; i < WSIZE-1; i++) if (hdata[i] > hdata[i+1]) {printf("sort error at loop %d, hdata[%d] = %d, hdata[%d] = %d\n", lcount,i, hdata[i],i+1, hdata[i+1]); return 1;}
    // printf("sorted data:\n");
    //for (int i = 0; i < WSIZE; i++) printf("%u\n", hdata[i]);
    }
  printf("Success!\n");
  return 0;
}

关于sorting - 并行基数排序，这个实现实际上是如何工作的？有一些启发式方法吗？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/26206544/

文章推荐： autoprefixer - 最后 2 个版本 autoprefixer browserslist

文章推荐： html5-canvas - Kineticjs重叠形状鼠标悬停检测

Ruby 方法() 方法
我想了解 Ruby 方法 methods() 是如何工作的。我尝试使用“ruby 方法”在 Google 上搜索，但这不是我需要的。我也看过 ruby-doc.org，但我没有找到这种方法。
VBS教程：方法-Test 方法
Test 方法对指定的字符串执行一个正则表达式搜索，并返回一个 Boolean 值指示是否找到匹配的模式。 object.Test(string) 参数 object 必选项。总是一个
VBS教程：方法-Replace 方法
Replace 方法替换在正则表达式查找中找到的文本。 object.Replace(string1, string2) 参数 object 必选项。总是一个 RegExp 对象的名称。
VBS教程：方法-Raise 方法
Raise 方法生成运行时错误 object.Raise(number, source, description, helpfile, helpcontext) 参数 object 应为
VBS教程：方法-Execute 方法
Execute 方法对指定的字符串执行正则表达式搜索。 object.Execute(string) 参数 object 必选项。总是一个 RegExp 对象的名称。 string
VBS教程：方法-Clear 方法
Clear 方法清除 Err 对象的所有属性设置。 object.Clear object 应为 Err 对象的名称。说明在错误处理后，使用 Clear 显式地清除 Err 对象。此
VBS教程：方法-CopyFile 方法
CopyFile 方法将一个或多个文件从某位置复制到另一位置。 object.CopyFile source, destination[, overwrite] 参数 object 必选
VBS教程：方法-Copy 方法
Copy 方法将指定的文件或文件夹从某位置复制到另一位置。 object.Copy destination[, overwrite] 参数 object 必选项。应为 File 或 F
VBS教程：方法-Close 方法
Close 方法关闭打开的 TextStream 文件。 object.Close object 应为 TextStream 对象的名称。说明下面例子举例说明如何使用 Close 方
VBS教程：方法-BuildPath 方法
BuildPath 方法向现有路径后添加名称。 object.BuildPath(path, name) 参数 object 必选项。应为 FileSystemObject 对象的名称
VBS教程：方法-GetFolder 方法
GetFolder 方法返回与指定的路径中某文件夹相应的 Folder 对象。 object.GetFolder(folderspec) 参数 object 必选项。应为 FileSy
VBS教程：方法-GetFileName 方法
GetFileName 方法返回指定路径（不是指定驱动器路径部分）的最后一个文件或文件夹。 object.GetFileName(pathspec) 参数 object 必选项。应为
VBS教程：方法-GetFile 方法
GetFile 方法返回与指定路径中某文件相应的 File 对象。 object.GetFile(filespec) 参数 object 必选项。应为 FileSystemObject
VBS教程：方法-GetExtensionName 方法
GetExtensionName 方法返回字符串，该字符串包含路径最后一个组成部分的扩展名。 object.GetExtensionName(path) 参数 object 必选项。应
VBS教程：方法-GetDriveName 方法
GetDriveName 方法返回包含指定路径中驱动器名的字符串。 object.GetDriveName(path) 参数 object 必选项。应为 FileSystemObjec
VBS教程：方法-GetDrive 方法
GetDrive 方法返回与指定的路径中驱动器相对应的 Drive 对象。 object.GetDrive drivespec 参数 object 必选项。应为 FileSystemO
VBS教程：方法-GetBaseName 方法
GetBaseName 方法返回字符串，其中包含文件的基本名 (不带扩展名), 或者提供的路径说明中的文件夹。 object.GetBaseName(path) 参数 object 必
VBS教程：方法-GetAbsolutePathName 方法
GetAbsolutePathName 方法从提供的指定路径中返回完整且含义明确的路径。 object.GetAbsolutePathName(pathspec) 参数 object
VBS教程：方法-FolderExists 方法
FolderExists 方法如果指定的文件夹存在，则返回 True；否则返回 False。 object.FolderExists(folderspec) 参数 object 必选项
VBS教程：方法-FileExists 方法
FileExists 方法如果指定的文件存在返回 True；否则返回 False。 object.FileExists(filespec) 参数 object 必选项。应为 FileS

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

sorting - 并行基数排序，这个实现实际上是如何工作的？有一些启发式方法吗？