java - Java中Eratosthenes的多线程分段筛分-6ren

java - Java中Eratosthenes的多线程分段筛分

转载作者：行者123 更新时间：2023-12-01 21:49:09

我正在尝试用 Java 创建一个快速的素数生成器。人们(或多或少)接受了最快的方法是 Eratosthenes 的分段筛:https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes .可以进一步实现许多优化以使其更快。 截至目前，我的实现生成 50847534下面的素数10^9在关于 1.6秒 ，但我希望让它更快，至少打破 1第二个障碍。为了增加获得好的回复的机会，我将包括算法和代码的演练。

尽管如此，作为 TL;DR ，我希望在代码中包含多线程

出于这个问题的目的，我想将 Eratosthenes 的“分段”筛和“传统”筛区分开来。传统筛子需要O(n)空间，因此输入范围非常有限(它的限制)。然而，分段筛只需要 O(n^0.5)空间，并且可以在更大的范围内运行。 (一个主要的加速是使用缓存友好的分段，考虑到特定计算机的 L1 & L2 缓存大小)。最后，与我的问题有关的主要区别是传统的筛选是顺序的，这意味着它只能在前面的步骤完成后继续。然而，分段筛不是。每个段都是独立的，并根据筛选素数(不大于 n^0.5 的素数)单独“处理”。这意味着理论上，一旦我有了筛选素数，我就可以在多台计算机之间分配工作，每台计算机处理不同的部分。彼此的工作相互独立。假设(错误地)每个段需要相同的时间 t完成，还有k段，一台计算机需要的总时间为 T = k * t , 而 k计算机，每台计算机在不同的段上工作将需要总时间 T = t来完成整个过程。 (实际上，这是错误的，但为了示例的简单起见)。

这让我开始阅读有关多线程的内容 - 将工作划分为几个线程，每个线程处理少量工作以更好地使用 CPU。据我所知，传统的筛子不能完全是多线程的，因为它是顺序的。每个线程都依赖于前一个线程，从而使整个想法变得不可行。但是分段筛可能确实(我认为)是多线程的。

与其直接进入我的问题，我认为首先介绍我的代码很重要，所以我在此包括我目前最快的分段筛实现。我为此付出了很大的努力。花了相当长的时间，慢慢地调整和添加优化。代码并不简单。我会说，这相当复杂。因此，我假设读者熟悉我介绍的概念，例如轮分解、素数、分段等。我已经包含了注释，以便更容易理解。

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;

public class primeGen {

    public static long x = (long)Math.pow(10, 9); //limit
    public static int sqrtx;
    public static boolean [] sievingPrimes; //the sieving primes, <= sqrtx

    public static int [] wheels = new int [] {2,3,5,7,11,13,17,19}; // base wheel primes
    public static int [] gaps; //the gaps, according to the wheel. will enable skipping multiples of the wheel primes
    public static int nextp; // the first prime > wheel primes
    public static int l; // the amount of gaps in the wheel

    public static void main(String[] args)
    {
        long startTime = System.currentTimeMillis();

        preCalc();  // creating the sieving primes and calculating the list of gaps

        int segSize = Math.max(sqrtx, 32768*8); //size of each segment
        long u = nextp; // 'u' is the running index of the program. will continue from one segment to the next
        int wh = 0; // the will be the gap index, indicating by how much we increment 'u' each time, skipping the multiples of the wheel primes

        long pi = pisqrtx(); // the primes count. initialize with the number of primes <= sqrtx

        for (long low = 0 ; low < x ; low += segSize) //the heart of the code. enumerating the primes through segmentation. enumeration will begin at p > sqrtx
        {
            long high = Math.min(x, low + segSize);
            boolean [] segment = new boolean [(int) (high - low + 1)];

            int g = -1;
            for (int i = nextp ; i <= sqrtx ; i += gaps[g])
            { 
                if (sievingPrimes[(i + 1) / 2])
                {
                    long firstMultiple = (long) (low / i * i);
                    if (firstMultiple < low) 
                        firstMultiple += i; 
                    if (firstMultiple % 2 == 0) //start with the first odd multiple of the current prime in the segment
                        firstMultiple += i;

                    for (long j = firstMultiple ; j < high ; j += i * 2) 
                        segment[(int) (j - low)] = true; 
                }
                g++;
                //if (g == l) //due to segment size, the full list of gaps is never used **within just one segment** , and therefore this check is redundant. 
                              //should be used with bigger segment sizes or smaller lists of gaps
                    //g = 0;
            }

            while (u <= high)
            {
                if (!segment[(int) (u - low)])
                    pi++;
                u += gaps[wh];
                wh++;
                if (wh == l)
                    wh = 0;
            }
        }

        System.out.println(pi);

        long endTime = System.currentTimeMillis();
        System.out.println("Solution took "+(endTime - startTime) + " ms");
    }

    public static boolean [] simpleSieve (int l)
    {
        long sqrtl = (long)Math.sqrt(l);
        boolean [] primes = new boolean [l/2+2];
        Arrays.fill(primes, true);
        int g = -1;
        for (int i = nextp ; i <= sqrtl ; i += gaps[g])
        {
            if (primes[(i + 1) / 2])
                for (int j = i * i ; j <= l ; j += i * 2)
                    primes[(j + 1) / 2]=false;
            g++;
            if (g == l)
                g=0;
        }
        return primes;
    }

    public static long pisqrtx ()
    {
        int pi = wheels.length;
        if (x < wheels[wheels.length-1])
        {
            if (x < 2)
                return 0;
            int k = 0;
            while (wheels[k] <= x)
                k++;
            return k;
        }
        int g = -1;
        for (int i = nextp ; i <= sqrtx ; i += gaps[g])
        {
            if(sievingPrimes[( i + 1 ) / 2])
                pi++;
            g++;
            if (g == l)
                g=0;
        }

        return pi;
    }

    public static void preCalc ()
    {
        sqrtx = (int) Math.sqrt(x);

        int prod = 1;
        for (long p : wheels)
            prod *= p; // primorial
        nextp = BigInteger.valueOf(wheels[wheels.length-1]).nextProbablePrime().intValue(); //the first prime that comes after the wheel
        int lim = prod + nextp; // circumference of the wheel

        boolean [] marks = new boolean [lim + 1];
        Arrays.fill(marks, true);

        for (int j = 2 * 2 ;j <= lim ; j += 2)
            marks[j] = false;
        for (int i = 1 ; i < wheels.length ; i++)
        {
            int p = wheels[i];
            for (int j = p * p ; j <= lim ; j += 2 * p)
                marks[j]=false;   // removing all integers that are NOT comprime with the base wheel primes
        }
        ArrayList <Integer> gs = new ArrayList <Integer>(); //list of the gaps between the integers that are coprime with the base wheel primes
        int d = nextp;
        for (int p = d + 2 ; p < marks.length ; p += 2)
        {
            if (marks[p]) //d is prime. if p is also prime, then a gap is identified, and is noted.
            {
                gs.add(p - d);
                d = p;
            }
        }
        gaps = new int [gs.size()];
        for (int i = 0 ; i < gs.size() ; i++)
            gaps[i] = gs.get(i); // Arrays are faster than lists, so moving the list of gaps to an array
        l = gaps.length;

        sievingPrimes = simpleSieve(sqrtx); //initializing the sieving primes
    }

}

目前，它生产50847534下面的素数10^9在关于 1.6秒。 这非常令人印象深刻，至少以我的标准来看，但我希望让它更快，可能会打破 1第二个障碍。即便如此，我相信它仍然可以做得更快。

整个程序基于 车轮分解 : https://en.wikipedia.org/wiki/Wheel_factorization .我注意到我使用所有质数的轮子得到最快的结果 19 .

public static int [] wheels = new int [] {2,3,5,7,11,13,17,19}; // base wheel primes

这意味着跳过这些素数的倍数，导致搜索范围小得多。然后我们需要在 preCalc 中计算我们需要采取的数字之间的差距。方法。如果我们在搜索范围内的数字之间进行这些跳跃，我们将跳过基素数的倍数。

public static void preCalc ()
    {
        sqrtx = (int) Math.sqrt(x);

        int prod = 1;
        for (long p : wheels)
            prod *= p; // primorial
        nextp = BigInteger.valueOf(wheels[wheels.length-1]).nextProbablePrime().intValue(); //the first prime that comes after the wheel
        int lim = prod + nextp; // circumference of the wheel

        boolean [] marks = new boolean [lim + 1];
        Arrays.fill(marks, true);

        for (int j = 2 * 2 ;j <= lim ; j += 2)
            marks[j] = false;
        for (int i = 1 ; i < wheels.length ; i++)
        {
            int p = wheels[i];
            for (int j = p * p ; j <= lim ; j += 2 * p)
                marks[j]=false;   // removing all integers that are NOT comprime with the base wheel primes
        }
        ArrayList <Integer> gs = new ArrayList <Integer>(); //list of the gaps between the integers that are coprime with the base wheel primes
        int d = nextp;
        for (int p = d + 2 ; p < marks.length ; p += 2)
        {
            if (marks[p]) //d is prime. if p is also prime, then a gap is identified, and is noted.
            {
                gs.add(p - d);
                d = p;
            }
        }
        gaps = new int [gs.size()];
        for (int i = 0 ; i < gs.size() ; i++)
            gaps[i] = gs.get(i); // Arrays are faster than lists, so moving the list of gaps to an array
        l = gaps.length;

        sievingPrimes = simpleSieve(sqrtx); //initializing the sieving primes
    }

末 preCalc方法， simpleSieve方法被调用，有效地筛选出前面提到的所有筛选素数，素数 <= sqrtx .这是一个简单的 Eratosthenes 筛子，而不是分段的，但它仍然基于 车轮分解 , 以前计算过。

 public static boolean [] simpleSieve (int l)
    {
        long sqrtl = (long)Math.sqrt(l);
        boolean [] primes = new boolean [l/2+2];
        Arrays.fill(primes, true);
        int g = -1;
        for (int i = nextp ; i <= sqrtl ; i += gaps[g])
        {
            if (primes[(i + 1) / 2])
                for (int j = i * i ; j <= l ; j += i * 2)
                    primes[(j + 1) / 2]=false;
            g++;
            if (g == l)
                g=0;
        }
        return primes;
    }

最后，我们到达了算法的核心。我们首先枚举所有素数 <= sqrtx ，具有以下调用:

 long pi = pisqrtx();`

其中使用了以下方法:

public static long pisqrtx ()
    {
        int pi = wheels.length;
        if (x < wheels[wheels.length-1])
        {
            if (x < 2)
                return 0;
            int k = 0;
            while (wheels[k] <= x)
                k++;
            return k;
        }
        int g = -1;
        for (int i = nextp ; i <= sqrtx ; i += gaps[g])
        {
            if(sievingPrimes[( i + 1 ) / 2])
                pi++;
            g++;
            if (g == l)
                g=0;
        }

        return pi;
    }

然后，在初始化 pi 之后跟踪素数枚举的变量，我们执行上述分段，从第一个素数 > sqrtx开始枚举。 :

 int segSize = Math.max(sqrtx, 32768*8); //size of each segment
        long u = nextp; // 'u' is the running index of the program. will continue from one segment to the next
        int wh = 0; // the will be the gap index, indicating by how much we increment 'u' each time, skipping the multiples of the wheel primes

        long pi = pisqrtx(); // the primes count. initialize with the number of primes <= sqrtx

        for (long low = 0 ; low < x ; low += segSize) //the heart of the code. enumerating the primes through segmentation. enumeration will begin at p > sqrtx
        {
            long high = Math.min(x, low + segSize);
            boolean [] segment = new boolean [(int) (high - low + 1)];

            int g = -1;
            for (int i = nextp ; i <= sqrtx ; i += gaps[g])
            { 
                if (sievingPrimes[(i + 1) / 2])
                {
                    long firstMultiple = (long) (low / i * i);
                    if (firstMultiple < low) 
                        firstMultiple += i; 
                    if (firstMultiple % 2 == 0) //start with the first odd multiple of the current prime in the segment
                        firstMultiple += i;

                    for (long j = firstMultiple ; j < high ; j += i * 2) 
                        segment[(int) (j - low)] = true; 
                }
                g++;
                //if (g == l) //due to segment size, the full list of gaps is never used **within just one segment** , and therefore this check is redundant. 
                              //should be used with bigger segment sizes or smaller lists of gaps
                    //g = 0;
            }

            while (u <= high)
            {
                if (!segment[(int) (u - low)])
                    pi++;
                u += gaps[wh];
                wh++;
                if (wh == l)
                    wh = 0;
            }
        }

我也将其作为注释包含在内，但也会进行解释。因为段大小相对较小，我们不会在一个段内遍历整个间隙列表，检查它 - 是多余的。 (假设我们使用 19-wheel )。但是在更广泛的程序概述中，我们将使用整个间隙数组，因此变量 u必须遵循它，而不是意外超越它:

 while (u <= high)
            {
                if (!segment[(int) (u - low)])
                    pi++;
                u += gaps[wh];
                wh++;
                if (wh == l)
                    wh = 0;
            }

使用更高的限制最终会呈现更大的段，这可能导致需要检查我们即使在段内也没有超过间隙列表。这个，或者调整 wheel素数基数可能对程序产生这种影响。不过，切换到位筛选可以在很大程度上改善段限制。

作为一个重要的旁注，我知道有效的分割是
一个需要 L1 & L2缓存大小考虑在内。我得到
使用 32,768 * 8 = 262,144 = 2^18 段大小的最快结果.我不确定我的计算机的缓存大小是多少，但我知道
认为它不会那么大，因为我看到大多数缓存大小 <= 32,768 .
尽管如此，这会在我的计算机上产生最快的运行时间，所以这是
为什么它是所选的段大小。

正如我所提到的，我仍然希望对此进行很多改进。一世
相信，根据我的介绍，多线程会导致
加速因子为 4 ，使用4个线程(对应4个
核)。这个想法是每个线程仍然会使用
分段筛，但工作在不同的 portions .划分n进入 4相等的部分 - 线程，每个线程依次执行
关于n/4的分段它负责的元素，使用
以上程序。我的问题是我该怎么做？阅读有关
不幸的是，多线程和示例并没有给我带来任何
关于如何在上述案例中有效实现它的见解。它
在我看来，与其背后的逻辑相反，线程是
顺序运行，而不是同时运行。这就是为什么我
将其从代码中排除以使其更具可读性。我真的会
欣赏 代码示例 关于如何在此特定代码中执行此操作，但是
很好的解释和引用也许也会这样做。

此外，我想听听更多加速的方法
这个程序甚至更多，您有任何想法，我很想听听!
真的想让它变得非常快速和高效。谢谢!

最佳答案

安 example like this应该可以帮助您入门。

解决方案概述:

定义包含特定段的数据结构(“任务”)；您也可以将所有不可变的共享数据放入其中以获得额外的整洁。如果你足够小心，你可以将一个通用的可变数组连同段限制一起传递给所有任务，并且只更新这些限制内的数组部分。这更容易出错，但可以简化加入结果的步骤 (AFAICT; YMMV)。

定义存储任务计算结果的数据结构(“结果”)。即使您只是更新一个共享的结果结构，您也可能需要表明该结构的哪一部分到目前为止已被更新。

创建一个接受任务、运行计算并将结果放入给定结果队列的 Runnable。

为任务创建一个阻塞输入队列，为结果创建一个队列。

创建一个线程数接近机器核心数的ThreadPoolExecutor。

将所有任务提交给线程池执行程序。它们将被安排在池中的线程上运行，并将它们的结果放入输出队列，不一定按顺序排列。

等待线程池中的所有任务完成。

排空输出队列并将部分结果加入最终结果。

通过将结果加入到读取输出队列的单独任务中，或者甚至通过更新 synchronized 下的可变共享输出结构，可能(也可能不会)实现额外的加速。，取决于加入步骤涉及多少工作。

希望这可以帮助。

关于java - Java中Eratosthenes的多线程分段筛分，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/57203706/

文章推荐： wpf - 使用 WPF PRISM 4 正常关闭应用程序

文章推荐： php - Laravel 5 Auth 后如何重定向到单个 URI？

文章推荐：从汇编代码中调用 fgets

文章推荐： Java软件工程: Composition vs Aggregation

haskell - Sundaram 筛 - 列表理解
我正在尝试编写一个函数，使用 "Sieve of Sundaram" algorithm 从 1..n 计算所有奇数素数. 这是我的尝试: sSund :: Integer -> [Integer]
list - Haskell 中优化的 Eratosthenes 筛
我是 Haskell 的新手，对于我正在实现的事情，我需要一个素数列表。我试着写一个，但它太慢了。这是我尝试过的。 primeList = primes 1000 primes :: Int ->

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

java - Java中Eratosthenes的多线程分段筛分