python - What is some example code for demonstrating multicore speedup in Python on Windows?


Reposted by: 太空宇宙 · Updated: 2023-11-03 14:53:31


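I am using Python 3 on Windows and trying to construct a toy example that demonstrates how using multiple CPU cores can speed up computation. The toy example is rendering the Mandelbrot fractal.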

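So far:

I have avoided threading, since the global interpreter lock prevents multicore use in this context
I have ditched example code that won't run on Windows because Windows lacks Linux's fork capability
I tried the "multiprocessing" package. I declare p=Pool(8) (8 is my number of cores) and use p.starmap(..) to delegate the work. This is supposed to spawn multiple "subprocesses", which Windows will automatically delegate to different CPUs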

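However, I have been unable to demonstrate any speedup, whether because of overhead or because no actual multiprocessing is taking place. Pointers to toy examples with demonstrable speedup would therefore be very helpful :-)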

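Edit: Thank you! This pushed me in the right direction, and I now have a working example that demonstrates a doubling of speed on a CPU with 4 cores.
A copy of my code with "lecture notes" is here: https://pastebin.com/c9HZ2vAV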

I settled on Pool(), but will later try the "Process" alternative that @16num pointed out. Below is a code example for Pool():

    # Requires: from multiprocessing import Pool, cpu_count
    #           from functools import partial
    p = Pool(cpu_count())

    # starmap unpacks each (j, k) tuple into positional arguments;
    # "partial" pre-binds the extra dataarray keyword argument
    partial_calculatePixel = partial(calculatePixel, dataarray=data)
    koord = []
    for j in range(height):
        for k in range(width):
            koord.append((j, k))

    # Runs the calls to calculatePixel in a pool. "hmm" collects the output
    hmm = p.starmap(partial_calculatePixel, koord)
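The difference between map and starmap in the snippet above can be illustrated with a minimal sketch. The worker functions `scale` and `add_offset` and the helper `run_demo` are hypothetical names made up for this illustration; they are not part of the asker's code.

```python
from functools import partial
from multiprocessing import Pool


def scale(value, factor):
    """Hypothetical worker: one item from the iterable plus one fixed argument."""
    return value * factor


def add_offset(j, k, offset=0):
    """Hypothetical worker: two positional arguments plus one keyword argument."""
    return j + k + offset


def run_demo(workers=2):
    with Pool(workers) as pool:
        # map passes exactly one item per call, so the extra argument is pre-bound
        doubled = pool.map(partial(scale, factor=2), [1, 2, 3])
        # starmap unpacks each tuple into positional arguments
        sums = pool.starmap(partial(add_offset, offset=10), [(0, 0), (0, 1), (1, 1)])
    return doubled, sums


if __name__ == "__main__":  # needed on Windows, where child processes re-import this module
    print(run_demo())
```

The workers are defined at module level because the pool has to pickle them to send them to the child processes; lambdas or nested functions would not work here.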

Best answer

Demonstrating multiprocessing speedup is quite simple:

import multiprocessing
import time

# high-resolution clock on every platform (time.clock was removed in Python 3.8)
get_timer = time.perf_counter

def cube_function(num):
    time.sleep(0.01)  # let's simulate it takes ~10ms for the CPU core to cube the number
    return num**3

if __name__ == "__main__":  # multiprocessing guard
    # we'll test multiprocessing with pools from one to the number of CPU cores on the system
    # it won't show significant improvements after that and it will soon start going
    # downhill due to the underlying OS thread context switches
    for workers in range(1, multiprocessing.cpu_count() + 1):
        pool = multiprocessing.Pool(processes=workers)
        # let's 'warm up' our pool so it doesn't affect our measurements
        pool.map(cube_function, range(multiprocessing.cpu_count()))
        # now to the business, we'll have 10000 numbers to cube via our expensive function
        print("Cubing 10000 numbers over {} processes:".format(workers))
        timer = get_timer()  # time measuring starts now
        results = pool.map(cube_function, range(10000))  # map our range to the cube_function
        timer = get_timer() - timer  # get our delta time as soon as it finishes
        print("\tTotal: {:.2f} seconds".format(timer))
        print("\tAvg. per process: {:.2f} seconds".format(timer / workers))
        pool.close()  # no more tasks will be submitted to this pool
        pool.join()  # wait for the worker processes to exit before the next run
        time.sleep(1)  # waiting for a second to make sure everything is cleaned up

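Of course, we're only simulating a 10 ms computation per number here; you can replace cube_function with any CPU-heavy task for a real-world demonstration. The results are as expected: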

Cubing 10000 numbers over 1 processes:
        Total: 100.01 seconds
        Avg. per process: 100.01 seconds
Cubing 10000 numbers over 2 processes:
        Total: 50.02 seconds
        Avg. per process: 25.01 seconds
Cubing 10000 numbers over 3 processes:
        Total: 33.36 seconds
        Avg. per process: 11.12 seconds
Cubing 10000 numbers over 4 processes:
        Total: 25.00 seconds
        Avg. per process: 6.25 seconds
Cubing 10000 numbers over 5 processes:
        Total: 20.00 seconds
        Avg. per process: 4.00 seconds
Cubing 10000 numbers over 6 processes:
        Total: 16.68 seconds
        Avg. per process: 2.78 seconds
Cubing 10000 numbers over 7 processes:
        Total: 14.32 seconds
        Avg. per process: 2.05 seconds
Cubing 10000 numbers over 8 processes:
        Total: 12.52 seconds
        Avg. per process: 1.57 seconds
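The totals above can be turned into speedup and parallel-efficiency figures with a few lines of arithmetic. This is a small sketch, not part of the original answer: the function name `speedup_and_efficiency` is hypothetical, and the numbers are copied from the output listed above.

```python
# Total wall-clock seconds per worker count, copied from the output above
totals = {1: 100.01, 2: 50.02, 3: 33.36, 4: 25.00,
          5: 20.00, 6: 16.68, 7: 14.32, 8: 12.52}


def speedup_and_efficiency(totals):
    """Return (workers, speedup, efficiency) rows relative to the 1-process run."""
    baseline = totals[1]
    rows = []
    for workers, seconds in sorted(totals.items()):
        speedup = baseline / seconds          # how many times faster than 1 process
        efficiency = speedup / workers        # 1.0 would be perfectly linear scaling
        rows.append((workers, speedup, efficiency))
    return rows


for workers, speedup, efficiency in speedup_and_efficiency(totals):
    print("{} processes: speedup {:.2f}x, efficiency {:.1%}".format(
        workers, speedup, efficiency))
```

With these figures the efficiency stays close to 1.0 throughout, which is what one would expect when each task is a fixed 10 ms sleep and the dispatch overhead is small by comparison.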

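Now, why is it not 100% linear? Well, first of all, it takes some time to map/distribute the data to the subprocesses and to get it back, context switching has some cost, other tasks use my CPUs from time to time, and time.sleep() is not exactly precise (nor could it be on a non-RT OS)... But the results are roughly in the ballpark expected for parallel processing.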
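For a workload that actually keeps the CPU busy instead of sleeping, the same pattern can be applied to the Mandelbrot rendering from the question. The sketch below is one possible way to do it, not the asker's pastebin code: `mandelbrot_row` and `render`, the image size, and the iteration limit are all assumptions chosen for illustration. Work is handed out one row at a time rather than one pixel at a time, so each task is large enough to outweigh the cost of sending it to a child process.

```python
import multiprocessing
import time
from functools import partial


def mandelbrot_row(y, width=200, height=150, max_iter=200):
    """Return escape-iteration counts for one image row (pure CPU work, no sleep)."""
    row = []
    for x in range(width):
        # Map pixel (x, y) onto the region [-2, 1] x [-1, 1] of the complex plane
        c = complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
        z = 0j
        count = 0
        while abs(z) <= 2.0 and count < max_iter:
            z = z * z + c
            count += 1
        row.append(count)
    return row


def render(workers, height=150):
    """Render all rows with a pool of the given size; return (image, elapsed seconds)."""
    start = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        image = pool.map(partial(mandelbrot_row, height=height), range(height))
    return image, time.perf_counter() - start  # elapsed time includes pool start-up


if __name__ == "__main__":  # multiprocessing guard, required on Windows
    for workers in (1, multiprocessing.cpu_count()):
        _, elapsed = render(workers)
        print("{} processes: {:.2f} seconds".format(workers, elapsed))
```

The image here is deliberately small so the script finishes quickly; with larger `width`, `height`, or `max_iter` values the pool start-up cost becomes a smaller share of the total and the difference between the two runs should be easier to see.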

Regarding "python - What is some example code for demonstrating multicore speedup in Python on Windows?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44521931/


