Why LLVM IR generated by numba for vector addition is too complex

Reposted by: bug小助手 · Updated: 2023-10-28 21:23:48

I wanted to check the LLVM IR that Numba generates for a vector addition and noticed that it produces a lot of IR for a simple add. I was hoping for a simple "add" instruction, but it generates 2000 lines of LLVM IR. Is there a way to get minimal code?

from numba import jit
import numpy as np

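@jit(nopython=True, nogil=True)
def mysum(a, b):
    return a + b

a, b = 1.3 * np.ones(5), 2.2 * np.ones(5)
mysum(a, b)  # the first call compiles mysum for 1-D float64 arrays

# Get the LLVM IR (inspect_llvm() returns a dict keyed by compiled signature)
llvm_ir = list(mysum.inspect_llvm().values())[0]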
print(llvm_ir)
with open("llvm_ir.ll", "w") as file:
    file.write(llvm_ir)

# Get the assembly code
asm = list(mysum.inspect_asm().values())[0]
print(asm)

with open("llvm_ir.asm", "w") as file:
    file.write(asm)

More replies

Can you please share the IR? I imagine it has to do with handling different types of objects, as it cannot guarantee the inputs are integers.

Recommended answer

Numba generates three functions. The first one does the actual computation. The second one is a wrapping function meant to be called from CPython: it converts the CPython dynamic objects passed as inputs to native types and does the opposite conversion for the returned value. The last function is meant to be called from other Numba functions (if any).
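
To see how the IR is split between these functions, one option is to count the lines of each `define` block in the IR text. The helper below is a minimal sketch: `function_sizes` and the sample IR string are hypothetical names made up for illustration, and the parsing assumes the usual textual IR layout in which a function starts with `define` and ends with a `}` in the first column.

```python
import re

# Start of an LLVM function definition, e.g. `define i32 @"name"(...) {`.
# The name after `@` is either quoted or a bare identifier.
_DEFINE_RE = re.compile(r'^define\b.*?@("[^"]+"|[\w.$-]+)\s*\(')


def function_sizes(llvm_ir):
    """Map each function defined in the IR text to its number of lines."""
    sizes = {}
    current = None  # name of the function body we are currently inside
    for line in llvm_ir.splitlines():
        if current is None:
            match = _DEFINE_RE.match(line)
            if match:
                current = match.group(1).strip('"')
                sizes[current] = 1
        else:
            sizes[current] += 1
            if line.startswith("}"):  # a closing brace in column 0 ends the body
                current = None
    return sizes


# Tiny hand-written IR used only to exercise the helper (not Numba output).
SAMPLE_IR = '''
define i32 @"_ZN8__main__5mysum"(i8* %a) {
entry:
  ret i32 0
}

declare void @helper()

define i8* @wrapper(i8* %self) {
entry:
  br label %exit
exit:
  ret i8* null
}
'''

for name, n_lines in function_sizes(SAMPLE_IR).items():
    print(f"{n_lines:6d} lines  {name}")
```

With the question's code, the same helper can be applied to the real IR via `function_sizes(list(mysum.inspect_llvm().values())[0])`; the exact function names and line counts depend on the Numba and LLVM versions.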

Converting Numpy arrays is not a trivial task: a Numpy array is a dynamic object carrying a bunch of information such as a memory buffer, the number of dimensions, the stride and size along each dimension, the dynamic Numpy type, etc. This is why the code is significantly bigger with Numpy arrays than with simpler data types like floating-point values. Indeed, with floating-point scalars as arguments, the whole LLVM IR code is 20 times smaller and the wrapping function is very simple.
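
The metadata mentioned above can be inspected from Python. The snippet below is only an illustration of what a Numpy array carries around; `describe` is a hypothetical helper, and it does not reproduce what Numba's wrapper actually does internally.

```python
import numpy as np


def describe(arr):
    """Collect the metadata a native wrapper would need to read from an ndarray."""
    return {
        "ndim": arr.ndim,              # number of dimensions
        "shape": arr.shape,            # size along each dimension
        "strides": arr.strides,        # step in bytes along each dimension
        "dtype": str(arr.dtype),       # dynamic element type
        "itemsize": arr.itemsize,      # bytes per element
        "c_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
    }


a = 1.3 * np.ones(5)
info = describe(a)        # contiguous float64 array
view = describe(a[::2])   # strided view sharing the same memory buffer

print(info)
print(view)
```

A plain Python float needs none of this: converting it to a native double is a single step, which is consistent with the much smaller wrapper for scalar arguments.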

Still, the main issue is not so much the wrapping function as the first function, the one doing the actual computation (75% of the LLVM IR code). One reason is that a + b creates a new temporary Numpy array that must be allocated and filled using an implicit loop. This implicit operation generates more code than an explicit, hand-written loop. This is certainly because Numba needs to care about many possible cases that may never happen in practice. For example, the LLVM IR of the following Numba function is half the size:

@jit('float64[::1](float64[::1], float64[::1])', nopython=True,nogil=True)
def mysum(a,b):
    out = np.empty(a.size, dtype=np.float64)
    for i in range(a.size):
        out[i] = a[i] + b[i]
    return out

If we remove the loop, the IR is halved again. This shows that creating and initializing the Numpy array takes a significant fraction of the code. The loop also takes significant space because Numba needs to support the wrap-around feature (negative indices counting from the end) of Numpy arrays, and also because Numpy arrays do not have a typed data buffer. In C, arrays and pointers are much simpler and there is no wrap-around.
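
Conceptually, wrap-around means that every index is adjusted before the memory access. The pure-Python sketch below only illustrates that extra step; `wrap_index` and `load` are hypothetical names, and this is not the code Numba emits.

```python
def wrap_index(i, n):
    """Map a negative index onto the range [0, n), as Python and Numpy indexing do."""
    return i + n if i < 0 else i


def load(buffer, i):
    """Read buffer[i] the way a wrap-around-aware access would: adjust, then read."""
    return buffer[wrap_index(i, len(buffer))]


data = [10.0, 20.0, 30.0, 40.0, 50.0]
print(load(data, 1))    # regular index
print(load(data, -1))   # negative index: last element
```

A plain C access such as `data[i]` has no such adjustment, so each indexed access in the loop carries a little more logic unless the compiler can prove the index is never negative.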

Generating pretty huge IR/ASM code is quite common for high-level languages. The code is often big because of advanced features and poor code-size optimization. Reducing the size of the generated code is significant work, and it sometimes conflicts with performance. Indeed, to get high-performance code, compilers often need to unroll loops and split the code into different variants to mitigate the cost of higher-level features (e.g. pointer aliasing, vectorization, removal of wrap-around), resulting in significantly bigger IR/ASM code.
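
The size cost of unrolling can be seen even at the source level. The sketch below writes by hand, in plain Python, the kind of transformation a compiler may apply to the IR: a main loop unrolled by four plus a remainder loop. Both functions are hypothetical illustrations, and the unroll factor of four is an arbitrary assumption.

```python
def add_simple(a, b):
    """Element-wise addition with a plain loop."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def add_unrolled(a, b):
    """Same result, with the loop body repeated four times plus a tail loop."""
    n = len(a)
    out = [0.0] * n
    main_end = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main_end, 4):
        out[i] = a[i] + b[i]
        out[i + 1] = a[i + 1] + b[i + 1]
        out[i + 2] = a[i + 2] + b[i + 2]
        out[i + 3] = a[i + 3] + b[i + 3]
    for i in range(main_end, n):  # remainder elements
        out[i] = a[i] + b[i]
    return out


a = [1.3] * 7
b = [2.2] * 7
print(add_unrolled(a, b) == add_simple(a, b))
```

The unrolled variant computes the same values with roughly twice as many lines, which is the same trade-off, at a much larger scale, that makes optimized IR grow.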
