When I try to use the tf.matmul function on the GPU, I get the following error:
InternalError: Blas xGEMMBatched launch failed
The function works if the value of N inside the function calc() is set to anything less than 15.
I am running tensorflow 1.8.0 and Cuda V9.1.85. Only one Python process is working on the GPU, and there are no other open sessions. I also have plenty of free GPU memory available (see attached image). Changing the CUDA_VISIBLE_DEVICES value has no effect, and changing the ConfigProto() settings does not help either. Using tf.matmul in place of the @ operator (see the commented-out line in the code below) does not solve the problem.
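For concreteness, here is a minimal sketch of the kind of ConfigProto variations referred to above; the memory-fraction value is an illustrative assumption, not taken from the original post:

import tensorflow as tf

# sketch: session-config variations one might try for this error
config = tf.ConfigProto()
# grow GPU memory allocation on demand instead of reserving it all up front
config.gpu_options.allow_growth = True
# alternatively, cap how much GPU memory TensorFlow may claim
# (0.9 is an assumed example value)
config.gpu_options.per_process_gpu_memory_fraction = 0.9
with tf.Session(config=config) as sess:
    pass  # run the graph under this configuration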
Here is the code I am running:
import tensorflow as tf
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
tf.Session(config=config).close()
def calc():
    N = 15  # works for N <= 14
    a = 16
    b = 8
    X = np.random.rand(N, 11520, b, 1).astype(np.float32)
    print(X.nbytes*1e-6, "MB")
    W = np.random.rand(N, 11520, a, b).astype(np.float32)
    print(W.nbytes*1e-6, "MB")
    X_ = tf.constant(X, name="X-constant", dtype=tf.float32)
    W_ = tf.constant(W, name="W-constant", dtype=tf.float32)
    # return tf.matmul(W_, X_, name="mymatmul")
    return W_ @ X_
tf.reset_default_graph()
a = calc()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
b = sess.run(a)
sess.close()
print(b.shape)
Here is the output I get:
5.529599999999999 MB
88.47359999999999 MB
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1321 try:
-> 1322 return fn(*args)
1323 except errors.OpError as e:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1306 return self._call_tf_sessionrun(
-> 1307 options, feed_dict, fetch_list, target_list, run_metadata)
1308
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1408 self._session, options, feed_dict, fetch_list, target_list,
-> 1409 run_metadata)
1410 else:
InternalError: Blas xGEMMBatched launch failed : a.shape=[172800,16,8], b.shape=[172800,8,1], m=16, n=1, k=8, batch_size=172800
[[Node: matmul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](W-constant, X-constant)]]
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
<ipython-input-5-013153235a1a> in <module>()
3 sess = tf.Session()
4 sess.run(tf.global_variables_initializer())
----> 5 b = sess.run(a)
6 sess.close()
7 print(b.shape)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
898 try:
899 result = self._run(None, fetches, feed_dict, options_ptr,
--> 900 run_metadata_ptr)
901 if run_metadata:
902 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1133 if final_fetches or final_targets or (handle and feed_dict_tensor):
1134 results = self._do_run(handle, final_targets, final_fetches,
-> 1135 feed_dict_tensor, options, run_metadata)
1136 else:
1137 results = []
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1314 if handle is None:
1315 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1316 run_metadata)
1317 else:
1318 return self._do_call(_prun_fn, handle, feeds, fetches)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1333 except KeyError:
1334 pass
-> 1335 raise type(e)(node_def, op, message)
1336
1337 def _extend_graph(self):
InternalError: Blas xGEMMBatched launch failed : a.shape=[172800,16,8], b.shape=[172800,8,1], m=16, n=1, k=8, batch_size=172800
[[Node: matmul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](W-constant, X-constant)]]
Caused by op 'matmul', defined at:
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 486, in start
self.io_loop.start()
File "/usr/local/lib/python3.6/dist-packages/tornado/platform/asyncio.py", line 127, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/usr/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
handle._run()
File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/usr/local/lib/python3.6/dist-packages/tornado/platform/asyncio.py", line 117, in _handle_events
handler_func(fileobj, events)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2903, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-013153235a1a>", line 2, in <module>
a = calc()
File "<ipython-input-4-bf0e6012e9e2>", line 13, in calc
return W_ @ X_
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 847, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 1976, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[172800,16,8], b.shape=[172800,8,1], m=16, n=1, k=8, batch_size=172800
[[Node: matmul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](W-constant, X-constant)]]
Best Answer
Below is a workaround that replaces tf.matmul with tf.einsum. However, your code does run on my machine, which has an NVIDIA 840M (2004 MiB RAM), cudnn 7.0.5.15 and cuda 9.0.176 (perhaps downgrading helps?).
import tensorflow as tf
import numpy as np
sess = tf.Session()
N = 20
M = 11520
a = 16
b = 8
W = np.random.rand(N, M, a, b).astype(np.float32)
X = np.random.rand(N, M, b, 1).astype(np.float32)
# tf.einsum does not support numpy arrays, so wrap W and X in tf.constants
W2 = tf.constant(W)
X2 = tf.constant(X)
# tf.einsum does not support "..." as seen later in np.einsum
WX = tf.einsum("uvik,uvkj->uvij", W2, X2)
# same as:
#WX = tf.matmul(W2, X2)
# calculate W@X using tf.einsum
result1 = sess.run(WX)
# calculate W@X using np.einsum
result2 = np.einsum("...ik,...kj->...ij", W, X)
# calculate W@X by hand (just for illustrative purpose, too slow for practical use)
result3 = np.zeros((N, M, a, 1), dtype=np.float32)
for i in range(a):
    for j in range(1):
        for k in range(b):
            result3[..., i, j] += W[..., i, k] * X[..., k, j]
# ensure that everything is correct
assert(np.allclose(result1, result2))
assert(np.allclose(result1, result3))
print("everything ok")
sess.close()
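Applied back to the question's calc() function, the workaround would look roughly like this; a sketch under the assumption that only the multiplication line needs to change:

import tensorflow as tf
import numpy as np

def calc():
    N = 15
    a = 16
    b = 8
    X = np.random.rand(N, 11520, b, 1).astype(np.float32)
    W = np.random.rand(N, 11520, a, b).astype(np.float32)
    X_ = tf.constant(X, name="X-constant", dtype=tf.float32)
    W_ = tf.constant(W, name="W-constant", dtype=tf.float32)
    # einsum spells out the batched matmul explicitly:
    # contract over k while keeping the two batch axes u and v
    return tf.einsum("uvik,uvkj->uvij", W_, X_)

tf.reset_default_graph()
out = calc()
with tf.Session() as sess:
    result = sess.run(out)
print(result.shape)  # (15, 11520, 16, 1)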
Regarding python - TensorFlow matmul: Blas xGEMMBatched launch failed, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50911052/