
python - TF.Keras model.predict slower than straight Numpy?


Thank you all for trying to help me understand the issue below. I have updated the question and produced a CPU-only run and a GPU-only run. In both cases, the straight numpy computation appears to be hundreds of times faster than model.predict(). Hopefully this clarifies that it does not seem to be a CPU-versus-GPU issue (and if it is, I would love an explanation).

Let's create a trained model with keras.

import tensorflow as tf

(X,Y),(Xt,Yt) = tf.keras.datasets.mnist.load_data()

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1000,'relu'),
    tf.keras.layers.Dense(100,'relu'),
    tf.keras.layers.Dense(10,'softmax'),
])
model.compile('adam','sparse_categorical_crossentropy')
model.fit(X,Y,epochs=20,batch_size=1024)
Now let's recreate model.predict as a plain numpy function.
import numpy as np

W = model.get_weights()

def predict(X):
    X = X.reshape((X.shape[0],-1)) # Flatten
    X = X @ W[0] + W[1] # Dense
    X[X<0] = 0 # ReLU
    X = X @ W[2] + W[3] # Dense
    X[X<0] = 0 # ReLU
    X = X @ W[4] + W[5] # Dense
    X = np.exp(X)/np.exp(X).sum(1)[...,None] # Softmax
    return X
We can easily verify that these are the same function (modulo floating-point error in the implementation).
print(model.predict(X[:100]).argmax(1))
print(predict(X[:100]).argmax(1))
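A quick extra check (not in the original post) is to compare the full probability vectors, not just the argmax; note that rows where the naive softmax overflows (see the RuntimeWarnings in the GPU run below) can come back as inf/nan and fail the comparison:

close = np.isclose(model.predict(X[:100]), predict(X[:100]), atol=1e-4)
print(close.all())          # True only if every probability agrees
print(close.all(1).mean())  # fraction of samples whose rows fully agree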
We can also test how fast these functions run, using ipython:
%timeit model.predict(X[:10]).argmax(1) # 10   loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # 1000 loops takes 356 µs
I note that predict runs some 10,000x faster than model.predict on small batches, dropping to around 100x on large batches. Either way, why is predict so fast? In fact, predict isn't even optimized: we could use numba, or even rewrite predict directly in C and compile it.
For deployment purposes, why is manually extracting the weights from the model and rewriting the function thousands of times faster than what keras does internally? It also means that writing a script that uses a .h5 file (or similar) may be far slower than rewriting the prediction function by hand. Is that true in general?
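As an illustration of that point (a sketch added here, not from the original question), a numba version might look like the following; it assumes numba is installed, casts inputs and weights to float64 so numba's matmul sees a single dtype, and uses a max-shifted softmax, so its outputs can differ slightly from predict above:

import numba
import numpy as np

@numba.njit(cache=True)
def predict_jit(X2d, W0, b0, W1, b1, W2, b2):
    # X2d: (n, 784) float64; weights and biases passed in as float64 arrays
    H = np.maximum(X2d @ W0 + b0, 0.0)  # Dense + ReLU
    H = np.maximum(H @ W1 + b1, 0.0)    # Dense + ReLU
    Z = H @ W2 + b2                     # Dense (logits)
    out = np.empty_like(Z)
    for i in range(Z.shape[0]):         # row-wise, overflow-safe softmax
        e = np.exp(Z[i] - Z[i].max())
        out[i] = e / e.sum()
    return out

# Usage sketch: the first call compiles, so time the later calls.
W64 = [w.astype(np.float64) for w in model.get_weights()]
X2d = X[:10].reshape(10, -1).astype(np.float64)
predict_jit(X2d, *W64)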

IPython output (CPU):
Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] on win32
import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow as tf
(X,Y),(Xt,Yt) = tf.keras.datasets.mnist.load_data()
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1000,'relu'),
    tf.keras.layers.Dense(100,'relu'),
    tf.keras.layers.Dense(10,'softmax'),
])
model.compile('adam','sparse_categorical_crossentropy')
model.fit(X,Y,epochs=20,batch_size=1024)
2021-04-19 15:10:58.323137: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-19 15:11:01.990590: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-04-19 15:11:02.039285: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-04-19 15:11:02.042553: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-G0U8S3P
2021-04-19 15:11:02.043134: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-G0U8S3P
2021-04-19 15:11:02.128834: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:127] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/20
59/59 [==============================] - 4s 60ms/step - loss: 35.3708
Epoch 2/20
59/59 [==============================] - 3s 58ms/step - loss: 0.8671
Epoch 3/20
59/59 [==============================] - 3s 56ms/step - loss: 0.5641
Epoch 4/20
59/59 [==============================] - 3s 56ms/step - loss: 0.4359
Epoch 5/20
59/59 [==============================] - 3s 56ms/step - loss: 0.3447
Epoch 6/20
59/59 [==============================] - 3s 56ms/step - loss: 0.2891
Epoch 7/20
59/59 [==============================] - 3s 56ms/step - loss: 0.2371
Epoch 8/20
59/59 [==============================] - 3s 57ms/step - loss: 0.1977
Epoch 9/20
59/59 [==============================] - 3s 57ms/step - loss: 0.1713
Epoch 10/20
59/59 [==============================] - 3s 57ms/step - loss: 0.1381
Epoch 11/20
59/59 [==============================] - 4s 61ms/step - loss: 0.1203
Epoch 12/20
59/59 [==============================] - 3s 57ms/step - loss: 0.1095
Epoch 13/20
59/59 [==============================] - 3s 56ms/step - loss: 0.0877
Epoch 14/20
59/59 [==============================] - 3s 57ms/step - loss: 0.0793
Epoch 15/20
59/59 [==============================] - 3s 56ms/step - loss: 0.0727
Epoch 16/20
59/59 [==============================] - 3s 56ms/step - loss: 0.0702
Epoch 17/20
59/59 [==============================] - 3s 56ms/step - loss: 0.0701
Epoch 18/20
59/59 [==============================] - 3s 57ms/step - loss: 0.0631
Epoch 19/20
59/59 [==============================] - 3s 56ms/step - loss: 0.0539
Epoch 20/20
59/59 [==============================] - 3s 58ms/step - loss: 0.0493
Out[3]: <tensorflow.python.keras.callbacks.History at 0x143069fdf40>

import numpy as np
W = model.get_weights()
def predict(X):
    X = X.reshape((X.shape[0],-1)) #Flatten
    X = X @ W[0] + W[1] #Dense
    X[X<0] = 0 #Relu
    X = X @ W[2] + W[3] #Dense
    X[X<0] = 0 #Relu
    X = X @ W[4] + W[5] #Dense
    X = np.exp(X)/np.exp(X).sum(1)[...,None] #Softmax
    return X

%timeit model.predict(X[:10]).argmax(1) # 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # 1000 loops takes 356 µs

52.8 ms ± 2.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
640 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
IPython output (GPU):
Python 3.7.7 (default, Mar 26 2020, 15:48:22) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf
...:
...: (X,Y),(Xt,Yt) = tf.keras.datasets.mnist.load_data()
...:
...: model = tf.keras.models.Sequential([
...:     tf.keras.layers.Flatten(),
...:     tf.keras.layers.Dense(1000,'relu'),
...:     tf.keras.layers.Dense(100,'relu'),
...:     tf.keras.layers.Dense(10,'softmax'),
...: ])
...: model.compile('adam','sparse_categorical_crossentropy')
...: model.fit(X,Y,epochs=20,batch_size=1024)
2020-07-01 15:50:46.008518: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-01 15:50:46.054495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:05:00.0
2020-07-01 15:50:46.059582: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-07-01 15:50:46.114562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-07-01 15:50:46.142058: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-07-01 15:50:46.152899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-07-01 15:50:46.217725: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-07-01 15:50:46.260758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-07-01 15:50:46.374328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-01 15:50:46.376747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-07-01 15:50:46.377688: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX FMA
2020-07-01 15:50:46.433422: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4018875000 Hz
2020-07-01 15:50:46.434383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563e4d0d71c0 executing computations on platform Host. Devices:
2020-07-01 15:50:46.435119: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-07-01 15:50:46.596077: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563e4a9379f0 executing computations on platform CUDA. Devices:
2020-07-01 15:50:46.596119: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-07-01 15:50:46.597894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:05:00.0
2020-07-01 15:50:46.597961: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-07-01 15:50:46.597988: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-07-01 15:50:46.598014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-07-01 15:50:46.598040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-07-01 15:50:46.598065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-07-01 15:50:46.598090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-07-01 15:50:46.598115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-01 15:50:46.599766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-07-01 15:50:46.600611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-07-01 15:50:46.603713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-01 15:50:46.603751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-07-01 15:50:46.603763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-07-01 15:50:46.605917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10311 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:05:00.0, compute capability: 7.5)
Train on 60000 samples
Epoch 1/20
2020-07-01 15:50:49.995091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
60000/60000 [==============================] - 2s 26us/sample - loss: 9.9370
Epoch 2/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.6094
Epoch 3/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.3672
Epoch 4/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.2720
Epoch 5/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.2196
Epoch 6/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.1673
Epoch 7/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.1367
Epoch 8/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.1082
Epoch 9/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0895
Epoch 10/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0781
Epoch 11/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0666
Epoch 12/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0537
Epoch 13/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0459
Epoch 14/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0412
Epoch 15/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0401
Epoch 16/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0318
Epoch 17/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0275
Epoch 18/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0237
Epoch 19/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0212
Epoch 20/20
60000/60000 [==============================] - 0s 4us/sample - loss: 0.0199
Out[1]: <tensorflow.python.keras.callbacks.History at 0x7f7c9000b550>

In [2]: import numpy as np
...:
...: W = model.get_weights()
...:
...: def predict(X):
...:     X = X.reshape((X.shape[0],-1)) #Flatten
...:     X = X @ W[0] + W[1] #Dense
...:     X[X<0] = 0 #Relu
...:     X = X @ W[2] + W[3] #Dense
...:     X[X<0] = 0 #Relu
...:     X = X @ W[4] + W[5] #Dense
...:     X = np.exp(X)/np.exp(X).sum(1)[...,None] #Softmax
...:     return X
...:

In [3]: print(model.predict(X[:100]).argmax(1))
...: print(predict(X[:100]).argmax(1))
[5 0 4 1 9 2 1 3 1 4 3 5 3 6 1 7 2 8 6 9 4 0 9 1 1 2 4 3 2 7 3 8 6 9 0 5 6
0 7 6 1 8 7 9 3 9 8 5 9 3 3 0 7 4 9 8 0 9 4 1 4 4 6 0 4 5 6 1 0 0 1 7 1 6
3 0 2 1 1 7 5 0 2 6 7 8 3 9 0 4 6 7 4 6 8 0 7 8 3 1]
/home/bobbyocean/anaconda3/bin/ipython3:12: RuntimeWarning: overflow encountered in exp
/home/bobbyocean/anaconda3/bin/ipython3:12: RuntimeWarning: invalid value encountered in true_divide
[5 0 4 1 9 2 1 3 1 4 3 5 3 6 1 7 2 8 6 9 4 0 9 1 1 2 4 3 2 7 3 8 6 9 0 5 6
0 7 6 1 8 7 9 3 9 8 5 9 3 3 0 7 4 9 8 0 9 4 1 4 4 6 0 4 5 6 1 0 0 1 7 1 6
3 0 2 1 1 7 5 0 2 6 7 8 3 9 0 4 6 7 4 6 8 0 7 8 3 1]

In [4]: %timeit model.predict(X[:10]).argmax(1)
37.7 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit predict(X[:10]).argmax(1)
361 µs ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
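A note on the two RuntimeWarnings in the transcript above: they come from np.exp overflowing on large logits in the hand-written softmax. A standard fix, using only numpy, is to subtract the row-wise maximum before exponentiating, which leaves the softmax result unchanged:

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True)) # the shift prevents overflow
    return E / E.sum(axis=1, keepdims=True)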

Best Answer

We observed that the main cause is Eager Execution mode. We took a rough look at your code and the corresponding results, on both a CPU and a GPU basis. It's true that numpy doesn't run on a GPU, so unlike tf-gpu it incurs no data-transfer overhead.
But it's also plainly visible how much faster your hand-defined predict method computes with np compared to model.predict with tf.keras, when the input is only 10 samples of the test set. We didn't do any deep analysis here, though, unlike this piece of art here that you may love to read.

My setup is as follows. I'm using the Colab environment and checking both CPU and GPU modes.

TensorFlow 1.15.2
Keras 2.3.1
Numpy 1.19.5

TensorFlow 2.4.1
Keras 2.4.0
Numpy 1.19.5

TF 1.15.2 - CPU
%tensorflow_version 1.x

import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print('A: ', tf.test.is_built_with_cuda) # no parentheses, so this prints the function object (same below)
print('B: ', tf.test.gpu_device_name())
local_device_protos = device_lib.list_local_devices()
([x.name for x in local_device_protos if x.device_type == 'GPU'],
[x.name for x in local_device_protos if x.device_type == 'CPU'])
TensorFlow 1.x selected.
1.15.2
A: <function is_built_with_cuda at 0x7f122d58dcb0>
B:
([], ['/device:CPU:0'])
Now, run your code.
import tensorflow as tf
import keras
print(tf.executing_eagerly()) # False

(X,Y),(Xt,Yt) = keras.datasets.mnist.load_data()
model = keras.models.Sequential([...]) # same layers as in the question
model.compile(...)                     # same compile and fit as in the question
model.fit(...)

%timeit model.predict(X[:10]).argmax(1) # yours: 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # yours: 1000 loops takes 356 µs

1000 loops, best of 5: 1.07 ms per loop
1000 loops, best of 5: 1.48 ms per loop
We can see that the execution times are comparable with the old keras. Now, let's test with the GPU as well.

TF 1.15.2 - GPU
%tensorflow_version 1.x

import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print('A: ', tf.test.is_built_with_cuda)
print('B: ', tf.test.gpu_device_name())
local_device_protos = device_lib.list_local_devices()
([x.name for x in local_device_protos if x.device_type == 'GPU'],
[x.name for x in local_device_protos if x.device_type == 'CPU'])
1.15.2
A: <function is_built_with_cuda at 0x7f0b5ad46830>
B: /device:GPU:0
(['/device:GPU:0'], ['/device:CPU:0'])
...
...
%timeit model.predict(X[:10]).argmax(1) # yours: 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # yours: 1000 loops takes 356 µs

1000 loops, best of 5: 1.02 ms per loop
1000 loops, best of 5: 1.44 ms per loop
Here too the execution times are comparable with the old keras, which has no eager mode. Now let's look at the new tf.keras, first with eager mode and then with eager mode disabled.

TF 2.4.1 - CPU
Eager
import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print('A: ', tf.test.is_built_with_cuda)
print('B: ', tf.test.gpu_device_name())
local_device_protos = device_lib.list_local_devices()
([x.name for x in local_device_protos if x.device_type == 'GPU'],
[x.name for x in local_device_protos if x.device_type == 'CPU'])
2.4.1
A: <function is_built_with_cuda at 0x7fed85de3560>
B:
([], ['/device:CPU:0'])
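As an aside (not part of the original answer), device_lib still works in TF 2.x, but the idiomatic device check there is tf.config; a one-liner, assuming the same hidden-GPU setup as above:

print(tf.config.list_physical_devices('GPU')) # [] here, since CUDA_VISIBLE_DEVICES=-1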
Now, run the code in eager mode.
import tensorflow as tf
import keras

print(tf.executing_eagerly()) # True
(X,Y),(Xt,Yt) = keras.datasets.mnist.load_data()

model = keras.models.Sequential([...]) # same layers as in the question
model.compile(...)
model.fit(...)

%timeit model.predict(X[:10]).argmax(1) # yours: 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # yours: 1000 loops takes 356 µs

10 loops, best of 5: 28 ms per loop
1000 loops, best of 5: 1.73 ms per loop
Eager Disabled
Now, if we disable eager mode and run the same code, we get:
import tensorflow as tf
import keras

# Disables eager execution
tf.compat.v1.disable_eager_execution()
# or,
# Disables eager execution of tf.functions.
# tf.config.run_functions_eagerly(False)
print(tf.executing_eagerly()) # False

(X,Y),(Xt,Yt) = keras.datasets.mnist.load_data()
model = keras.models.Sequential([...]) # same layers as in the question
model.compile(...)
model.fit(...)

%timeit model.predict(X[:10]).argmax(1) # yours: 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # yours: 1000 loops takes 356 µs

1000 loops, best of 5: 1.37 ms per loop
1000 loops, best of 5: 1.57 ms per loop
We can see that with eager mode disabled, the execution times in the new tf.keras are again comparable. Now, let's test in GPU mode as well.

TF 2.4.1 - GPU
Eager
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print('A: ', tf.test.is_built_with_cuda)
print('B: ', tf.test.gpu_device_name())
local_device_protos = device_lib.list_local_devices()
([x.name for x in local_device_protos if x.device_type == 'GPU'],
[x.name for x in local_device_protos if x.device_type == 'CPU'])
2.4.1
A: <function is_built_with_cuda at 0x7f16ad88f680>
B: /device:GPU:0
(['/device:GPU:0'], ['/device:CPU:0'])
import tensorflow as tf
import keras

print(tf.executing_eagerly()) # True
(X,Y),(Xt,Yt) = keras.datasets.mnist.load_data()
model = keras.models.Sequential([...]) # same layers as in the question
model.compile(...)
model.fit(...)

%timeit model.predict(X[:10]).argmax(1) # yours: 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # yours: 1000 loops takes 356 µs

10 loops, best of 5: 26.3 ms per loop
1000 loops, best of 5: 1.48 ms per loop
Eager Disabled
Finally, if we disable eager mode and run the same code, we get:
# Disables eager execution
tf.compat.v1.disable_eager_execution()
# or,
# Disables eager execution of tf.functions.
# tf.config.run_functions_eagerly(False)
print(tf.executing_eagerly()) # False

(X,Y),(Xt,Yt) = keras.datasets.mnist.load_data()
model = keras.models.Sequential([...]) # same layers as in the question
model.compile(...)
model.fit(...)

%timeit model.predict(X[:10]).argmax(1) # yours: 10 loops takes 37.7 ms
%timeit predict(X[:10]).argmax(1) # yours: 1000 loops takes 356 µs

1000 loops, best of 5: 1.12 ms per loop
1000 loops, best of 5: 1.45 ms per loop
As before, with eager mode disabled the execution times in the new tf.keras are comparable. So eager mode is the root cause of tf.keras running slower than straight numpy.
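Beyond disabling eager execution globally, a lighter-weight option in TF 2.x (a sketch added here, not from the answer above) is to bypass model.predict's per-call machinery for small batches by calling the model directly, optionally wrapped in tf.function so the forward pass runs as a compiled graph:

import tensorflow as tf

@tf.function
def fast_predict(x):
    return model(x, training=False) # direct forward pass, no predict() pipeline

x = tf.constant(X[:10], dtype=tf.float32)
fast_predict(x)         # the first call traces and compiles the graph
%timeit fast_predict(x) # later calls reuse the cached graph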

Regarding python - TF.Keras model.predict slower than straight Numpy?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62681257/
