
python - Gram-Schmidt orthogonalization in pure TensorFlow: performance for iterative solution is much slower than numpy

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:18:12

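I want to do Gram-Schmidt orthogonalization in pure TensorFlow to repair large matrices that start to drift slightly away from orthogonality (doing it on the graph as part of a larger computation, without breaking it). The solutions I have seen, like the one there, are used "externally" (they run multiple sess.run calls inside).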

So I wrote my own simple implementation, which I believe is very inefficient:

def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # add batch dimension for matmul
        v = tf.expand_dims(v,0)
        w = v - tf.matmul(tf.matmul(v, tf.transpose(basis)), basis)
        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, w/tf.norm(w)],axis=0)
    return basis
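The projection line in that loop, `v - tf.matmul(tf.matmul(v, tf.transpose(basis)), basis)`, is the sum of `(v . b) * b` over the basis rows written as two matrix products. Below is a minimal pure-Python sketch of that equivalence on a tiny example. The helper names `dot`, `project_out_sum`, and `project_out_matmul` are made up for illustration and are not part of the question's code.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def project_out_sum(v, basis):
    """Subtract sum_b (v . b) * b, one basis row at a time."""
    w = list(v)
    for b in basis:
        c = dot(v, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


def project_out_matmul(v, basis):
    """Same quantity written as v - (v B^T) B with explicit products."""
    coeffs = [dot(v, b) for b in basis]  # v B^T, one coefficient per basis row
    back = [sum(coeffs[j] * basis[j][i] for j in range(len(basis)))
            for i in range(len(v))]      # (v B^T) B, same length as v
    return [vi - bi for vi, bi in zip(v, back)]


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two orthonormal rows
v = [3.0, 4.0, 5.0]
w_sum = project_out_sum(v, basis)
w_mat = project_out_matmul(v, basis)
print(w_sum, w_mat)
```

Both forms leave only the component of `v` that is orthogonal to the rows already in the basis, which is why the TensorFlow version can replace the per-row Python sum with two `tf.matmul` calls.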

But when I compare it with the same iterative external code, it is about 3 times slower (on the GPU!!!), although slightly more accurate:

how much source differs from orthogonal matrix:
44.7176
tensorflow version:
0.034667
Time elapsed: 23365.9820557ms
numpy version with tensorflow and variable re-assign to the result of numpy code:
0.057589
Time elapsed: 8540.5600071ms

(UPD 4: I made a small mistake in the example, but it did not change the timings at all, because ort_discrepancy() is a lightweight function):

Minimal example:

import tensorflow as tf
import numpy as np
import time

# found this code somewhere on stackoverflow
def np_gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = v - np.sum( np.dot(v,b)*b for b in basis )
        if (w > 1e-10).any():
            basis.append(w/np.linalg.norm(w))
        else:
            basis.append(np.zeros(w.shape))
    return np.array(basis)



def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # add batch dimension for matmul
        v = tf.expand_dims(v,0)
        w = v - tf.matmul(tf.matmul(v, tf.transpose(basis)), basis)
        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, w/tf.norm(w)],axis=0)
    return basis





# how much matrix differs from orthogonal
# computes ||W^T*W - I|| (transpose_a=True transposes the first factor;
# for a matrix, ord='euclidean' is the Frobenius norm)
def ort_discrepancy(matrix):
    wwt = tf.matmul(matrix, matrix, transpose_a=True)
    rows = tf.shape(wwt)[0]
    cols = tf.shape(wwt)[1]
    return tf.norm((wwt - tf.eye(rows,cols)),ord='euclidean')


np.random.seed(0)
# white noise matrix
np_nearly_orthogonal = np.random.normal(size=(2000,2000))
# rows scaled to unit norm
np_nearly_orthogonal = np.array([row/np.linalg.norm(row) for row in np_nearly_orthogonal])


tf_nearly_orthogonal = tf.Variable(np_nearly_orthogonal,dtype=tf.float32)


init = tf.global_variables_initializer()



with tf.Session() as sess:
    sess.run(init)

    print("how much source differs from orthogonal matrix:")
    print(ort_discrepancy(tf_nearly_orthogonal).eval())

    print("tensorflow version:")
    start = time.time()

    print(ort_discrepancy(tf_gram_schmidt(tf_nearly_orthogonal)).eval())

    end = time.time()
    print("Time elapsed: %sms"%(1000*(end-start)))

    print("numpy version with tensorflow and variable re-assign to the result of numpy code:")
    start = time.time()

    tf_nearly_orthogonal = tf.Variable(np_gram_schmidt(tf_nearly_orthogonal.eval()),dtype=tf.float32)
    sess.run(tf.variables_initializer([tf_nearly_orthogonal]))

    # check that variable was updated
    print(ort_discrepancy(tf_nearly_orthogonal).eval())
    end = time.time()
    print("Time elapsed: %sms"%(1000*(end-start)))

Is there any way to speed this up? I can't figure out how to do it for G-S, which requires appending to the basis (so no tf.map_fn parallelization can help).
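To check the logic of the experiment without TensorFlow or a GPU, the minimal example above can be mirrored at toy scale in plain Python. This is only a sketch under stated assumptions: it uses a 20 x 20 matrix instead of 2000 x 2000, assumes the rows are linearly independent, and measures the discrepancy as the Frobenius norm of W W^T - I. The function names reuse those from the question for readability, but the bodies are simplified stand-ins and say nothing about performance.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def gram_schmidt(rows):
    """Classical Gram-Schmidt on the rows (assumes linear independence)."""
    basis = []
    for v in rows:
        coeffs = [dot(v, b) for b in basis]  # all projections use the original v
        w = list(v)
        for c, b in zip(coeffs, basis):
            w = [wi - c * bi for wi, bi in zip(w, b)]
        basis.append(normalize(w))
    return basis


def ort_discrepancy(rows):
    """Frobenius norm of W W^T - I for a matrix W given as a list of rows."""
    total = 0.0
    for i, a in enumerate(rows):
        for j, b in enumerate(rows):
            target = 1.0 if i == j else 0.0
            total += (dot(a, b) - target) ** 2
    return math.sqrt(total)


random.seed(0)
n = 20  # tiny stand-in for the 2000 x 2000 matrix in the question
noisy = [normalize([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n)]
before = ort_discrepancy(noisy)
after = ort_discrepancy(gram_schmidt(noisy))
print("before: %g, after: %g" % (before, after))
```

Each new basis row depends on all rows computed before it, which is the sequential dependency the question refers to when it says that appending to the basis rules out a simple `tf.map_fn` over rows.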

UPD: I got the gap down to 2x by optimizing tf.matmul:

def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # add batch dimension for matmul
        v = tf.expand_dims(v,0)
        w = v - tf.matmul(tf.matmul(v, basis, transpose_b=True), basis)
        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, w/tf.norm(w)],axis=0)
    return basis





how much source differs from orthogonal matrix:
44.7176
tensorflow version:
0.0335421
Time elapsed: 17004.458189ms
numpy version with tensorflow and variable re-assign to the result of numpy code:
0.057589
Time elapsed: 8082.20791817ms

Edit 2:

Just for fun, I tried to mimic the numpy solution exactly and ended up with code that takes an extremely long time to run:

def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # like in numpy example
        multiplied = tf.reduce_sum(tf.map_fn(lambda b: tf.scalar_mul(tf.tensordot(v,b,axes=[[0],[0]]),b), basis), axis=0)
        w = v - multiplied

        ## add batch dimension for matmul
        ##v = tf.expand_dims(v,0)
        ##w = v - tf.matmul(tf.matmul(v, basis, transpose_b=True), basis)

        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, tf.expand_dims(w/tf.norm(w),0)],axis=0)
    return basis

(which also seems to overflow GPU memory):

how much source differs from orthogonal matrix:
44.7176
tensorflow version:
2018-01-05 22:12:09.854505: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 14005 get requests, put_count=5105 evicted_count=1000 eviction_rate=0.195886 and unsatisfied allocation rate=0.714031
2018-01-05 22:12:09.854530: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2018-01-05 22:12:13.090296: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 308520 get requests, put_count=314261 evicted_count=6000 eviction_rate=0.0190924 and unsatisfied allocation rate=0.00088487
2018-01-05 22:12:22.270822: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1485113 get requests, put_count=1500399 evicted_count=16000 eviction_rate=0.0106638 and unsatisfied allocation rate=0.000490198
2018-01-05 22:12:37.833056: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3484575 get requests, put_count=3509407 evicted_count=26000 eviction_rate=0.00740866 and unsatisfied allocation rate=0.000339209
2018-01-05 22:12:59.995184: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6315546 get requests, put_count=6349923 evicted_count=36000 eviction_rate=0.00566936 and unsatisfied allocation rate=0.000259202
0.0290728
Time elapsed: 136108.97398ms
numpy version with tensorflow and variable re-assign to the result of numpy code:
0.057589
Time elapsed: 10618.8428402ms

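UPD3: My GPU is a GTX1050, which usually gives a 5-7x speedup over my CPU. So the results look very strange to me.

UPD5: OK, I found that this code makes almost no use of the GPU, whereas training a neural network with hand-written backpropagation, which uses a lot of tf.matmul and other matrix arithmetic, utilizes it fully. Why is that?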


UPD6:

Following the suggestion given, I measured the time in a new way:

# Akshay's suggestion to measure performance correctly
orthogonalized = ort_discrepancy(tf_gram_schmidt(tf_nearly_orthogonal))

with tf.Session() as sess:
    sess.run(init)

    print("how much source differs from orthogonal matrix:")
    print(ort_discrepancy(tf_nearly_orthogonal).eval())

    print("tensorflow version:")
    start = time.time()

    tf_result = sess.run(orthogonalized)

    end = time.time()

    print(tf_result)

    print("Time elapsed: %sms"%(1000*(end-start)))

    print("numpy version with tensorflow and variable re-assign to the result of numpy code:")
    start = time.time()

    tf_nearly_orthogonal = tf.Variable(np_gram_schmidt(tf_nearly_orthogonal.eval()),dtype=tf.float32)
    sess.run(tf.variables_initializer([tf_nearly_orthogonal]))

    # check that variable was updated
    print(ort_discrepancy(tf_nearly_orthogonal).eval())

    end = time.time()
    print("Time elapsed: %sms"%(1000*(end-start)))

Now I can see a speedup of roughly 3.4x (about 2595 ms versus 8852 ms):

how much source differs from orthogonal matrix:
44.7176
tensorflow version:
0.018951
Time elapsed: 2594.85888481ms
numpy version with tensorflow and variable re-assign to the result of numpy code:
0.057589
Time elapsed: 8851.86600685ms

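Best answer

TensorFlow looks slow because your benchmark measures the time to construct the graph in addition to the time it takes to execute it; a fairer comparison between TensorFlow and NumPy would exclude graph construction from the benchmark. In particular, your benchmark should probably look like this: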

print("tensorflow version:")
# This line constructs the graph but does not execute it.
orthogonalized = ort_discrepancy(tf_gram_schmidt(tf_nearly_orthogonal))

start = time.time()
tf_result = sess.run(orthogonalized)
end = time.time()
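The same measurement principle can be sketched without TensorFlow. In the toy below, `build_plan` is a hypothetical stand-in for graph construction (it only creates deferred steps) and `run_plan` is a stand-in for `sess.run` (it executes them); neither is a TensorFlow API, and the timings it prints say nothing about TensorFlow itself. The point is only that the two phases are timed separately, so the execution figure is not inflated by setup.

```python
import time


def build_plan(n):
    """Stand-in for graph construction: create n deferred steps, run nothing."""
    return [lambda x, k=k: x + k for k in range(n)]


def run_plan(plan, x):
    """Stand-in for sess.run: execute the steps that were built earlier."""
    for step in plan:
        x = step(x)
    return x


def timed_ms(fn, *args):
    """Return (result, elapsed milliseconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, 1000 * (time.perf_counter() - start)


plan, build_ms = timed_ms(build_plan, 2000)   # construction, timed on its own
result, run_ms = timed_ms(run_plan, plan, 0)  # execution only
print("build: %.3f ms, run: %.3f ms" % (build_ms, run_ms))
```

In the question's first benchmark, the Python loop over 2000 rows ran inside the timed region, so the cost of adding all those operations to the graph was counted as if it were execution time.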

Regarding python - Gram-Schmidt orthogonalization in pure TensorFlow: performance for iterative solution is much slower than numpy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48119473/
