gpt4 book ai didi

fortran - LAPACK:对打包存储矩阵的操作是否更快?

转载 作者:行者123 更新时间:2023-12-04 03:59:51 37 4
gpt4 key购买 nike

我想使用Fortran和LAPACK对角化一个实对称矩阵。 LAPACK基本上提供了两个例程,一个例程在完整矩阵上运行,另一个例程在打包存储中的矩阵上运行。虽然后者肯定会使用较少的内存,但我想知道关于速度差异是否可以说些什么?

最佳答案

当然,这是一个经验性的问题:但是总的来说,没有什么是免费的,更少的内存/更多的运行时间是很常见的折衷方案。

在这种情况下,对于打包的情况,数据的索引会更复杂,因此在遍历矩阵时,获取数据的成本会稍高一些。 (使这张图复杂的是对称矩阵,lapack例程还假定某种打包-您只拥有可用矩阵的上层或下层组件)。

今天早些时候我在处理一个本征问题,所以我将其用作度量基准。尝试使用简单的对称测试用例(Herdon矩阵,来自http://people.sc.fsu.edu/~jburkardt/m_src/test_mat/test_mat.html),并将ssyevdsspevd进行比较

$ ./eigen2 500
Generating a Herdon matrix:
Unpacked array:
Eigenvalues L_infty err = 1.7881393E-06
Packed array:
Eigenvalues L_infty err = 3.0994415E-06
Packed time: 2.800000086426735E-002
Unpacked time: 2.500000037252903E-002

$ ./eigen2 1000
Generating a Herdon matrix:
Unpacked array:
Eigenvalues L_infty err = 4.5299530E-06
Packed array:
Eigenvalues L_infty err = 5.8412552E-06
Packed time: 0.193900004029274
Unpacked time: 0.165000006556511

$ ./eigen2 2500
Generating a Herdon matrix:
Unpacked array:
Eigenvalues L_infty err = 6.1988831E-06
Packed array:
Eigenvalues L_infty err = 8.4638596E-06
Packed time: 3.21040010452271
Unpacked time: 2.70149993896484

大约有18%的差异,我必须承认这个差异要比我预期的要大(对于带包装的包装箱,误差也略大吗?)。这是intel的MKL。当然,性能差异将取决于您的矩阵,当然,正如令人毛骨悚然的指出的那样,还取决于您正在执行的问题。您对矩阵所做的随机访问越多,开销将越差。我使用的代码如下:
program eigens
implicit none

integer :: nargs,n ! problem size
real, dimension(:,:), allocatable :: A, B, Z
real, dimension(:), allocatable :: PA
real, dimension(:), allocatable :: work
integer, dimension(:), allocatable :: iwork
real, dimension(:), allocatable :: eigenvals, expected
real :: c, p
integer :: worksize, iworksize
character(len=100) :: nstr
integer :: unpackedclock, packedclock
double precision :: unpackedtime, packedtime
integer :: i,j,info

! get filename
nargs = command_argument_count()
if (nargs /= 1) then
print *,'Usage: eigen2 n'
print *,' Where n = size of array'
stop
endif
call get_command_argument(1, nstr)
read(nstr,'(I)') n
if (n < 4 .or. n > 25000) then
print *, 'Invalid n ', nstr
stop
endif


! Initialize local arrays

allocate(A(n,n),B(n,n))
allocate(eigenvals(n))

! calculate the matrix - unpacked

print *, 'Generating a Herdon matrix: '

A = 0.
c = (1.*n * (1.*n + 1.) * (2.*n - 5.))/6.
forall (i=1:n-1,j=1:n-1)
A(i,j) = -1.*i*j/c
endforall
forall (i=1:n-1)
A(i,i) = (c - 1.*i*i)/c
A(i,n) = 1.*i/c
endforall
forall (j=1:n-1)
A(n,j) = 1.*j/c
endforall
A(n,n) = -1./c
B = A

! expected eigenvalues
allocate(expected(n))
p = 3. + sqrt((4. * n - 3.) * (n - 1.)*3./(n+1.))
expected(1) = p/(n*(5.-2.*n))
expected(2) = 6./(p*(n+1.))
expected(3:n) = 1.

print *, 'Unpacked array:'
allocate(work(1),iwork(1))
call ssyevd('N','U',n,A,n,eigenvals,work,-1,iwork,-1,info)
worksize = int(work(1))
iworksize = int(work(1))
deallocate(work,iwork)
allocate(work(worksize),iwork(iworksize))

call tick(unpackedclock)
call ssyevd('N','U',n,A,n,eigenvals,work,worksize,iwork,iworksize,info)
unpackedtime = tock(unpackedclock)
deallocate(work,iwork)

if (info /= 0) then
print *, 'Error -- info = ', info
endif
print *,'Eigenvalues L_infty err = ', maxval(eigenvals-expected)


! pack array

print *, 'Packed array:'
allocate(PA(n*(n+1)/2))
allocate(Z(n,n))
do i=1,n
do j=i,n
PA(i+(j-1)*j/2) = B(i,j)
enddo
enddo

allocate(work(1),iwork(1))
call sspevd('N','U',n,PA,eigenvals,Z,n,work,-1,iwork,-1,info)
worksize = int(work(1))
iworksize = iwork(1)
deallocate(work,iwork)
allocate(work(worksize),iwork(iworksize))

call tick(packedclock)
call sspevd('N','U',n,PA,eigenvals,Z,n,work,worksize,iwork,iworksize,info)
packedtime = tock(packedclock)
deallocate(work,iwork)
deallocate(Z,A,B,PA)

if (info /= 0) then
print *, 'Error -- info = ', info
endif
print *,'Eigenvalues L_infty err = ', &
maxval(eigenvals-expected)

deallocate(eigenvals, expected)


print *,'Packed time: ', packedtime
print *,'Unpacked time: ', unpackedtime


contains
subroutine tick(t)
integer, intent(OUT) :: t

call system_clock(t)
end subroutine tick

! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate

call system_clock(now,clock_rate)

tock = real(now - t)/real(clock_rate)
end function tock

end program eigens

关于fortran - LAPACK:对打包存储矩阵的操作是否更快?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/8941678/

37 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com