
performance - Why does my Julia code run so slowly?

Reposted. Author: 行者123. Updated: 2023-12-04 22:19:44

redim = 2;
# Loading data
iris_data = readdlm("iris_data.csv");
iris_target = readdlm("iris_target.csv");

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1));
n_data, n_dim = size(iris_data);

Sw = zeros(n_dim, n_dim);
Sb = zeros(n_dim, n_dim);

C = cov(iris_data);


classes = unique(iris_target);

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target);
    d = iris_data[index,:];
    classcov = cov(d);
    Sw += length(index) / n_data .* classcov;
end
Sb = C - Sw;

evals, evecs = eig(Sw, Sb);
w = evecs[:,1:redim];
new_data = iris_data * w;

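This code simply performs LDA (linear discriminant analysis) on iris_data,
reducing the dimensionality of iris_data to 2.
It takes about 4 seconds, but Python (numpy/scipy) needs only about 0.6 seconds.
Why?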

Best answer

This is the second paragraph of the first page of the introduction in the Julia Manual:

Because Julia’s compiler is different from the interpreters used for languages like Python or R, you may find that Julia’s performance is unintuitive at first. If you find that something is slow, we highly recommend reading through the Performance Tips section before trying anything else. Once you understand how Julia works, it’s easy to write code that’s nearly as fast as C.
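One practical consequence of the quoted paragraph is that the first call to a Julia function includes the time spent compiling it, so a single timing of a whole script can overstate the steady-state cost. The following is a minimal sketch of that effect; `sum_of_squares` is a hypothetical helper, not part of the original post, and the measured times will vary by machine.

```julia
# Hypothetical helper used only to illustrate compile time versus run time.
sum_of_squares(xs) = sum(x -> x^2, xs)

xs = rand(1_000)

# The first call compiles the method for Vector{Float64} and then runs it.
t_first = @elapsed sum_of_squares(xs)

# The second call reuses the compiled code, so it measures only the run time.
t_second = @elapsed sum_of_squares(xs)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

When comparing against Python, it may be fairer to time the second call, or to use a dedicated benchmarking package.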



Excerpt:

Avoid global variables

A global variable might have its value, and therefore its type, change at any point. This makes it difficult for the compiler to optimize code using global variables. Variables should be local, or passed as arguments to functions, whenever possible.

Any code that is performance critical or being benchmarked should be inside a function.

We find that global names are frequently constants, and declaring them as such greatly improves performance
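The excerpt can be illustrated with a small comparison between a function that reads a non-constant global and one that receives the same value as an argument. This is only a sketch: the names `scale`, `sum_scaled_global`, and `sum_scaled_arg` are made up for illustration, and the size of the difference depends on the Julia version and the machine.

```julia
# Non-constant global: its value and type may change at any point,
# so code reading it cannot be specialized on a concrete type.
scale = 2.0

# Reads the global `scale` on every iteration.
function sum_scaled_global(xs)
    total = 0.0
    for x in xs
        total += scale * x
    end
    return total
end

# Receives the factor as an argument, so it is a local with a known type.
function sum_scaled_arg(xs, factor)
    total = 0.0
    for x in xs
        total += factor * x
    end
    return total
end

xs = collect(1.0:1000.0)

# Call each function once so that compilation is not included in the timings.
sum_scaled_global(xs)
sum_scaled_arg(xs, scale)

t_global = @elapsed sum_scaled_global(xs)
t_arg = @elapsed sum_scaled_arg(xs, scale)
println("global: ", t_global, " s, argument: ", t_arg, " s")
```

Declaring the global as `const scale = 2.0` is the other option the manual mentions; passing the value as an argument keeps the function reusable as well.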



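Knowing that script style (all program code at top level) is so prevalent among scientific computing users, I recommend that you at least wrap the whole file in a let expression for starters (let introduces a new local scope), i.e.: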

let

redim = 2
# Loading data
iris_data = readdlm("iris_data.csv")
iris_target = readdlm("iris_target.csv")

# Center data
iris_data = broadcast(-, iris_data, mean(iris_data, 1))
n_data, n_dim = size(iris_data)

Sw = zeros(n_dim, n_dim)
Sb = zeros(n_dim, n_dim)

C = cov(iris_data)


classes = unique(iris_target)

for i=1:length(classes)
    index = find(x -> x==classes[i], iris_target)
    d = iris_data[index,:]
    classcov = cov(d)
    Sw += length(index) / n_data .* classcov
end
Sb = C - Sw

evals, evecs = eig(Sw, Sb)
w = evecs[:,1:redim]
new_data = iris_data * w

end
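But I would also recommend refactoring it into small functions and then writing a main function that calls the rest, something like the following. Note how this refactoring makes your code general and reusable (and fast):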
module LinearDiscriminantAnalysis

export load_data, center_data

"Returns data and target Matrices."
load_data(data_path, target_path) = (readdlm(data_path), readdlm(target_path))

function center_data(data, target)
    data = broadcast(-, data, mean(data, 1))
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    Sb = zeros(n_dim, n_dim)
    C = cov(data)
    classes = unique(target)
    for i=1:length(classes)
        index = find(x -> x==classes[i], target)
        d = data[index,:]
        classcov = cov(d)
        Sw += length(index) / n_data .* classcov
    end
    Sb = C - Sw
    evals, evecs = eig(Sw, Sb)
    redim = 2
    w = evecs[:,1:redim]
    return data * w
end

end

using LinearDiscriminantAnalysis

function main()
    iris_data, iris_target = load_data("iris_data.csv", "iris_target.csv")
    result = center_data(iris_data, iris_target)
    @show result
end

main()
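
The code above uses the Julia 0.4 API. As a rough sketch of how the same computation might look on Julia 1.x, the version below assumes the renamed functions `findall` (formerly `find`) and `eigen` (formerly `eig`), and the standard libraries `Statistics` and `LinearAlgebra`. The function name `lda_project` and the random stand-in data are hypothetical; they replace the CSV files so that the example runs on its own. It keeps the original computation unchanged, including the argument order of the eigenvalue call.

```julia
using Statistics, LinearAlgebra, Random

# Hypothetical Julia 1.x counterpart of `center_data` from the answer.
function lda_project(data, target; redim = 2)
    # Center each column of the data.
    data = data .- mean(data, dims = 1)
    n_data, n_dim = size(data)
    Sw = zeros(n_dim, n_dim)
    C = cov(data)
    labels = vec(target)  # accept an N-by-1 matrix or a vector of labels
    for c in unique(labels)
        index = findall(==(c), labels)
        # Accumulate the class covariance, weighted by the class proportion.
        Sw += (length(index) / n_data) .* cov(data[index, :])
    end
    Sb = C - Sw
    # Same generalized eigenproblem as `eig(Sw, Sb)` in the original code.
    evals, evecs = eigen(Sw, Sb)
    return data * evecs[:, 1:redim]
end

# Random stand-in for the iris files: 150 samples, 4 features, 3 classes.
Random.seed!(1)
data = randn(150, 4)
target = repeat([0.0, 1.0, 2.0], inner = 50)

projected = lda_project(data, target)
println(size(projected))
```

With real data, `readdlm` from the `DelimitedFiles` standard library could supply `data` and `target` in place of the random arrays.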
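Notes:
  • You don't need all those semicolons.
  • Anonymous functions are currently slow, but that will change in v0.5. For now, you can use FastAnonymous if performance is critical.
  • In short, read all of the performance tips carefully and take them into account.
  • main is just a name; it can be anything else you like.

Regarding "performance - Why does my Julia code run so slowly?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34615746/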
