python - Profiling code with memory_profiler increases execution time


I'm writing a simple application that splits a large text file into multiple smaller files. I wrote two versions of it, one using lists and one using generators. Profiling both versions with the memory_profiler module clearly shows that the generator version has better memory efficiency, but oddly, profiling the generator version also increases its execution time. The demonstration below shows what I mean.
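To reproduce the timings below, a reasonably large input file is needed. Here is a minimal, hypothetical helper for generating one; the file name test.txt and the line count are arbitrary assumptions, not part of the original question:

# Hypothetical setup script, not part of the original question: writes a
# large plain-text file for the split scripts below to work on.
with open("test.txt", "w") as f:
    for i in range(1_000_000):
        f.write(f"This is line number {i}\n")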

The version using lists

from memory_profiler import profile


@profile()
def main():
    file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
    input_file = open(file_name).readlines()
    num_lines_orig = len(input_file)
    parts = int(input("Enter the number of parts you want to split in: "))
    output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
    st = 0
    p = int(num_lines_orig / parts)
    ed = p
    for i in range(parts-1):
        with open(output_files[i], "w") as OF:
            OF.writelines(input_file[st:ed])
        st = ed
        ed = st + p

    with open(output_files[-1], "w") as OF:
        OF.writelines(input_file[st:])


if __name__ == "__main__":
    main()

When run with the profiler

$ time py36 Splitting\ text\ files_BAD_usingLists.py                                                                                                               

Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
Filename: Splitting text files_BAD_usingLists.py

Line #    Mem usage    Increment   Line Contents
================================================
     6     47.8 MiB      0.0 MiB   @profile()
     7                             def main():
     8     47.8 MiB      0.0 MiB       file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
     9    107.3 MiB     59.5 MiB       input_file = open(file_name).readlines()
    10    107.3 MiB      0.0 MiB       num_lines_orig = len(input_file)
    11    107.3 MiB      0.0 MiB       parts = int(input("Enter the number of parts you want to split in: "))
    12    107.3 MiB      0.0 MiB       output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
    13    107.3 MiB      0.0 MiB       st = 0
    14    107.3 MiB      0.0 MiB       p = int(num_lines_orig / parts)
    15    107.3 MiB      0.0 MiB       ed = p
    16    108.1 MiB      0.7 MiB       for i in range(parts-1):
    17    107.6 MiB     -0.5 MiB           with open(output_files[i], "w") as OF:
    18    108.1 MiB      0.5 MiB               OF.writelines(input_file[st:ed])
    19    108.1 MiB      0.0 MiB           st = ed
    20    108.1 MiB      0.0 MiB           ed = st + p
    21
    22    108.1 MiB      0.0 MiB       with open(output_files[-1], "w") as OF:
    23    108.1 MiB      0.0 MiB           OF.writelines(input_file[st:])



real 0m6.115s
user 0m0.764s
sys 0m0.052s

Running without the profiler

$ time py36 Splitting\ text\ files_BAD_usingLists.py 
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3

real 0m5.916s
user 0m0.696s
sys 0m0.080s

Now the version using generators

@profile()
def main():
    file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
    input_file = open(file_name)
    num_lines_orig = sum(1 for _ in input_file)
    input_file.seek(0)
    parts = int(input("Enter the number of parts you want to split in: "))
    output_files = ((file_name + str(i)) for i in range(1, parts + 1))
    st = 0
    p = int(num_lines_orig / parts)
    ed = p
    for i in range(parts-1):
        file = next(output_files)
        with open(file, "w") as OF:
            for _ in range(st, ed):
                OF.writelines(input_file.readline())

        st = ed
        ed = st + p
        if num_lines_orig - ed < p:
            ed = st + (num_lines_orig - ed) + p
        else:
            ed = st + p

    file = next(output_files)
    with open(file, "w") as OF:
        for _ in range(st, ed):
            OF.writelines(input_file.readline())


if __name__ == "__main__":
    main()

When run with the profiler option

$ time py36 -m memory_profiler Splitting\ text\ files_GOOD_usingGenerators.py                                                                                                                                      
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
Filename: Splitting text files_GOOD_usingGenerators.py

Line #    Mem usage     Increment   Line Contents
=================================================
     4   47.988 MiB     0.000 MiB   @profile()
     5                              def main():
     6   47.988 MiB     0.000 MiB       file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
     7   47.988 MiB     0.000 MiB       input_file = open(file_name)
     8   47.988 MiB     0.000 MiB       num_lines_orig = sum(1 for _ in input_file)
     9   47.988 MiB     0.000 MiB       input_file.seek(0)
    10   47.988 MiB     0.000 MiB       parts = int(input("Enter the number of parts you want to split in: "))
    11   48.703 MiB     0.715 MiB       output_files = ((file_name + str(i)) for i in range(1, parts + 1))
    12   47.988 MiB    -0.715 MiB       st = 0
    13   47.988 MiB     0.000 MiB       p = int(num_lines_orig / parts)
    14   47.988 MiB     0.000 MiB       ed = p
    15   48.703 MiB     0.715 MiB       for i in range(parts-1):
    16   48.703 MiB     0.000 MiB           file = next(output_files)
    17   48.703 MiB     0.000 MiB           with open(file, "w") as OF:
    18   48.703 MiB     0.000 MiB               for _ in range(st, ed):
    19   48.703 MiB     0.000 MiB                   OF.writelines(input_file.readline())
    20
    21   48.703 MiB     0.000 MiB           st = ed
    22   48.703 MiB     0.000 MiB           ed = st + p
    23   48.703 MiB     0.000 MiB           if num_lines_orig - ed < p:
    24   48.703 MiB     0.000 MiB               ed = st + (num_lines_orig - ed) + p
    25                                      else:
    26   48.703 MiB     0.000 MiB               ed = st + p
    27
    28   48.703 MiB     0.000 MiB       file = next(output_files)
    29   48.703 MiB     0.000 MiB       with open(file, "w") as OF:
    30   48.703 MiB     0.000 MiB           for _ in range(st, ed):
    31   48.703 MiB     0.000 MiB               OF.writelines(input_file.readline())



real 1m48.071s
user 1m13.144s
sys 0m19.652s

Running without the profiler

$ time py36  Splitting\ text\ files_GOOD_usingGenerators.py 
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3

real 0m10.429s
user 0m3.160s
sys 0m0.016s

So first, why does profiling slow down my code in the first place? And second, if profiling affects execution speed, why doesn't that effect show up on the list version of the code?

Best Answer

I CPU-profiled the code with line_profiler, and this time I got the answer. The reason the generator version takes more time is these lines:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    19         2      11126.0   5563.0      0.2          with open(file, "w") as OF:
    20    379886     200418.0      0.5      3.0              for _ in range(st, ed):
    21    379884    2348653.0      6.2     35.1                  OF.writelines(input_file.readline())

And the reason the list version doesn't slow down is these:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    19         2       9419.0   4709.5      0.4          with open(output_files[i], "w") as OF:
    20         2    1654165.0 827082.5     65.1              OF.writelines(input_file[st:ed])

In the list version, each new file is written by simply taking a slice copy of the list, which is effectively a single statement. In the generator version, however, each new file is filled by reading the input file line by line, which forces the memory profiler to measure every single one of those lines, and that adds up to the extra CPU time.
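This also points at a middle ground: keep the streaming behaviour but write each chunk with a single writelines() call, so the line-tracing hook fires on a handful of statements per part instead of once per input line. Below is a minimal sketch of such a rewrite (my own, not the original poster's code; split_file is a name introduced here) using itertools.islice:

# A sketch, not the original code: islice streams the next n lines lazily,
# so memory stays flat like the generator version, while each output file
# is written by one profiled statement like the list version.
from itertools import islice


def split_file(file_name, parts):
    with open(file_name) as input_file:
        num_lines = sum(1 for _ in input_file)  # first pass: count the lines
        input_file.seek(0)                      # rewind for the second pass
        chunk = num_lines // parts
        for i in range(1, parts + 1):
            # the last part absorbs any leftover lines
            n = chunk if i < parts else num_lines - chunk * (parts - 1)
            with open(file_name + str(i), "w") as OF:
                OF.writelines(islice(input_file, n))

Under a line-by-line profiler this version executes only a few traced statements per part, so the measured slowdown should shrink back toward the list version's while keeping the generator version's flat memory usage.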

Regarding python - profiling code with memory_profiler increases execution time, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/50627267/
