
regex - How do I correctly measure regular expression performance?

Reposted. Author: 行者123. Updated: 2023-12-02 14:52:36

I am trying some regular expression performance tests (I heard rumors that Erlang is slow):

> Fun = fun F(X) -> case X > 1000000 of true -> ok; false -> Y = X + 1, re:run(<<"1ab1jgjggghjgjgjhhhhhhhhhhhhhjgdfgfdgdfgdfgdfgdfgdfgdfgdfgdfgfgv">>, "^[a-zA-Z0-9_]+$"), F(Y) end end.
#Fun<erl_eval.30.128620087>
> timer:tc(Fun, [0]).
{17233982,ok}
> timer:tc(Fun, [0]).
{17155982,ok}

Some tests after compiling the regular expression:

> {ok, MP} = re:compile("^[a-zA-Z0-9_]+$").
{ok,{re_pattern,0,0,0,
<<69,82,67,80,107,0,0,0,16,0,0,0,1,0,0,0,255,255,255,
255,255,255,...>>}}
> Fun = fun F(X) -> case X > 1000000 of true -> ok; false -> Y = X + 1, re:run(<<"1ab1jgjggghjgjgjhhhhhhhhhhhhhjgdfgfdgdfgdfgdfgdfgdfgdfgdfgdfgfgv">>, MP), F(Y) end end.
#Fun<erl_eval.30.128620087>
> timer:tc(Fun, [0]).
{15796985,ok}
>
> timer:tc(Fun, [0]).
{15921984,ok}

http://erlang.org/doc/man/timer.html :

Unless otherwise stated, time is always measured in milliseconds.

http://erlang.org/doc/man/re.html#compile-1 :

Compiling the regular expression before matching is useful if the same expression is to be used in matching against multiple subjects during the lifetime of the program. Compiling once and executing many times is far more efficient than compiling each time one wants to match.

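Questions

  1. Why does it return microseconds? (Shouldn't it be milliseconds?)
  2. Compiling the regular expression makes little difference. Why?
  3. Should I compile it?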

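Best answer

  1. In the timer module, the tc functions, including tc/2, return the time in microseconds. This is one of the "otherwise stated" cases: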
tc(Fun) -> {Time, Value}
tc(Fun, Arguments) -> {Time, Value}
tc(Module, Function, Arguments) -> {Time, Value}
Types
Module = module()
Function = atom()
Arguments = [term()]
Time = integer()
In microseconds
Value = term()
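  2. Because in case 1 the function Fun has to compile the string "^[a-zA-Z0-9_]+$" on every recursion (about 1 million times). In case 2, by contrast, the pattern is compiled once and the result is carried into the recursion, which is why case 2 takes less time than case 1 (about 15.8 s versus about 17.2 s in the timings above).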

run(Subject, RE) -> {match, Captured} | nomatch

Subject = iodata() | unicode:charlist()

RE = mp() | iodata()

The regular expression can be specified either as iodata() in which case it is automatically compiled (as by compile/2) and executed, or as a precompiled mp() in which case it is executed against the subject directly.

  3. Yes, you should take care to compile first, before entering the recursion.
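
The three points above can be put together in a small benchmark module. This is only a sketch, not part of the original answer: the module name re_bench and the functions run/0, run/1 and loop/2 are made up for illustration, while the subject and pattern are copied from the question. It is written as a compiled module because funs typed into the shell are interpreted by erl_eval (as the #Fun<erl_eval...> output shows), which adds overhead to every iteration and can hide the cost being measured.

```erlang
-module(re_bench).
-export([run/0, run/1]).

%% Subject and pattern copied from the question.
-define(SUBJECT, <<"1ab1jgjggghjgjgjhhhhhhhhhhhhhjgdfgfdgdfgdfgdfgdfgdfgdfgdfgdfgfgv">>).
-define(PATTERN, "^[a-zA-Z0-9_]+$").

%% Default iteration count, roughly matching the question's loop.
run() -> run(1000000).

%% Times N matches with the pattern passed as a string (compiled on every
%% call) and N matches with a pattern compiled once up front.
run(N) ->
    {ok, MP} = re:compile(?PATTERN),
    {RawUs, ok} = timer:tc(fun() -> loop(N, ?PATTERN) end),
    {CompiledUs, ok} = timer:tc(fun() -> loop(N, MP) end),
    %% timer:tc reports microseconds; divide by 1000 to get milliseconds.
    #{raw_ms => RawUs / 1000, compiled_ms => CompiledUs / 1000}.

%% Runs the match N times; the pattern asserts that the subject matches.
loop(0, _RE) -> ok;
loop(N, RE) ->
    {match, _} = re:run(?SUBJECT, RE),
    loop(N - 1, RE).
```

To try it, save the code as re_bench.erl, then run c(re_bench). followed by re_bench:run(). in the shell. The absolute numbers depend on the machine and the OTP release, so compare the two values against each other rather than against the timings in the question.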

Regarding "regex - How do I correctly measure regular expression performance?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54833051/
