performance - Erlang lowercase binary performance

Reposted. Author: 行者123. Updated: 2023-12-03 09:32:36

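My goal is to speed up converting an ASCII-only binary to lowercase. I don't need any language other than English. I wrote and compared a few variants:

Binary comprehension: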

binary_comprehension(Binary) ->
<< <<if
C >= $A andalso C =< $Z -> C - $A + $a;
true -> C
end >>
|| <<C>> <= Binary >>.

List comprehension:
list_comprehension(Binary) ->
L = binary_to_list(Binary),
Lower =
[if
C >= $A andalso C =< $Z -> C - $A + $a;
true -> C
end || C <- L],
list_to_binary(Lower).

And the regular string:lowercase/1.

Surprisingly, the list comprehension beats all the others:
1> timer:tc(fun() -> lists:foreach(fun(_) -> tolower:list_comprehension(<<"QWEQWEIQEKQHWKEHKQWHEKQHWKEQWEKHQWLKL">>) end, L100000) end).
{267603,ok}

2> timer:tc(fun() -> lists:foreach(fun(_) -> tolower:binary_comprehension(<<"QWEQWEIQEKQHWKEHKQWHEKQHWKEQWEKHQWLKL">>) end, L100000) end).
{324383,ok}

3> timer:tc(fun() -> lists:foreach(fun(_) -> string:lowercase(<<"QWEQWEIQEKQHWKEHKQWHEKQHWKEQWEKHQWLKL">>) end, L100000) end).
{319819,ok}

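Any idea why the double list conversion plus a list comprehension is so much faster than the binary comprehension?

Maybe you know of a more powerful optimization?

Update:

I also found that the list-of-chars version of string:lowercase/1 is fast as well: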
string_to_lower(Binary) ->
L = binary_to_list(Binary),
Lower = string:lowercase(L),
list_to_binary(Lower).

Run:
39> timer:tc(fun() -> lists:foreach(fun(_) -> tolower:string_to_lower(<<"QWEQWEIQEKQHWKEHKQWHEKQHWKEQWEKHQWLKL">>) end, L100000) end).
{277766,ok}

Best answer

I modified the code a bit and changed the test case. The test change isn't mandatory, but I personally prefer it this way:

-module(tolower).
-compile(export_all).

u2l(C) when C >= $A andalso C =< $Z -> C + 32;
u2l(C) -> C.

binary_comprehension(Binary) ->
<< << (u2l(C)) >> || <<C>> <= Binary >>.

list_comprehension(Binary) ->
list_to_binary([u2l(C) || C <- binary_to_list(Binary)]).

list_recur(Binary) -> list_recur(binary_to_list(Binary), []).

list_recur([], Result) -> lists:reverse(Result);
list_recur([C | Tail], Result) when C >= $A andalso C =< $Z ->
list_recur(Tail, [(C + 32) | Result]);
list_recur([C | Tail], Result) ->
list_recur(Tail, [C | Result]).

string_to_lower(Binary) ->
list_to_binary(string:lowercase(binary_to_list(Binary))).

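test() ->
%% List used only to repeat the small-binary call 100000 times (R0).
L100000 = lists:seq(1, 100000),
TL0 = <<"QWEQWEIQEKQHWKEHKQWHEKQHWKEQWEKHQWLKL">>,
%% TL is TL0 repeated 100000 times: one large binary for R1..R5.
TL = binary:copy(TL0, 100000),
%% R0: 100000 separate calls on the small binary.
{R0, _} = timer:tc(fun() -> lists:foreach(fun(_) -> tolower:binary_comprehension(TL0) end, L100000) end),
%% R1..R5: a single call on the large binary for each variant.
{R1, _} = timer:tc(tolower, binary_comprehension, [TL]),
{R2, _} = timer:tc(tolower, list_comprehension, [TL]),
{R3, _} = timer:tc(tolower, list_recur, [TL]),
%% R4 passes the binary directly to string:lowercase/1.
{R4, _} = timer:tc(string, lowercase, [TL]),
{R5, _} = timer:tc(tolower, string_to_lower, [TL]),
%% All times are in microseconds, as returned by timer:tc.
io:format("~n1.binary_comprehension = ~10w~n2.binary_comprehension = ~10w~n3. list_comprehension = ~10w~n4. list_recur = ~10w~n5. lowercase = ~10w~n6. string_to_lower = ~10w~n",
[R0,R1,R2,R3,R4,R5]).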

In the Erlang shell the run times are inconsistent because of the concurrent nature of the system. But the best time is, as expected, binary_comprehension.
62> c(tolower).    
tolower.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,tolower}
63> l(tolower).
{module,tolower}
64> tolower:test().

1.binary_comprehension = 109000
2.binary_comprehension = 94000
3. list_comprehension = 312001
4. list_recur = 344001
5. lowercase = 469002
6. string_to_lower = 218000
ok
65> tolower:test().

1.binary_comprehension = 140998
2.binary_comprehension = 93999
3. list_comprehension = 327994
4. list_recur = 296996
5. lowercase = 155997
6. string_to_lower = 280996
ok
66> tolower:test().

1.binary_comprehension = 124998
2.binary_comprehension = 93998
3. list_comprehension = 327995
4. list_recur = 296995
5. lowercase = 452993
6. string_to_lower = 202997
ok
67> tolower:test().

1.binary_comprehension = 125000
2.binary_comprehension = 94000
3. list_comprehension = 312000
4. list_recur = 282000
5. lowercase = 171000
6. string_to_lower = 266000
ok
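
Since single runs vary this much, one way to compare the variants is to repeat each measurement and keep the smallest time. The following is only a minimal sketch under that assumption. The module name bench_min and the function best_of/2 are hypothetical and not part of the original answer.

```erlang
-module(bench_min).
-export([best_of/2]).

%% Hypothetical helper, not part of the original answer.
%% Runs the zero-arity Fun N times and returns the smallest
%% elapsed time in microseconds, as reported by timer:tc/1.
best_of(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    Times = [element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)],
    lists:min(Times).
```

It could be called, for example, as bench_min:best_of(fun() -> tolower:binary_comprehension(TL) end, 10), where TL is the large test binary built in test/0.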

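The time in row 5 (lowercase) differs from the time in row 6 (string_to_lower) because when you call string:lowercase/1 with a binary argument, the input is processed as a UTF-8 sequence. When you call string:lowercase/1 with a string (list) argument, the UTF-8 processing is avoided. For details, see the code of string.erl in OTP.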
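
The two call styles can be tried side by side. This is a small sketch, assuming an OTP release that provides string:lowercase/1 (OTP 20 or later). The module name lower_demo is hypothetical and not part of the original answer.

```erlang
-module(lower_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of the original answer.
demo() ->
    Bin = <<"QWE">>,
    %% Binary in, binary out: the input is decoded as UTF-8.
    <<"qwe">> = string:lowercase(Bin),
    %% Charlist in, charlist out: no UTF-8 decoding step is needed.
    "qwe" = string:lowercase(binary_to_list(Bin)),
    ok.
```

Both calls give the same lowercase text for ASCII input. Only the return type and the work done internally differ, which is consistent with the gap between rows 5 and 6.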

Regarding "performance - Erlang lowercase binary performance", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54734208/
