regex - How do I split on whitespace in OCaml?

Reposted. Author: 行者123. Updated: 2023-12-04 00:06:45

Whitespace here means a space, a tab, or a line break (that is, a carriage return or a line feed).

I assumed that \s covers the space character, \t, \n, \r, and \f.
But when I try to use \s, it does not split the string correctly:

# let line1 = "We the People of the United States, in Order to form a more perfect";;

# let wsp_regex = Str.regexp "\\s+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People of the United State"; ", in Order to form a more perfect"]

# let wsp_regex = Str.regexp "[ \\s]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "State"; ","; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

# let wsp_regex = Str.regexp "[\\s]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People of the United State"; ", in Order to form a more perfect"]

# let wsp_regex = Str.regexp "[ \\s\\t\\n\\r]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "he"; "People"; "of"; "he"; "U"; "i"; "ed"; "S"; "a"; "e"; ","; "i"; "O"; "de"; "o"; "fo"; "m"; "a"; "mo"; "e"; "pe"; "fec"]

# let wsp_regex = Str.regexp "[\s]+";;
Characters 29-31:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>

# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People of the United State"; ", in Order to form a more perfect"]

# let wsp_regex = Str.regexp "[ \s]+";;
Characters 30-32:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "State"; ","; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

# let wsp_regex = Str.regexp "[ \t\n\r\f]+";;
Characters 36-38:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "o"; "the"; "United"; "States,"; "in"; "Order"; "to"; "orm"; "a"; "more"; "per"; "ect"]

# let wsp_regex = Str.regexp "[\t\n\r\f]+";;
Characters 35-37:
Warning 14: illegal backslash escape in string.
val wsp_regex : Str.regexp = <abstr>
# let words = Str.split wsp_regex line1;;
val words : string list =
["We the People o"; " the United States, in Order to "; "orm a more per"; "ect"]

The only cases that seem to work are:
# let wsp_regex = Str.regexp "[ ]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

# let wsp_regex = Str.regexp "[ \t\n\r]+";;
# let words = Str.split wsp_regex line1;;
val words : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in"; "Order"; "to"; "form"; "a"; "more"; "perfect"]

I am not sure why the second case works, given that [ \s]+ did not (OCaml thinks I want to split on the letter s).

All I want is to split on whitespace without relying on a literal space alone, because I also want to catch \t, \n, \r, and \f.

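But I cannot figure out how to write a regular expression in OCaml that splits on whitespace.

If someone could give me a working expression, I would greatly appreciate it!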

Best answer

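The documentation for the Str module shows that \s is not supported. Your first expression therefore splits the words on runs of the character s, and that is exactly what you are seeing.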

None of your other attempts with \s work either, for the same reason: \s is not supported.

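Somewhat surprisingly, even \n (the two-character notation) is not supported as a regular expression. If you want to match a newline, your regex pattern needs to contain an actual newline character. In other words, you want the string to contain this: "\n", not this: "\\n". The same is true of \r and \t.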

The notation \f is not accepted by OCaml string syntax. If you want to match a form feed, you need to use its hexadecimal notation, \x0c.
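To make the escape rules above concrete, here is a minimal sketch that uses only the standard library to show what each string literal contains before Str ever sees it. The names real_newline, backslash_n, and form_feed are illustrative and do not come from the question.

```ocaml
(* The OCaml lexer resolves string escapes first; Str only receives the result. *)

(* One character: an actual line feed (code 10). *)
let real_newline = "\n"

(* Two characters: a backslash followed by the letter n. *)
let backslash_n = "\\n"

(* One character: form feed (code 12), written in hexadecimal because
   OCaml string syntax has no \f escape. *)
let form_feed = "\x0c"

let () =
  Printf.printf "\"\\n\"   has length %d, first code %d\n"
    (String.length real_newline) (Char.code real_newline.[0]);
  Printf.printf "\"\\\\n\"  has length %d, first code %d\n"
    (String.length backslash_n) (Char.code backslash_n.[0]);
  Printf.printf "\"\\x0c\" has length %d, first code %d\n"
    (String.length form_feed) (Char.code form_feed.[0])
```

This is why a character class such as [\\n] ends up containing a backslash and the letter n rather than a line feed.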

Putting this together, your pattern should look like this: "[ \n\r\x0c\t]+".

# Str.split (Str.regexp "[ \n\r\x0c\t]+") line1;;
- : string list =
["We"; "the"; "People"; "of"; "the"; "United"; "States,"; "in";
"Order"; "to"; "form"; "a"; "more"; "perfect"]
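If you need this in more than one place, the pattern can be wrapped in a small helper. The following is only a sketch: the names whitespace and split_whitespace are made up for illustration, and it assumes the str library is linked (for example with ocamlfind ocamlopt -package str -linkpkg, or by loading str.cma in the toplevel).

```ocaml
(* Requires the str library to be linked. *)

(* The pattern from the answer: space, line feed, carriage return,
   form feed (\x0c), and tab, each written as the actual character. *)
let whitespace = Str.regexp "[ \n\r\x0c\t]+"

(* Hypothetical helper: split a string on runs of whitespace.
   Str.split ignores delimiters at the start and end of the string. *)
let split_whitespace (s : string) : string list =
  Str.split whitespace s

let () =
  split_whitespace "We the\tPeople\n of the  United States,"
  |> List.iter print_endline
```

Compiling the regular expression once at the top level avoids rebuilding it on every call.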

There is also a Perl-compatible regular expression package that you may find more comfortable to use: https://opam.ocaml.org/packages/pcre/pcre.7.1.5/

Regarding "regex - How do I split on whitespace in OCaml?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39813584/
