
ruby - Why does split(' ') try to be (too) smart?

Reposted. Author: 数据小太阳. Updated: 2023-10-29 07:06:11

I just discovered the following odd behavior of String#split:

"a\tb c\nd".split
=> ["a", "b", "c", "d"]

"a\tb c\nd".split(' ')
=> ["a", "b", "c", "d"]

"a\tb c\nd".split(/ /)
=> ["a\tb", "c\nd"]

The source (string.c from 2.0.0) is over 200 lines long and contains a passage like this:

/* L 5909 */
else if (rb_enc_asciicompat(enc2) == 1) {
    if (RSTRING_LEN(spat) == 1 && RSTRING_PTR(spat)[0] == ' ') {
        split_type = awk;
    }
}

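Later, in the code for the awk split type, the actual argument is not even used any more, and the result is the same as for a plain split with no argument.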
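
To make the special case concrete, here is a minimal sketch comparing the three calls from above with other one-character string separators. The helper name split_variants is hypothetical and exists only for this illustration. It assumes the default field separator ($;) has not been changed.

```ruby
# Hypothetical helper: collects the results of several split calls for comparison.
def split_variants(text)
  {
    no_argument: text.split,        # awk-style whitespace splitting
    single_space: text.split(' '),  # single-space string: also awk-style
    regex_space: text.split(/ /)    # regex: a literal single-space separator
  }
end

results = split_variants("a\tb c\nd")
p results[:no_argument] == results[:single_space]  # true
p results[:regex_space]                            # ["a\tb", "c\nd"]

# Other one-character string separators are taken literally,
# so consecutive separators produce empty fields.
p "a\t\tb".split("\t")  # ["a", "", "b"]
p "a,,b".split(',')     # ["a", "", "b"]
```

Only the single-space string triggers the whitespace-collapsing mode. A tab or comma passed as a string splits on exactly that character.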

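  • Does anyone else feel this is somehow broken?
  • Are there good reasons for this behavior?
  • Does this kind of "magic" happen more often in Ruby than most people might think?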

Best answer

This is consistent with the behavior of Perl's split(), which in turn is based on GNU awk's split(). So it is a long-standing tradition with Unix origins.

From the perldoc on split:

As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string " ", thereby allowing only a single space character to be a separator.
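
The rule quoted above is written for Perl, but it can be sketched in Ruby as a rough emulation. The helper awk_style_split is a hypothetical name, not part of Ruby. The equivalence is only assumed for ASCII whitespace such as in the sample below, not claimed as Ruby's actual implementation.

```ruby
# Hypothetical helper emulating the quoted rule with an explicit regex.
def awk_style_split(text)
  # Remove leading whitespace first, then split on runs of whitespace.
  text.sub(/\A\s+/, '').split(/\s+/)
end

sample = "  a\tb  c\nd  "

p sample.split(' ')        # ["a", "b", "c", "d"]  built-in awk-style mode
p awk_style_split(sample)  # ["a", "b", "c", "d"]  explicit emulation
p sample.split(/ /)        # ["", "", "a\tb", "", "c\nd"]  only a single space separates
```

With / / the leading spaces and doubled spaces yield empty fields, and the tab and newline stay inside the fields. Trailing empty fields are dropped in all three cases because no limit argument is given.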

Regarding "ruby - Why does split(' ') try to be (too) smart?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16301372/
