
Java Tokenization: Treat Anything Separated by an Underscore as One Word

Reposted. Author: 行者123. Updated: 2023-11-29 05:17:21

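I have a very simple tokenizer that uses StreamTokenizer to break mathematical expressions into their individual components (below). The problem is that if an expression contains a variable named T_1, it gets split into [T, _, 1], whereas I want it returned as [T_1].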

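I tried using a variable to check whether the last character was an underscore and, if so, appending the underscore to the element at list.size() - 1, but that seems like a clumsy and inefficient solution. Is there a better way to do this? Thanks!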

StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
tokenizer.ordinaryChar('-'); // Don't parse minus as part of numbers.
tokenizer.ordinaryChar('/'); // Treat slash as an operator, not as a comment start.
List<String> tokBuf = new ArrayList<String>();
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) // While not the end of input
{
    switch (tokenizer.ttype) // Switch based on the type of token
    {
        case StreamTokenizer.TT_NUMBER: // Number
            tokBuf.add(String.valueOf(tokenizer.nval));
            break;
        case StreamTokenizer.TT_WORD: // Word
            tokBuf.add(tokenizer.sval);
            break;
        case '_': // Attempted workaround: sval is null for ordinary characters,
                  // so this inserts null before the last token instead of merging.
            tokBuf.add(tokBuf.size()-1, tokenizer.sval);
            break;
        default: // Operator
            tokBuf.add(String.valueOf((char) tokenizer.ttype));
    }
}

return tokBuf;
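The split described in the question can be reproduced with a minimal sketch like the one below. The class `UnderscoreSplitDemo` and its `tokenize` helper are hypothetical names; the loop is the question's, minus the `case '_'` branch. By default `_` is an ordinary character in StreamTokenizer, so the word scan stops in front of it, `_` comes back as a one-character token, and the digit after it is parsed as a number.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class UnderscoreSplitDemo {
    // Hypothetical helper: the question's loop without the '_' special case.
    static List<String> tokenize(String s) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
        tokenizer.ordinaryChar('-'); // '-' is an operator, not a number sign
        tokenizer.ordinaryChar('/'); // '/' is an operator, not a comment start
        List<String> tokBuf = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_NUMBER:
                    tokBuf.add(String.valueOf(tokenizer.nval));
                    break;
                case StreamTokenizer.TT_WORD:
                    tokBuf.add(tokenizer.sval);
                    break;
                default:
                    // '_' lands here as an ordinary single-character token
                    tokBuf.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        return tokBuf;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("T_1 + 2")); // prints [T, _, 1.0, +, 2.0]
    }
}
```

Note that the 1 after the underscore comes back as 1.0, because numeric tokens are stored in the double field nval.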

Best Answer

This does what you want:

tokenizer.wordChars('_', '_');

This makes the tokenizer treat _ as a word character, so T_1 comes back as a single TT_WORD token.
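As a rough illustration of the effect, here is a sketch that adds the single `wordChars('_', '_')` call to the question's setup. The class name `UnderscoreWordDemo` and the `tokenize` helper are hypothetical. Once `_` is a word character, a word that starts with a letter or `_` keeps going through underscores and digits. A token that starts with a digit is still parsed as a number first, so input like `1_a` is not merged.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class UnderscoreWordDemo {
    // Hypothetical helper: the question's setup plus wordChars('_', '_').
    static List<String> tokenize(String s) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
        tokenizer.ordinaryChar('-');   // '-' is an operator, not a number sign
        tokenizer.ordinaryChar('/');   // '/' is an operator, not a comment start
        tokenizer.wordChars('_', '_'); // '_' may now appear inside (or start) a word
        List<String> tokBuf = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_NUMBER:
                    tokBuf.add(String.valueOf(tokenizer.nval));
                    break;
                case StreamTokenizer.TT_WORD:
                    tokBuf.add(tokenizer.sval);
                    break;
                default:
                    tokBuf.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        return tokBuf;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("T_1 + 2"));  // prints [T_1, +, 2.0]
        System.out.println(tokenize("x_10/y_2")); // prints [x_10, /, y_2]
        System.out.println(tokenize("1_a"));      // prints [1.0, _a]: numbers are read first
    }
}
```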

Addendum:

Build and run:

import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class Main { // any class name works
    public static void main(String[] args) throws Exception {
        String s = "abc_xyz abc 123 1 + 1";
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
        tokenizer.ordinaryChar('-'); // Don't parse minus as part of numbers.
        tokenizer.ordinaryChar('/'); // Treat slash as an operator, not as a comment start.
        tokenizer.wordChars('_', '_'); // Treat underscore as part of a word.

        List<String> tokBuf = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) // While not the end of input
        {
            switch (tokenizer.ttype) // Switch based on the type of token
            {
                case StreamTokenizer.TT_NUMBER: // Number
                    tokBuf.add(String.valueOf(tokenizer.nval));
                    break;
                case StreamTokenizer.TT_WORD: // Word
                    tokBuf.add(tokenizer.sval);
                    break;
                default: // Operator
                    tokBuf.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        System.out.println(tokBuf);
    }
}

run:
[abc_xyz, abc, 123.0, 1.0, +, 1.0]
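The two ordinaryChar calls in the setup also matter for math input. By default StreamTokenizer treats '-' as part of a number and '/' as the start of a comment that runs to the end of the line. The sketch below compares the same loop with and without those calls; `OrdinaryCharDemo` and its `tokenize(String, boolean)` helper are hypothetical names used only for this comparison.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class OrdinaryCharDemo {
    // Hypothetical helper: the answer's loop, with the two ordinaryChar
    // calls switched on or off so their effect can be compared.
    static List<String> tokenize(String s, boolean operatorsOrdinary) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(s));
        if (operatorsOrdinary) {
            tokenizer.ordinaryChar('-'); // stop '-' from being read as a sign
            tokenizer.ordinaryChar('/'); // stop '/' from starting a comment
        }
        tokenizer.wordChars('_', '_');
        List<String> tokBuf = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_NUMBER:
                    tokBuf.add(String.valueOf(tokenizer.nval));
                    break;
                case StreamTokenizer.TT_WORD:
                    tokBuf.add(tokenizer.sval);
                    break;
                default:
                    tokBuf.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        return tokBuf;
    }

    public static void main(String[] args) throws IOException {
        String expr = "x -1 / 2";
        System.out.println(tokenize(expr, false)); // [x, -1.0]: "/ 2" is skipped as a comment
        System.out.println(tokenize(expr, true));  // [x, -, 1.0, /, 2.0]
    }
}
```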

Regarding "Java Tokenization: Treat Anything Separated by an Underscore as One Word", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26065390/
