
java - StreamTokenizer mangles integers and loose periods

Reposted. Author: 行者123. Updated: 2023-11-30 10:30:58

I've appropriated and modified the code below, which does a pretty good job of tokenizing Java code using Java's StreamTokenizer. Its number handling is problematic, though:

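  1. It turns all integers into doubles. I can get past that by testing num % 1 == 0, but this feels like a hack.
  2. More critically, a . that follows whitespace is treated as a number. "Class .method()" is legal Java syntax, but the resulting tokens are [Word "Class"], [Whitespace " "], [Number 0.0], [Word "method"], [Symbol "("], and [Symbol ")"].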
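
Both symptoms in the list above can be reproduced with a default-configured StreamTokenizer. The following is only a minimal sketch: the class name LoosePeriodDemo and the describe helper are made up for illustration and are not part of the original code. Because the default syntax table skips whitespace, no whitespace token shows up here.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
class LoosePeriodDemo {

    // Tokenizes the input with the default syntax table and describes each token.
    static List<String> describe(String code) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(code));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        out.add("Number " + st.nval); // nval is always a double
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("Word " + st.sval);
                        break;
                    default:
                        out.add("Symbol " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Word Class, Number 0.0, Word method, Symbol (, Symbol )]
        System.out.println(describe("Class .method()"));
        // prints [Number 42.0]
        System.out.println(describe("42"));
    }
}
```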

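I'd be happy to turn off StreamTokenizer's number parsing entirely and parse the numbers myself from word tokens, but commenting out st.parseNumbers() seems to have no effect.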

import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Token and its subclasses (IntegerToken, WordToken, ...) are the asker's own
// classes and are not shown in the question.
public class JavaTokenizer {

    private String code;

    private List<Token> tokens;

    public JavaTokenizer(String c) {
        code = c;
        tokens = new ArrayList<>();
    }

    public void tokenize() {
        try {
            // Create the tokenizer
            StringReader sr = new StringReader(code);
            StreamTokenizer st = new StreamTokenizer(sr);

            // Java-style tokenizing rules
            st.parseNumbers();
            st.wordChars('_', '_');
            st.eolIsSignificant(false);

            // Don't want whitespace tokens
            //st.ordinaryChars(0, ' ');

            // Strip out comments
            st.slashSlashComments(true);
            st.slashStarComments(true);

            // Parse the file
            int token;
            do {
                token = st.nextToken();
                switch (token) {
                case StreamTokenizer.TT_NUMBER:
                    // A number was found; the value is in nval
                    double num = st.nval;
                    if (num % 1 == 0)
                        tokens.add(new IntegerToken((int) num));
                    else
                        tokens.add(new FPNumberToken(num));
                    break;
                case StreamTokenizer.TT_WORD:
                    // A word was found; the value is in sval
                    String word = st.sval;
                    tokens.add(new WordToken(word));
                    break;
                case '"':
                    // A double-quoted string was found; sval contains the contents
                    String dquoteVal = st.sval;
                    tokens.add(new DoubleQuotedStringToken(dquoteVal));
                    break;
                case '\'':
                    // A single-quoted string was found; sval contains the contents
                    String squoteVal = st.sval;
                    tokens.add(new SingleQuotedStringToken(squoteVal));
                    break;
                case StreamTokenizer.TT_EOL:
                    // End of line character found
                    tokens.add(new EOLToken());
                    break;
                case StreamTokenizer.TT_EOF:
                    // End of file has been reached
                    tokens.add(new EOFToken());
                    break;
                default:
                    // A regular character was found; the value is the token itself
                    char ch = (char) st.ttype;
                    if (Character.isWhitespace(ch))
                        tokens.add(new WhitespaceToken(ch));
                    else
                        tokens.add(new SymbolToken(ch));
                    break;
                }
            } while (token != StreamTokenizer.TT_EOF);
            sr.close();
        } catch (IOException e) {
            // Silently ignored in the original code
        }
    }

    public List<Token> getTokens() {
        return tokens;
    }

}

Best answer

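parseNumbers() is "on" by default, which is why commenting out the call changes nothing. Use resetSyntax() to turn off number parsing and all other predefined character types, then enable only what you need.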
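
The resetSyntax() approach might look roughly like the sketch below. The class name NoNumberTokenizer, the helper methods, and the exact set of word characters are assumptions chosen for illustration; only the StreamTokenizer calls themselves come from the standard library. Digits are registered as word characters, so "42" arrives as a word token that the caller can classify.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class NoNumberTokenizer {

    static StreamTokenizer create(String code) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(code));
        st.resetSyntax();              // every character becomes ordinary; number parsing is off
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');        // digits are plain word characters now
        st.wordChars('_', '_');
        st.wordChars('$', '$');
        st.whitespaceChars(0, ' ');    // skip whitespace and control characters
        st.quoteChar('"');
        st.quoteChar('\'');
        st.slashSlashComments(true);
        st.slashStarComments(true);
        return st;
    }

    static List<String> describe(String code) {
        StreamTokenizer st = create(code);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    // An all-digit word is treated as an integer literal
                    boolean digits = st.sval.chars().allMatch(Character::isDigit);
                    out.add((digits ? "Integer " : "Word ") + st.sval);
                } else if (st.ttype == '"' || st.ttype == '\'') {
                    out.add("String " + st.sval);
                } else {
                    out.add("Symbol " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Word Class, Symbol ., Word method, Symbol (, Symbol )]
        System.out.println(describe("Class .method()"));
        // prints [Word x, Symbol =, Integer 42, Symbol ;]
        System.out.println(describe("x = 42;"));
    }
}
```

With this setup a literal such as 3.14 comes back as three tokens ("3", ".", "14"), so reassembling floating-point numbers is left to the caller.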

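That said, manual number parsing can get tricky once you account for dots and exponents. Using a Scanner and regular expressions, it should be relatively straightforward to implement your own tokenizer, tailored exactly to your needs. For an example, you may want to take a look at the Tokenizer inner class here: https://github.com/stefanhaustein/expressionparser/blob/master/core/src/main/java/org/kobjects/expressionparser/ExpressionParser.java (about 120 lines at the end)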
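
As a rough illustration of the regex route, here is a small hand-rolled tokenizer. It is not the linked Tokenizer class, and it uses java.util.regex.Matcher directly rather than a Scanner. The class name RegexTokenizer, the group names, and the simplified literal grammar (no hex literals, suffixes, underscores, or char literals) are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Tokenizer class from the linked project.
class RegexTokenizer {

    // Alternatives are tried left to right, so floating-point precedes integer.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s+|//[^\\n]*|/\\*.*?\\*/"   // skipped: whitespace and comments
        + "|(?<fp>\\d+\\.\\d*(?:[eE][+-]?\\d+)?|\\.\\d+(?:[eE][+-]?\\d+)?|\\d+[eE][+-]?\\d+)"
        + "|(?<int>\\d+)"
        + "|(?<word>[A-Za-z_$][A-Za-z0-9_$]*)"
        + "|(?<str>\"(?:\\\\.|[^\"\\\\])*\")"
        + "|(?<sym>.)",                // any other single character
        Pattern.DOTALL);

    static List<String> tokenize(String code) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        // The final catch-all alternative means find() never skips a character.
        while (m.find()) {
            if (m.group("fp") != null) {
                out.add("FP " + m.group("fp"));
            } else if (m.group("int") != null) {
                out.add("Int " + m.group("int"));
            } else if (m.group("word") != null) {
                out.add("Word " + m.group("word"));
            } else if (m.group("str") != null) {
                out.add("Str " + m.group("str"));
            } else if (m.group("sym") != null) {
                out.add("Sym " + m.group("sym"));
            }
            // Otherwise the match was whitespace or a comment: drop it.
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Word Class, Sym ., Word method, Sym (, Sym )]
        System.out.println(tokenize("Class .method()"));
        // prints [Word x, Sym =, Int 42, Sym +, FP 3.14, Sym ;]
        System.out.println(tokenize("x = 42 + 3.14; // c"));
    }
}
```

A period only becomes part of a number when a digit sits next to it, so a loose period stays a symbol and integers never pass through double.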

This post covers java - StreamTokenizer mangles integers and loose periods; a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43502608/
