
java - Am I reinventing the wheel with this token replacement code?

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 10:13:34

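I have a use case where a line of text contains nested tokens (e.g. {}), and I want to transform certain substrings that are nested at a specific depth.

Example: uppercase the word moo at depth 1: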

moo [moo [moo moo]] moo ->

moo [MOO [moo moo]] moo

Achieved with:

replaceTokens(input, 1, "[", "]", "moo", String::toUpperCase);

Or the real-world example: wrap every "--option" that is not already colored in a cyan color sequence:

@|blue --ignoreLog|@ works, but --ignoreOutput silences everything. ->

@|blue --ignoreLog|@ works, but @|cyan --ignoreOutput|@ silences everything.

Achieved with:

replaceTokens(input, 0, "@|", "|@", "--\\w*", s -> format("@|cyan %s|@", s));

I have implemented this logic, and while I'm fairly happy with it (except perhaps for performance), I also feel like I've reinvented the wheel. Here is how I implemented it:

set pos to zero

while (input line not fully consumed) {
    take the remaining line (from pos onwards)

    if the open token matches, append it to the output, increase the depth counter and advance pos past it
    else if the close token matches, append it to the output, decrease the depth counter and advance pos past it
    else if the counter equals the requested depth and the given regex matches, append the replacer's result and advance pos past the match
    else append the next character to the output and advance pos by 1
}

Here is the actual implementation:

// imports used by the method below
import static java.lang.String.format;
import static java.util.regex.Pattern.compile;
import static java.util.regex.Pattern.quote;

import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String replaceNestedTokens(String lineWithTokens, int nestingDepth, String tokenOpen, String tokenClose, String tokenRegexToReplace, Function<String, String> tokenReplacer) {
    // Delimiters are literal strings, so they are quoted; the token to replace is a real regex.
    final Pattern startsWithOpen = compile(quote(tokenOpen));
    final Pattern startsWithClose = compile(quote(tokenClose));
    final Pattern startsWithTokenToReplace = compile(format("(?<token>%s)", tokenRegexToReplace));

    final StringBuilder lineWithTokensReplaced = new StringBuilder();

    int countOpenTokens = 0; // current nesting depth
    int pos = 0;

    while (pos < lineWithTokens.length()) {
        final String remainingLine = lineWithTokens.substring(pos);

        if (startsWithOpen.matcher(remainingLine).lookingAt()) {
            countOpenTokens++;
            lineWithTokensReplaced.append(tokenOpen);
            pos += tokenOpen.length();
        } else if (startsWithClose.matcher(remainingLine).lookingAt()) {
            countOpenTokens--;
            lineWithTokensReplaced.append(tokenClose);
            pos += tokenClose.length();
        } else if (countOpenTokens == nestingDepth) {
            Matcher startsWithTokenMatcher = startsWithTokenToReplace.matcher(remainingLine);
            // An empty match (possible with a regex like "\\w*") would not advance pos and would loop forever.
            if (startsWithTokenMatcher.lookingAt() && !startsWithTokenMatcher.group("token").isEmpty()) {
                String matchedToken = startsWithTokenMatcher.group("token");
                lineWithTokensReplaced.append(tokenReplacer.apply(matchedToken));
                pos += matchedToken.length();
            } else {
                lineWithTokensReplaced.append(lineWithTokens.charAt(pos++));
            }
        } else {
            lineWithTokensReplaced.append(lineWithTokens.charAt(pos++));
        }
        // Plain exceptions instead of the test-framework assumeTrue(...), so this also runs outside tests.
        if (countOpenTokens < 0) {
            throw new IllegalArgumentException("Unbalanced token sets: closed token without open token\n\t" + lineWithTokens);
        }
    }
    if (countOpenTokens != 0) {
        throw new IllegalArgumentException("Unbalanced token sets: open token without closed token\n\t" + lineWithTokens);
    }
    return lineWithTokensReplaced.toString();
}
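Each loop iteration above copies the rest of the line with substring() (in current JDKs this copies the characters) and creates fresh Matcher objects, so a line of length n can cost O(n²) work. A minimal sketch of the same left-to-right scan without those copies, assuming the delimiters are literal strings: it checks delimiters with `String.startsWith(String, int)` and reuses a single `Matcher` by moving its window with `Matcher.region(int, int)`. The class name `RegionScan` is hypothetical.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegionScan {

    // Same scan as replaceNestedTokens, but no substring() and one reusable Matcher.
    static String replaceNestedTokens(String line, int nestingDepth, String tokenOpen, String tokenClose,
                                      String tokenRegexToReplace, Function<String, String> tokenReplacer) {
        Matcher tokenMatcher = Pattern.compile(tokenRegexToReplace).matcher(line);
        StringBuilder out = new StringBuilder(line.length());
        int depth = 0;
        int pos = 0;
        while (pos < line.length()) {
            if (line.startsWith(tokenOpen, pos)) {
                depth++;
                out.append(tokenOpen);
                pos += tokenOpen.length();
            } else if (line.startsWith(tokenClose, pos)) {
                if (--depth < 0) {
                    throw new IllegalArgumentException("Unbalanced token sets: closed token without open token\n\t" + line);
                }
                out.append(tokenClose);
                pos += tokenClose.length();
            } else if (depth == nestingDepth
                    && tokenMatcher.region(pos, line.length()).lookingAt() // match anchored at pos
                    && tokenMatcher.end() > pos) {                         // ignore empty matches
                out.append(tokenReplacer.apply(tokenMatcher.group()));
                pos = tokenMatcher.end();
            } else {
                out.append(line.charAt(pos++));
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced token sets: open token without closed token\n\t" + line);
        }
        return out.toString();
    }
}
```

Called as `RegionScan.replaceNestedTokens("moo [moo [moo moo]] moo", 1, "[", "]", "moo", String::toUpperCase)`, it should produce the same result as the question's first example. Because `region(...)` uses opaque, anchoring bounds by default, `^` and lookbehind see the same window that the substring did.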

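I couldn't get it to work with regex solutions like this or this (or with a Scanner), but I feel like I'm reinventing the wheel and that this could be solved with less code using out-of-the-box (plain Java) classes. Also, I'm fairly sure this is a performance nightmare with all the inline Pattern/Matcher instances and substrings.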
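One way to lean more on the standard classes is to search only for the delimiters with a single `Matcher.find()` loop, and to rewrite each stretch of text between two delimiters with `Matcher.replaceAll(Function<MatchResult, String>)` (Java 9+) when that stretch sits at the requested depth. A minimal sketch, assuming literal delimiters; the class name `DelimiterSplit` is hypothetical. Unlike the character-by-character scan, a token can never span a delimiter here, and edge cases such as empty matches follow the usual `replaceAll` rules instead of looping.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DelimiterSplit {

    static String replaceNestedTokens(String line, int nestingDepth, String tokenOpen, String tokenClose,
                                      String tokenRegexToReplace, Function<String, String> tokenReplacer) {
        // One pattern that finds either delimiter; at the same position the open token is tried first.
        Pattern delimiters = Pattern.compile(Pattern.quote(tokenOpen) + "|" + Pattern.quote(tokenClose));
        Pattern token = Pattern.compile(tokenRegexToReplace);
        Matcher m = delimiters.matcher(line);

        StringBuilder out = new StringBuilder(line.length());
        int depth = 0;
        int last = 0; // index just after the previous delimiter
        while (m.find()) {
            // Text between the previous delimiter and this one lives at the current depth.
            out.append(rewrite(line.substring(last, m.start()), depth == nestingDepth, token, tokenReplacer));
            depth += m.group().equals(tokenOpen) ? 1 : -1;
            if (depth < 0) {
                throw new IllegalArgumentException("Unbalanced token sets: closed token without open token\n\t" + line);
            }
            out.append(m.group());
            last = m.end();
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced token sets: open token without closed token\n\t" + line);
        }
        out.append(rewrite(line.substring(last), depth == nestingDepth, token, tokenReplacer));
        return out.toString();
    }

    private static String rewrite(String segment, boolean atDepth, Pattern token,
                                  Function<String, String> replacer) {
        if (!atDepth) {
            return segment;
        }
        // quoteReplacement keeps '$' and '\' in the replacer's result literal.
        return token.matcher(segment)
                .replaceAll(r -> Matcher.quoteReplacement(replacer.apply(r.group())));
    }
}
```

The trade-off is one `substring()` per delimiter instead of one per character, and the token regex is only ever run on text that is known to be at the right depth.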

Suggestions?

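Best answer

You could use a parser generator such as ANTLR to write a grammar that describes your language or syntax. Then use a listener or a visitor to build an interpreter for the tokens.

An example grammar (as far as I can infer it from your code):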

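grammar Expr;
prog: (expr NEWLINE)* ;
expr: id '[' expr ']'
    | '@|' expr '|@'
    | '--ignoreLog' expr
    | '--ignoreOutput' expr
    | string
    ;
id: WORD ;            // 'id' was referenced above but never defined
string: WORD ;        // character sets like [a-zA-Z0-9] are only legal in lexer (uppercase) rules
WORD : [a-zA-Z0-9]+ ;
NEWLINE : [\r\n]+ ;
WS : [ \t]+ -> skip ; // ignore spaces between tokens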
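The parse-then-visit idea can also be tried without a parser generator. The sketch below is a hand-written recursive-descent parser, not ANTLR and not generated from the grammar above: it builds a tree of hypothetical `Text` and `Group` nodes and then walks it visitor-style, carrying the current depth and rewriting only text at the requested depth. It assumes Java 16+ (records, pattern matching for `instanceof`); every name in it is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedTree {

    // Hypothetical node types: plain text, or a group wrapped in open/close delimiters.
    interface Node {}
    record Text(String value) implements Node {}
    record Group(List<Node> children) implements Node {}

    private final String line, open, close;
    private int pos;

    private NestedTree(String line, String open, String close) {
        this.line = line;
        this.open = open;
        this.close = close;
    }

    // Recursive descent: collect children until the matching close token (or end of input at top level).
    private List<Node> parseChildren(boolean insideGroup) {
        List<Node> children = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        while (pos < line.length()) {
            if (line.startsWith(open, pos)) {
                flush(text, children);
                pos += open.length();
                children.add(new Group(parseChildren(true)));
            } else if (line.startsWith(close, pos)) {
                if (!insideGroup) throw new IllegalArgumentException("close token without open token: " + line);
                pos += close.length();
                flush(text, children);
                return children;
            } else {
                text.append(line.charAt(pos++));
            }
        }
        if (insideGroup) throw new IllegalArgumentException("open token without close token: " + line);
        flush(text, children);
        return children;
    }

    private static void flush(StringBuilder text, List<Node> children) {
        if (text.length() > 0) {
            children.add(new Text(text.toString()));
            text.setLength(0);
        }
    }

    // Visitor-style walk: only Text nodes at the target depth are rewritten.
    private void render(List<Node> nodes, int depth, int target, Pattern token,
                        Function<String, String> replacer, StringBuilder out) {
        for (Node node : nodes) {
            if (node instanceof Text t) {
                out.append(depth == target
                        ? token.matcher(t.value()).replaceAll(r -> Matcher.quoteReplacement(replacer.apply(r.group())))
                        : t.value());
            } else if (node instanceof Group g) {
                out.append(open);
                render(g.children(), depth + 1, target, token, replacer, out);
                out.append(close);
            }
        }
    }

    static String replaceNestedTokens(String line, int depth, String open, String close,
                                      String tokenRegex, Function<String, String> replacer) {
        NestedTree parser = new NestedTree(line, open, close);
        List<Node> tree = parser.parseChildren(false);
        StringBuilder out = new StringBuilder(line.length());
        parser.render(tree, 0, depth, Pattern.compile(tokenRegex), replacer, out);
        return out.toString();
    }
}
```

Keeping the tree around is what makes the listener/visitor style pay off: other transformations (different depths, different regexes, stripping a color) can reuse the parsed structure instead of rescanning the string.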

Regarding "java - Am I reinventing the wheel with this token replacement code?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52015571/
