
java - Regex to keep double quotes, single quotes and hyphens and split on spaces

Reposted. Author: 行者123. Updated: 2023-12-02 00:18:37

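I am using the Java Pattern class to specify a regular expression as a string.

For example, given the input: I love being spider-man : "Peter Parker"

spider-man and "Peter Parker" should come out as separate tokens. Thanks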

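import java.io.BufferedReader;
import java.io.FileReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
// Lucene 4.x package names (version assumed)
import org.apache.lucene.analysis.pattern.PatternTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// try-with-resources closes the reader even if tokenizing fails
try (BufferedReader br = new BufferedReader(new FileReader(f))) {
    StringBuilder sb = new StringBuilder();
    String line = br.readLine();

    while (line != null) {
        // readLine() drops the line terminator; add a space so the last word
        // of one line is not glued to the first word of the next
        sb.append(line).append(' ');
        line = br.readLine();
    }

    String everything = sb.toString();
    List<String> result = new ArrayList<String>();
    // Each whole match (group 0) becomes a token: a quoted run or a run of non-spaces
    Pattern pat = Pattern.compile("([\"'].*?[\"']|[^ ]+)");
    PatternTokenizer pt = new PatternTokenizer(new StringReader(everything), pat, 0);
    CharTermAttribute term = pt.getAttribute(CharTermAttribute.class);
    pt.reset(); // Lucene 4.x requires reset() before the first incrementToken()
    while (pt.incrementToken()) {
        result.add(term.toString());
    }
    pt.end();
    pt.close();
}
catch (Exception e) {
    throw new RuntimeException(e);
}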

So I guess the reason "some word" does not work is that each token is itself a string. Any clues? Thanks
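It is hard to tell from the question exactly which input fails, but one concrete weakness of the pattern is that the opening `[\"']` and the closing `[\"']` are independent, so a double-quoted phrase can be "closed" early by an apostrophe. The sketch below uses plain `java.util.regex` instead of Lucene so the matches are easy to inspect, and compares the question's pattern with a variant that captures the opening quote and requires the same character (`\1`) to close it. The class name `QuotePairing` and its members are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotePairing {
    // Pattern from the question: opening and closing quotes are matched
    // independently, so a double quote can be "closed" by an apostrophe.
    static final Pattern LOOSE = Pattern.compile("([\"'].*?[\"']|[^ ]+)");

    // Variant: group 1 captures the opening quote, and \1 requires the
    // same character to close the quoted token.
    static final Pattern PAIRED = Pattern.compile("([\"']).*?\\1|[^ ]+");

    static List<String> tokens(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group()); // whole match, so the quotes stay in the token
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "I like \"Peter's hat\" a lot";
        System.out.println(tokens(LOOSE, text));  // [I, like, "Peter', s, hat", a, lot]
        System.out.println(tokens(PAIRED, text)); // [I, like, "Peter's hat", a, lot]
    }
}
```

Both patterns are read with `group()` (the whole match). The Lucene tokenizer in the question is created with group `0`, which should likewise emit whole matches, so swapping in the paired pattern there is likely to behave the same way.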

Best Answer

If it does not have to be a regex, and the data in the String is well formed (quotes are properly paired, unlike "' some data "'), you can do this in a single pass, like this:

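import java.util.ArrayList;
import java.util.List;

public class SplitKeepingQuotes {
    public static void main(String[] args) {
        String data = "I love being spider-man : \"Peter Parker\" or 'photo reporter'";

        List<String> tokens = new ArrayList<String>();
        StringBuilder sb = new StringBuilder();
        boolean inSingleQuote = false;
        boolean inDoubleQuote = false;

        for (char c : data.toCharArray()) {
            // Each quote character toggles the matching "inside quotes" flag
            if (c == '\'') inSingleQuote = !inSingleQuote;
            if (c == '"') inDoubleQuote = !inDoubleQuote;
            if (c == ' ' && !inSingleQuote && !inDoubleQuote) {
                // A space outside quotes ends the current token;
                // consecutive spaces would otherwise produce empty tokens
                if (sb.length() > 0) tokens.add(sb.toString());
                sb.setLength(0);
            } else {
                sb.append(c); // quotes and hyphens stay part of the token
            }
        }
        if (sb.length() > 0) tokens.add(sb.toString()); // last token
        System.out.println(tokens);
    }
}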

Output:

[I, love, being, spider-man, :, "Peter Parker", or, 'photo reporter']
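The answer's loop only splits on the space character and toggles the two flags independently, so an apostrophe inside a double-quoted phrase (as in `"Peter's hat"`) flips the single-quote flag and the rest of the input is no longer split correctly. A minimal variant, assuming that a quote of the other kind inside an open quote should be treated as plain text, and that tabs and newlines (for example from a file read line by line, as in the question) should also separate tokens, might look like this. The class `QuoteAwareSplitter` and its `split` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    // Same single-pass idea as the answer, but any whitespace separates
    // tokens outside quotes, and empty tokens are skipped.
    static List<String> split(String data) {
        List<String> tokens = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean inSingle = false;
        boolean inDouble = false;
        for (char c : data.toCharArray()) {
            // A quote only toggles its flag when not inside the other kind,
            // so "Peter's hat" stays one token
            if (c == '\'' && !inDouble) inSingle = !inSingle;
            else if (c == '"' && !inSingle) inDouble = !inDouble;

            if (Character.isWhitespace(c) && !inSingle && !inDouble) {
                if (sb.length() > 0) {
                    tokens.add(sb.toString());
                    sb.setLength(0);
                }
            } else {
                sb.append(c); // quotes and hyphens are kept
            }
        }
        if (sb.length() > 0) tokens.add(sb.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("I love\tbeing spider-man :\n\"Peter Parker\"  or 'photo reporter'"));
        // [I, love, being, spider-man, :, "Peter Parker", or, 'photo reporter']
        System.out.println(split("say \"Peter's hat\" now"));
        // [say, "Peter's hat", now]
    }
}
```

Like the answer, this still assumes well-formed input: an unclosed quote keeps everything after it in a single token.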

For the original discussion of "java - Regex to keep double quotes, single quotes and hyphens and split on spaces", see the question on Stack Overflow: https://stackoverflow.com/questions/11443416/
