
java - Regular expression composition

Reposted. Author: 行者123. Updated: 2023-11-29 07:14:05

I want to parse a line from a CSV (comma-separated) file that looks like this:

Bosh,Mark,mark@gmail.com,"3, Institute","83, 1, 2",1,21

I need to parse the file so that the commas between the quotation marks are replaced with ';', like this:

Bosh,Mark,mark@gmail.com,"3; Institute","83; 1; 2",1,21

I used the following Java code, but it does not parse the line correctly:

Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
    String replacedMatch = matcher.group();
    String gr1 = matcher.group(1);
    gr1.trim();
    replacedMatch = replacedMatch.replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}

The output is:

Bosh,Mark,mark@gmail.com,"3; Institute";"83; 1; 2",1,21

Does anyone know how to fix this?

Best answer

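Here is my solution for replacing , with ; inside quotes. It assumes that if a " appears inside a quoted string, it is escaped by another ". This property guarantees that, counting from the start of the line up to the current character, the character is inside a quoted string if the number of quotes " seen so far is odd.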

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Test string, with the tricky case """", which resolves to
// a length-1 string containing a single double quote "
String line = "Bosh,\"\"\"\",mark@gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";

// Matches one quoted run: an opening ", any non-quote characters, a closing "
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);

// End of the previous match, i.e. where the next unquoted stretch begins
int start = 0;

StringBuilder output = new StringBuilder();

while (matcher.find()) {
    // System.out.println(matcher.group() + "\n " + matcher.start() + " " + matcher.end());
    output
        .append(line.substring(start, matcher.start())) // Append unrelated contents
        .append(matcher.group().replace(',', ';'));      // Append replaced string

    start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents

// System.out.println(output);
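The odd-quote-count property described above can also be applied directly, without a regex. The following is only a minimal sketch under the same assumption (a " inside a quoted field is escaped as ""). The class name QuoteParityDemo and the method replaceCommasInsideQuotes are hypothetical names chosen for illustration, not part of the original answer.

```java
final class QuoteParityDemo {

    // Hypothetical helper: replaces ',' with ';' only inside quoted fields.
    // Assumes a literal quote inside a field is escaped by doubling it ("").
    static String replaceCommasInsideQuotes(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean insideQuotes = false; // true while an odd number of '"' has been seen

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // each quote flips the parity
            }
            out.append(insideQuotes && c == ',' ? ';' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Bosh,\"\"\"\",mark@gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(replaceCommasInsideQuotes(line));
        // prints: Bosh,"""",mark@gmail.com,"3; Institute","83; 1; 2",1,21
    }
}
```

Because an escaped quote is written as two quotes, it flips the parity twice and leaves the inside/outside state unchanged, which is why the doubled-quote assumption matters here.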

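Although I could not find any case where replacing the matched group the way you do in line = line.replace(matcher.group(), replacedMatch); would fail, I feel it is safer to rebuild the string from scratch.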
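As a more compact way to rebuild the string in one pass, the sketch below uses Matcher.replaceAll with a replacement function. It assumes Java 9 or later, which is where that overload was introduced. The class name QuotedFieldReplaceDemo and the method replaceCommasInQuotedFields are hypothetical names for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedFieldReplaceDemo {

    // Same pattern as in the answer: one quoted run without inner quotes.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: rewrites every quoted run, leaving the rest untouched.
    static String replaceCommasInQuotedFields(String line) {
        return QUOTED.matcher(line).replaceAll(
                // quoteReplacement keeps '$' and '\' in the field from being
                // interpreted as group references or escapes.
                match -> Matcher.quoteReplacement(match.group().replace(',', ';')));
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,mark@gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(replaceCommasInQuotedFields(line));
        // prints: Bosh,Mark,mark@gmail.com,"3; Institute","83; 1; 2",1,21
    }
}
```

Like the explicit loop, this touches each match exactly once at its own position, so it does not depend on a literal String.replace finding the right occurrence.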

This question about java - regular expression composition corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/11259594/
