
java - Parse CSV without splitting inside single or double quotes

Reposted. Author: 搜寻专家. Updated: 2023-11-01 02:40:34

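I am trying to parse a CSV file in Java and ran into the following problem: the second column is a string enclosed in double quotes (it may also contain commas). If the string itself contains double quotes, the whole string is enclosed in single quotes instead.

For example, lines may look like this: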

someStuff,"hello", someStuff
someStuff,"hello, SO", someStuff
someStuff,'say "hello, world"', someStuff
someStuff,'say "hello, world', someStuff

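someStuff is a placeholder for other elements, which may also contain quotes in the same style.

I am looking for a generic way to split a line on commas, except where the comma is enclosed in single or double quotes, so that I can get the second column as a string. By "second column" I mean these fields: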

  • hello
  • hello, SO
  • say "hello, world"
  • say "hello, world

I tried OpenCSV, but it failed because only one type of quote character can be specified:

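import java.io.FileReader;
import java.io.IOException;

// Package is com.opencsv in opencsv 3.x and later;
// older releases use au.com.bytecode.opencsv.
import com.opencsv.CSVReader;

public class CSVDemo {

    public static void main(String[] args) throws IOException {
        CSVDemo demo = new CSVDemo();
        demo.process("input.csv");
    }

    public void process(String fileName) throws IOException {
        // Resolve the file from the classpath.
        String file = this.getClass().getClassLoader().getResource(fileName)
                .getFile();
        // The default CSVReader treats only the double quote as a quote character.
        CSVReader reader = new CSVReader(new FileReader(file));
        String[] nextLine;
        while ((nextLine = reader.readNext()) != null) {
            // Assumes every row has at least three fields.
            System.out.println(nextLine[0] + " | " + nextLine[1] + " | "
                    + nextLine[2]);
        }
    }
}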

The OpenCSV approach fails on the last line, where a lone double quote sits inside the single-quoted string:

someStuff | hello |  someStuff
someStuff | hello, SO | someStuff
someStuff | 'say "hello, world"' | someStuff
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
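
The exception message (index 1) suggests it is most likely raised by the demo's own `nextLine[1]` access on a row that came back with fewer than three fields. As a minimal sketch that is independent of OpenCSV, the row can be printed without assuming a column count, which makes the malformed row visible instead of crashing. The class name `RowFormatter` and its `format` method are hypothetical helpers, not part of the original post.

```java
public class RowFormatter {

    /** Joins however many fields a row has, instead of assuming exactly three. */
    public static String format(String[] fields) {
        return String.join(" | ", fields);
    }

    public static void main(String[] args) {
        // A well-formed row with three fields.
        System.out.println(format(new String[] {"someStuff", "hello", "someStuff"}));
        // A short row: printed as-is rather than throwing on index 1.
        System.out.println(format(new String[] {"someStuff"}));
    }
}
```

This only helps with diagnosing the problem; it does not make the parser understand two quote styles.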

Best answer

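If you really cannot use a real CSV parser, you can use a regular expression. This is generally not a good idea, because there will always be edge cases it does not handle, but if the format is strictly as you describe, this may work.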

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void test() {
    String[] tests = {
        "someStuff,\"hello\", someStuff, someStuff",
        "someStuff,\"hello, SO\", someStuff, someStuff",
        "someStuff,'say \"hello, world\"', someStuff, someStuff",
        // The case from the question that broke OpenCSV.
        "someStuff,'say \"hello, world', someStuff, someStuff"
    };
    /* Matches a field followed by a separator (or the end of input).
     *
     * (              - Field group
     *   \"           - Start with a double quote
     *   [^\"]*?      - Non-greedy match on anything that is not a double quote
     *   \"           - End with a double quote
     * |              - Or
     *   '            - Start with a single quote
     *   [^']*?       - Non-greedy match on anything that is not a single quote
     *   '            - End with a single quote
     * |              - Or
     *   [^\"']       - Not starting with a double or single quote
     *   [^,\r\n]*?   - Non-greedy match on anything that is not a comma or line break
     * )              - End field group
     * (              - Separator group
     *   [,\r\n]|$    - Comma, line break, or end of input
     * )              - End separator group
     */
    Pattern p = Pattern.compile("(\"[^\"]*?\"|'[^']*?'|[^\"'][^,\r\n]*?)([,\r\n]|$)");
    for (String t : tests) {
        System.out.println("Matching: " + t);
        Matcher m = p.matcher(t);
        while (m.find()) {
            // group(1) is the field; quoted fields still include their quotes.
            System.out.println(m.group(1));
        }
    }
}
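
As an alternative to the regular expression, the same rule (split on commas unless inside single or double quotes) can be sketched as a small character scanner. This is not part of the original answer; the class name `QuoteAwareSplitter` and its `split` method are hypothetical. It assumes that a quote only opens a quoted section at the start of a field, that there is no escaping of quotes inside a quoted section, and that surrounding whitespace is trimmed from every field, including quoted ones.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    /** Splits one line on commas that are outside single- or double-quoted sections. */
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char openQuote = 0; // 0 means "not inside a quoted section"

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (openQuote != 0) {
                if (c == openQuote) {
                    openQuote = 0;      // matching closing quote: drop it
                } else {
                    current.append(c);  // commas and the other quote type are literal here
                }
            } else if ((c == '"' || c == '\'') && current.toString().trim().isEmpty()) {
                openQuote = c;          // a quote at the start of a field opens a section
                current.setLength(0);   // discard whitespace seen before the quote
            } else if (c == ',') {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);      // includes quotes that appear mid-field
            }
        }
        fields.add(current.toString().trim()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "someStuff,\"hello\", someStuff",
            "someStuff,\"hello, SO\", someStuff",
            "someStuff,'say \"hello, world\"', someStuff",
            "someStuff,'say \"hello, world', someStuff"
        };
        for (String line : lines) {
            // Print only the second column, as the question asks.
            System.out.println(split(line).get(1));
        }
    }
}
```

Unlike the regex, the scanner strips the enclosing quotes and tracks which quote character opened the section, so a lone double quote inside single quotes is treated as ordinary text. It still shares the caveat above: any format detail outside the stated assumptions (escaped quotes, multi-line fields) is not handled.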

Regarding "java - Parse CSV without splitting inside single or double quotes", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34293742/
