
java - Parsing CSV in Java, applying the text qualifier only when the content contains a comma

Reposted. Author: 行者123. Updated: 2023-11-30 03:00:02

I have a CSV file with the following content:

1,"hello, there",I have a csv in which,"only when ""double quote"" or comma are there in the content",it will be wrapped in the double quotes,otherwise not,something like 1/2" will not be wrapped up in double quotes.

I tried parsing it with OpenCSV and other CSV libraries, but it doesn't work.

I also tried the regular expression referenced in a StackOverflow question, but that doesn't work either.

However, when I open the file in Excel, it is parsed correctly. Can someone give me a hint on how to parse this CSV file?

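Note that a value is wrapped in the text qualifier only when it contains a comma. When such a value is wrapped in double quotes and a double quote is also part of the value, that quote is escaped with another double quote; in other words, it becomes a doubled double quote (""). But if the value merely contains a double quote, it is not wrapped in the text qualifier.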
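The quoting rules just described can be expressed as a small hand-written splitter. This is a minimal sketch, not OpenCSV's or Excel's actual algorithm, and the names `LenientCsv` and `split` are hypothetical. It assumes one record per line (no line breaks inside qualified fields): a field that starts with `"` is read up to the next lone `"`, with `""` unescaped to `"`, while any other field is read verbatim up to the next comma, so a bare quote such as the one in `1/2"` is kept as ordinary text.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientCsv {
    // Splits one CSV record using the lenient rules from the question.
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (true) {
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening text qualifier
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"') {
                        if (i + 1 < n && line.charAt(i + 1) == '"') {
                            field.append('"'); // "" inside a qualified field is one literal quote
                            i += 2;
                        } else {
                            i++; // closing text qualifier
                            break;
                        }
                    } else {
                        field.append(c);
                        i++;
                    }
                }
            }
            // Unqualified field (or stray text after a closing qualifier): read up to the next comma.
            while (i < n && line.charAt(i) != ',') {
                field.append(line.charAt(i));
                i++;
            }
            fields.add(field.toString());
            if (i >= n) {
                break; // end of record
            }
            i++; // skip the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",I have a csv in which,"
                + "\"only when \"\"double quote\"\" or comma are there in the content\","
                + "it will be wrapped in the double quotes,otherwise not,"
                + "something like 1/2\" will not be wrapped up in double quotes.";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

Running `main` should print the seven values the question expects, one per line. Empty fields (including one after a trailing comma) come back as empty strings, and a quote that does not start a field is never treated as a qualifier.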

Please advise.

After parsing, the content above should produce the following output:

1
hello, there
I have a csv in which
only when "double quote" or comma are there in the content
it will be wrapped in the double quotes
otherwise not
something like 1/2" will not be wrapped up in double quotes.

I tried OpenCSV, and I also tried splitting with this regular expression:

",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"

But it didn't work.
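The reason the regular expression fails can be shown directly. The pattern splits on a comma only if an even number of `"` characters follows it up to the end of the line. That holds when every quote belongs to a qualified field, but a single bare quote such as `1/2"` makes the count odd, which inverts the check for every comma before it. This sketch applies the pattern with `String.split` to two short hypothetical inputs; `RegexSplitDemo` is an illustrative name.

```java
import java.util.Arrays;

public class RegexSplitDemo {
    // The pattern from the question: a comma followed by an even number of '"' up to end of line.
    static final String QUOTE_AWARE_COMMA = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        // Every quote belongs to a qualified field: the quote count stays even.
        String strict = "1,\"hello, there\",x";
        // A bare 1/2" makes the quote count odd, so the first comma is no longer split on.
        String lenient = "a,1/2\" x,b";

        System.out.println(Arrays.toString(strict.split(QUOTE_AWARE_COMMA)));  // [1, "hello, there", x]
        System.out.println(Arrays.toString(lenient.split(QUOTE_AWARE_COMMA))); // [a,1/2" x, b]
    }
}
```

Even in the well-formed case, the split leaves the enclosing qualifiers and any doubled `""` in place, so unquoting would still be a separate step.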

My data looks like this:

PRODUCT,,1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED  - EQUAL TO 115-DCD960KL,

I want it parsed as follows (I use <BLANK> to denote a cell that appears empty when the file is opened in Excel):

PRODUCT
<BLANK>
1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
P
2510906459
<BLANK>
DEWALT TOOLS
<BLANK>
<BLANK>
<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL
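For the <BLANK> convention itself: in this particular record no field begins with a double quote and no field contains a comma, so every comma is a delimiter. Under that assumption only (it does not hold for the earlier sample line with "hello, there"), a plain `split` with a negative limit keeps every empty field, including the one after the trailing comma, and each can be mapped to <BLANK>. `BlankCells` is a hypothetical name, and the long HTML description is replaced by the placeholder `DESCRIPTION` to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

public class BlankCells {
    // Assumes no field is wrapped in the text qualifier, so every comma is a delimiter.
    static List<String> cells(String record) {
        List<String> out = new ArrayList<>();
        // A negative limit keeps trailing empty strings, e.g. the field after a final comma.
        for (String field : record.split(",", -1)) {
            out.add(field.isEmpty() ? "<BLANK>" : field);
        }
        return out;
    }

    public static void main(String[] args) {
        String record = "PRODUCT,,1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,DESCRIPTION,";
        cells(record).forEach(System.out::println);
    }
}
```

Because the record ends with a comma, this prints 11 values, the last one being <BLANK>.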

Best answer

I parsed your input with uniVocity-parsers without any problems:

    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.Reader;
    import java.io.StringReader;

    public class Main {
        public static void main(String[] args) {
            String input = "PRODUCT,,1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src=\"http://example.com/image.png\"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED  - EQUAL TO 115-DCD960KL,";
            Reader reader = new StringReader(input);

            CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial.
            settings.setNullValue("<BLANK>"); //use that to obtain <BLANK> to represent nulls

            String[] row = new CsvParser(settings).parseAll(reader).get(0); // the input holds a single row
            for (String element : row) {
                System.out.println(element);
            }
        }
    }

Output:

PRODUCT
<BLANK>
1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
P
2510906459
<BLANK>
DEWALT TOOLS
<BLANK>
<BLANK>
<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL
<BLANK>

Disclaimer: I am the author of this library. It is open source and free (Apache 2.0 license).

Regarding "java - Parsing CSV in Java, applying the text qualifier only when the content contains a comma", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36235178/
