java - Jackson CSV parser chokes on comma separated value files if "," is in a field, even if quoted

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 19:07:41
Code:

package org.javautil.salesdata;

import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.javautil.util.ListOfNameValue;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv
public class Manufacturers {
    private static final String fileName = "src/main/resources/pdssr/manufacturers.csv";

    ListOfNameValue getManufacturers() throws IOException {
        ListOfNameValue lnv = new ListOfNameValue();
        File csvFile = new File(fileName);
        CsvMapper mapper = new CsvMapper();

        CsvSchema schema = CsvSchema.emptySchema().withHeader(); // use first row as header; otherwise defaults are fine
        MappingIterator<Map<String,String>> it = mapper.readerFor(Map.class)
                .with(schema)
                .readValues(csvFile);
        while (it.hasNext()) {
            Map<String,String> rowAsMap = it.next();
            System.out.println(rowAsMap);
        }

        return lnv;
    }
}

Data:

"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"
"0000000055","HCONC", "Hershey Concession"
"0000000060","SNYDERS", "Snyder's of Hanover"
"0000000080","KELLOGG", "Kellogg & Keebler"
"0000000115","KARS", "Kar Nut Product (Kar's)"
"0000000135","MARS", "Mars Chocolate "
"0000000145","POORE", "Inventure Group (Poore Brothers)"
"0000000150","WOW", "WOW Foods"
"0000000160","CADBURY", "Cadbury Adam USA, LLC"
"0000000170","MONOGRAM", "Monogram Food"
"0000000185","JUSTBORN", "Just Born"
"0000000190","HOSTESS", "Hostess, Dolly Madison"
"0000000210","SARALEE", "Sara Lee"

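The exception is:

fasterxml.jackson.databind.exc.RuntimeJsonMappingException: Too many entries: expected at most 3 (value #3 (4 chars) "LLC")

I thought I would drop my own CSV parser and adopt a supported project with more features, but most of them are much slower, simply break, or have examples all over the web that do not work with the current version of the product.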

Best answer

The problem is that your file does not conform to the CSV standard. The third field always starts with a space:

"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"

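From Wikipedia:

According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementors should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."

Jackson is being "liberal" in processing most of your records; but when it finds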

"0000000160","CADBURY", "Cadbury Adam USA, LLC"

it has no choice but to treat it as 4 fields:

  • '0000000160'
  • 'CADBURY'
  • ' "Cadbury Adam USA'
  • ' LLC"'
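
The four-field split above can be reproduced with a small standalone sketch. This is a simplified model, not Jackson's actual implementation: it assumes that a double quote opens a quoted field only when it is the very first character of the field, and treats it as ordinary data anywhere else. The class name `StrictCsvSplitDemo` and the method `split` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; a simplified model, not Jackson's real parser.
class StrictCsvSplitDemo {

    // Splits one CSV line. A double quote starts a quoted field only when it
    // is the first character of that field; elsewhere it is kept as data.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean fieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
                fieldStart = true;           // next character begins a new field
                continue;
            } else if (c == '"' && fieldStart) {
                inQuotes = true;             // quote at field start opens quoting
            } else {
                current.append(c);           // includes a quote after a leading space
            }
            fieldStart = false;
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> ok = split("\"0000000020\",\"F-L\", \"Frito-Lay\"");
        List<String> bad = split("\"0000000160\",\"CADBURY\", \"Cadbury Adam USA, LLC\"");
        System.out.println(ok.size());  // 3
        System.out.println(bad.size()); // 4
        for (String field : bad) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because of the leading space, the quote in the third field is no longer at the field start, so the comma inside "Cadbury Adam USA, LLC" is read as a separator. The other rows only parse because their third field happens to contain no comma.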

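I would suggest fixing the file, as that will allow it to be parsed with most CSV libraries. Alternatively, you could try another library; there is no shortage of them.

Regarding "java - Jackson CSV parser chokes on comma separated value files if "," is in a field, even if quoted", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52239104/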
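
One possible way to fix the file is a small preprocessing pass that drops the spaces between a separator and an opening quote. The sketch below uses only the JDK. The names `CsvSpaceFixDemo`, `fixLine`, and `fixFile` are hypothetical, and it assumes that no quoted field spans more than one line and that only plain space characters precede the opening quote.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes no quoted field contains a line break.
class CsvSpaceFixDemo {

    // Removes spaces that sit between a field start and an opening quote.
    // Spaces inside quotes and spaces in unquoted fields are left untouched.
    static String fixLine(String line) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        boolean fieldStart = true;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (!inQuotes && fieldStart && c == ' ') {
                int j = i;
                while (j < line.length() && line.charAt(j) == ' ') {
                    j++; // look ahead past the run of spaces
                }
                if (j < line.length() && line.charAt(j) == '"') {
                    i = j; // drop the spaces before the opening quote
                    continue;
                }
                out.append(line, i, j); // spaces belong to an unquoted field
                i = j;
                fieldStart = false;
                continue;
            }
            if (c == '"') {
                inQuotes = !inQuotes; // a doubled quote toggles twice, so state is preserved
                fieldStart = false;
            } else if (c == ',' && !inQuotes) {
                fieldStart = true;
            } else {
                fieldStart = false;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Rewrites every line of the input file into the output file.
    static void fixFile(Path in, Path out) throws IOException {
        List<String> fixed = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            fixed.add(fixLine(line));
        }
        Files.write(out, fixed, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fixLine("\"0000000160\",\"CADBURY\", \"Cadbury Adam USA, LLC\""));
    }
}
```

After this pass every quote sits directly after its separator, so the comma in "Cadbury Adam USA, LLC" stays inside a properly quoted field. Trailing spaces inside quotes, as in "Mars Chocolate ", are kept because they are part of the field value.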
