
java - opencsv: How do I parse data with double quotes inside a cell?

Reposted. Author: 行者123. Updated: 2023-11-30 06:31:16

I am trying to parse some public data with opencsv (version 3.10). Below is the snippet that fetches the CSV and maps its records to a list of POJOs:

URL permitsURL = new URL("http://assessor.boco.solutions/ASR_PublicDataFiles/Permits.csv");
InputStream permitInputStream = permitsURL.openStream();
Reader permitStreamReader = new InputStreamReader(permitInputStream);

CsvToBean<PermitRecord> csvToBean = new CsvToBean<PermitRecord>();

Map<String, String> columnMapping = new HashMap<String, String>();
columnMapping.put("strap", "strap");
columnMapping.put("issued_by", "issuedBy");
columnMapping.put("permit_num", "permitNum");
columnMapping.put("permit_category", "permitCategory");
columnMapping.put("issue_dt", "issueDt");
columnMapping.put("estimated_value", "estimatedValue");
columnMapping.put("description", "description");

HeaderColumnNameTranslateMappingStrategy<PermitRecord> strategy = new HeaderColumnNameTranslateMappingStrategy<PermitRecord>();
strategy.setType(PermitRecord.class);
strategy.setColumnMapping(columnMapping);

List<PermitRecord> permitRecordList = null;

CSVReader csvReader = new CSVReader(permitStreamReader);
permitRecordList = csvToBean.parse(strategy, csvReader);

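The parsed list contains fewer records than the CSV file. Looking at the data, I noticed that cell values sometimes contain double quotes. Here is an example: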

"R0601364                 ","LAFAYETTE","14-0486","DECK","4/29/2014 12:00:00 AM","3834","deck under 36\"""
"R0601365 ","LAFAYETTE","13-0570","NEW CONSTRUCTION","5/22/2013 12:00:00 AM","121899","SIN FAMILY HOME PLN CUSTOM FIN BASEMENT"

The deck under 36" value causes the following records to be pulled into its description field. This is even clearer when viewed in the IDE:

[Screenshot: permit record viewed in the IDE]

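Can you see what I am doing wrong? I suspect there is a simple fix, since Excel parses the file correctly and opencsv appears to be the de facto standard for CSV parsing in Java.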

Best answer

The Univocity CSV parsers are really easy to use, and mapping CSV columns to POJO properties is straightforward.

I added the following dependency to pom.xml:

<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.5.4</version>
</dependency>

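The CSV columns are mapped to properties using annotations. Note the handy annotations:

  • @Parsed(field = "abc"): maps a CSV column to a field
  • @Trim: removes leading/trailing whitespace
  • @Format(formats = {"MM/dd/yyyy"}): lets us specify the date format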
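The issue_dt values in the file look like 4/29/2014 12:00:00 AM, which is longer than the MM/dd/yyyy pattern. The sketch below assumes the pattern follows java.text.SimpleDateFormat syntax (an assumption about the library's internals, not something stated above) and shows how such a pattern can read the date part and ignore the trailing time. DatePatternDemo and parseIssueDate are hypothetical names used only for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo class; it is not part of univocity-parsers.
class DatePatternDemo {

    // Parses the leading date portion of a value such as "4/29/2014 12:00:00 AM".
    // DateFormat.parse(String) reads from the start of the text and does not
    // require the whole string to be consumed, so the time part is ignored.
    static Date parseIssueDate(String raw) {
        SimpleDateFormat format = new SimpleDateFormat("MM/dd/yyyy", Locale.US);
        try {
            return format.parse(raw);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Prints the parsed date at midnight in the JVM's default time zone.
        System.out.println(parseIssueDate("4/29/2014 12:00:00 AM"));
    }
}
```

If the time of day matters, a longer pattern such as "MM/dd/yyyy hh:mm:ss a" would presumably be needed instead.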

Here is the POJO:

package io.woolford.entity;

import com.univocity.parsers.annotations.Format;
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.annotations.Trim;
import java.util.Date;

public class PermitRecord {

    @Trim
    @Parsed(field = "strap")
    private String strap;

    @Parsed(field = "issued_by")
    private String issuedBy;

    @Parsed(field = "permit_num")
    private String permitNum;

    @Parsed(field = "permit_category")
    private String permitCategory;

    @Format(formats = {"MM/dd/yyyy"})
    @Parsed(field = "issue_dt")
    private Date issueDt;

    @Parsed(field = "estimated_value")
    private Integer estimatedValue;

    @Parsed(field = "description")
    private String description;

    // getters & setters removed for brevity
}

Then, create a list of POJOs from the records in the CSV file:

URL permitsURL = new URL("http://assessor.boco.solutions/ASR_PublicDataFiles/Permits.csv");
InputStream permitInputStream = permitsURL.openStream();
List<PermitRecord> permitRecordList = new CsvRoutines().parseAll(PermitRecord.class, permitInputStream);

Thanks to @JeronimoBackes for this elegant solution, and thanks to Univocity for the excellent CSV parser.
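The answer switches libraries without saying why the sample line trips up opencsv. One likely explanation is that opencsv's default parser treats a backslash as an escape character, so in 36\""" the \" is read as a literal quote, the "" as another literal quote, and no closing quote is left for the field. The sketch below is a deliberately simplified, hand-written field splitter (QuoteEscapeDemo and splitLine are hypothetical names, not opencsv or univocity-parsers code) that reproduces both readings of the line from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; a simplified model, not any library's real parser.
class QuoteEscapeDemo {

    // The first sample record from the question (strap padding shortened).
    static final String SAMPLE_LINE =
            "\"R0601364 \",\"LAFAYETTE\",\"14-0486\",\"DECK\","
          + "\"4/29/2014 12:00:00 AM\",\"3834\",\"deck under 36\\\"\"\"";

    // Splits one CSV line into fields. Inside a quoted field a doubled quote
    // always yields one literal quote. If escapeChar is non-null, that
    // character also makes the following character literal.
    // Returns null if the line ends while a quoted field is still open, which
    // is the state in which a reader would keep consuming the next lines.
    static List<String> splitLine(String line, Character escapeChar) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && escapeChar != null && c == escapeChar && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // escaped character, taken literally
            } else if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote becomes one literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            return null; // the last field was never closed
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Backslash treated as ordinary text: the record splits into 7 fields.
        System.out.println(splitLine(SAMPLE_LINE, null));
        // Backslash treated as an escape character: the field stays open (null).
        System.out.println(splitLine(SAMPLE_LINE, '\\'));
    }
}
```

With no escape character the line yields seven fields and the description keeps its backslash and quote; with backslash as the escape character the last field never closes, which matches the swallowed records described in the question. If staying with opencsv is preferred, its RFC4180Parser (which, as far as I know, is available from version 3.9 and can be passed to CSVReaderBuilder.withCSVParser) may be worth trying; that route is untested here.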

For more on java - opencsv: How do I parse data with double quotes inside a cell?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/46060218/
