
java - Apache CSV parser does not work with quoted tab-delimited data

Reposted. Author: 行者123. Updated: 2023-12-01 09:21:47

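I want to parse a Google ebook transaction report. I opened it in Notepad++ to see exactly what the field and record separators are. It is a tab-delimited file in which every header field and every data field is enclosed in quotes. The first two lines of the CSV file are: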

"Transaction Date" "Id"    "Product"   "Type"  "Preorder"  "Qty"   "Primary ISBN"  "Imprint Name"  "Title" "Author"    "Original List Price Currency"  "Original List Price"   "List Price Currency"   "List Price [tax inclusive]"    "List Price [tax exclusive]"    "Country of Sale"   "Publisher Revenue %"   "Publisher Revenue" "Payment Currency"  "Payment Amount"    "Currency Conversion Rate"
"2016. 09. 01." "ID:1166315449551685"   "Single Purchase"   "Sale"  "None"  "1" "9789633780664" "Book and Walk Kft" "Bánk bán"  "József Katona" "HUF"   "0,00"  "HUF"   "0,00"  "0,00"  "HU"    "52,0%" "0,00"  ""  ""  ""

I use the following code to parse the CSV file:

private List<Sales> parseCsv(File csv) {
    Calendar max = Calendar.getInstance();
    Calendar current = Calendar.getInstance();
    boolean firstRound = true;

    List<Sales> sales = new ArrayList<>();
    Sales currentRecord;
    Reader in;
    try {
        in = new FileReader(csv);
        Iterable<CSVRecord> records;

        try {
            records = CSVFormat.TDF.withQuote('\"').withFirstRecordAsHeader().parse(in);
            for (CSVRecord record : records) {
                currentRecord = new Sales();
                currentRecord.setAuthor(record.get("Author"));
                currentRecord.setTitle(record.get("Title"));
                currentRecord.setPublisher(record.get("Imprint Name"));
                currentRecord.setIsbn(record.get("Primary ISBN"));
                currentRecord.setChannel("Google");
                currentRecord.setBookId(record.get("Id"));
                currentRecord.setCountry(record.get("Country of Sale"));
                currentRecord.setUnits(Integer.parseInt(record.get("Qty")));
                currentRecord.setUnitPrice(Float.parseFloat(record.get("List Price [tax exclusive]")));

                Date transDate;
                try {
                    transDate = sourceDateFormat.parse(record.get("Transaction Date"));
                    if (firstRound) {
                        max.setTime(transDate);
                    }
                    current.setTime(transDate);
                    if (current.after(max)) {
                        max.setTime(current.getTime());
                    }
                    currentRecord.setDatum(transDate);
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    LOG.log(Level.SEVERE,"Nem megfeelő formátumú a dátum a {0} file-ban",csv.getAbsolutePath());
                }

                currentRecord.setCurrencyCustomer(record.get("List Price Currency"));
                currentRecord.setCurrencyProceeds(record.get("Payment Amount"));
                currentRecord.setCurrencyProceeds(record.get("Payment Currency"));
                sales.add(currentRecord);
            }
            LOG.log(Level.INFO, "Daily sales transactions of {0} were successfully parsed from ",
                    csv.getAbsolutePath());
            return sales;
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            LOG.log(Level.SEVERE, "Valami nem stimmel a {0} file szerkezetével",csv.getAbsolutePath());
        }
    } catch (FileNotFoundException e1) {
        // TODO Auto-generated catch block
        LOG.log(Level.SEVERE,"A {0} file-t nem találom.",csv.getAbsolutePath());
    }
    return null;
}

When I debug the parsing, I can see that record.get("Author") throws a runtime exception:

java.lang.IllegalArgumentException: Mapping for Author not found, expected one of [��"

Clearly I do have a column named "Author". Any idea what is going wrong?

Best answer

When I convert this into a unit test and run it against the current commons-csv version, 1.4, it works fine for me, so:

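  • Check that you are using the latest version of commons-csv
  • Make sure the file really contains tab characters, and not, for some reason, spaces around the Author entry
  • Specify the file's actual encoding when calling parse() so that non-ASCII characters are handled correctly (thanks to @tonakai for the comment)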
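
The last two checks can be tried out without commons-csv at all. The unreadable characters at the start of the header list in the exception message would be consistent with a byte order mark (BOM) being decoded with the wrong charset, although nothing here confirms the file's real encoding. The following is only a sketch under that assumption: the class name ReportEncodingCheck, its helper methods, and the two-column sample data are hypothetical and not part of the question or of commons-csv. It guesses the charset from a UTF-16 BOM (falling back to UTF-8, which is itself an assumption), decodes the header row, and counts the tabs in it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReportEncodingCheck {

    // Guesses the charset from a leading UTF-16 byte order mark (BOM).
    // Falling back to UTF-8 when no BOM is found is an assumption of this sketch.
    static Charset charsetFromBom(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                // Java's "UTF-16" decoder reads the BOM, picks the byte order and drops the BOM.
                return StandardCharsets.UTF_16;
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the bytes with the given charset and returns the first line (the header row).
    static String headerLine(byte[] bytes, Charset charset) {
        String text = new String(bytes, charset);
        int eol = text.indexOf('\n');
        return eol < 0 ? text : text.substring(0, eol);
    }

    // Counts tab characters, to confirm the fields really are tab-separated.
    static long countTabs(String line) {
        return line.chars().filter(c -> c == '\t').count();
    }

    public static void main(String[] args) {
        // Hypothetical two-column sample, encoded as UTF-16 with a BOM.
        byte[] sample = "\"Title\"\t\"Author\"\n\"T1\"\t\"A1\"\n"
                .getBytes(StandardCharsets.UTF_16);

        // Decoding with the wrong charset turns the BOM into replacement characters,
        // so the first header name no longer starts with a quote.
        String wrong = headerLine(sample, StandardCharsets.UTF_8);
        String right = headerLine(sample, charsetFromBom(sample));

        System.out.println("Starts with a quote when read as UTF-8: " + wrong.startsWith("\""));
        System.out.println("Header with detected charset: " + right);
        System.out.println("Tabs in header: " + countTabs(right));
    }
}
```

If the check points to a charset other than the platform default, the reader passed to parse() can be built with it, for example new InputStreamReader(new FileInputStream(csv), charset) in place of new FileReader(csv).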

The following unit test works with commons-csv 1.4:

private final static String DATA = "\"Transaction Date\"\t\"Id\"\t\"Product\"\t\"Type\"\t\"Preorder\"\t\"Qty\"\t\"Primary ISBN\"\t\"Imprint Name\"\t\"Title\"\t\"Author\"\t\"Original List Price Currency\"\t\"Original List Price\"\t\"List Price Currency\"\t\"List Price [tax inclusive]\"\t\"List Price [tax exclusive]\"\t\"Country of Sale\"\t\"Publisher Revenue %\"\t\"Publisher Revenue\"\t\"Payment Currency\"\t\"Payment Amount\"\t\"Currency Conversion Rate\"\n" +
"\"2016. 09. 01.\"\t\"ID:1166315449551685\"\t\"Single Purchase\"\t\"Sale\"\t\"None\"\t\"1\"\t\"9789633780664\"\t\"Book and Walk Kft\"\t\"Bánk bán\"\t\"József Katona\"\t\"HUF\"\t\"0,00\"\t\"HUF\"\t\"0,00\"\t\"0,00\"\t\"HU\"\t\"52,0%\"\t\"0,00\"\t\"\"\t\"\"\t\"\"";

@Test
public void parseCsv() throws IOException {
    final CSVFormat format = CSVFormat.TDF.withQuote('\"').withFirstRecordAsHeader();
    Iterable<CSVRecord> records = format.parse(new StringReader(DATA));

    System.out.println("Headers: " + Arrays.toString(format.getHeader()));

    for (CSVRecord record : records) {
        assertNotNull(record.get("Author"));
        assertNotNull(record.get("Title"));
        assertNotNull(record.get("Imprint Name"));
        assertNotNull(record.get("Primary ISBN"));
        assertNotNull(record.get("Id"));
        assertNotNull(record.get("Country of Sale"));
        assertNotNull(record.get("Qty"));
        assertNotNull(record.get("List Price [tax exclusive]"));

        assertNotNull(record.get("Transaction Date"));

        assertNotNull(record.get("List Price Currency"));
        assertNotNull(record.get("Payment Amount"));
        assertNotNull(record.get("Payment Currency"));

        System.out.println("Record: " + record.toString());
    }
}

Regarding java - Apache CSV parser does not work with quoted tab-delimited data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40128654/
