
java - Converting a comma-separated CSV file to tab-separated with Java

Reposted. Author: 行者123. Updated: 2023-12-02 04:09:28

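I am trying to use Java to convert a comma-separated CSV file into a tab-separated file. However, some of the values inside the file contain commas themselves. See the following sample: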

Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000

Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000

Can anyone help me with how to handle these values?

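Best answer

I think the best option is to rely on a pattern that does not change. You mentioned that the problem is with numbers that use a comma as the thousands separator. In your lines, those numbers are wrapped in double quotes. So, based on the following assumptions:

  1. The number is wrapped in double quotes
  2. Each line contains only one such number (if there is more than one, find all pairs of double quotes, store them in an array or list, and check that a comma's index does not fall inside the range of any pair)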

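you can do the following:

  1. Get the index of the first double quote (144 in the first line of the sample input file, counting from 0)
  2. Get the index of the second/last double quote (154 in that same line)
  3. Replace every comma with \t, provided the comma's index is smaller than the index of the first double quote or greater than the index of the last double quote (this leaves the commas inside the number untouched)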

The following code does exactly that:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class CsvToTabConvertor {
    public static void main(String[] args) {
        File file = new File("C:\\test_java\\csvtotab.txt");
        List<String> processedLines = new ArrayList<String>();

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                StringBuilder builder = new StringBuilder(line);

                // find the number in double quotes - assuming there is only one quoted number per line
                // (both indexes are -1 when the line has no quotes, so every comma gets replaced)
                int doubleQuoteIndexStart = builder.indexOf("\"");
                int doubleQuoteIndexLast = builder.lastIndexOf("\"");

                // for each line, visit every comma
                int index = builder.indexOf(",");
                while (index >= 0) {
                    // replace only the commas that lie outside the quoted number
                    if (index < doubleQuoteIndexStart || index > doubleQuoteIndexLast) {
                        builder.setCharAt(index, '\t');
                    }

                    // get next index of comma; consecutive commas (empty fields) become consecutive tabs
                    index = builder.indexOf(",", index + 1);
                }

                // add the line to the list of lines for later storage to file
                processedLines.add(builder.toString());
            }
        } catch (IOException ex) {
            // report the failure instead of silently swallowing it
            ex.printStackTrace();
            return;
        }

        // write all the lines to the output file
        File outFile = new File("C:\\test_java\\csvtotab_processed.txt");
        try (PrintWriter writer = new PrintWriter(outFile)) {
            for (String processedLine : processedLines) {
                writer.println(processedLine);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

Input file containing the following lines:

Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000

The processed output file looks like this (fields are separated by tab characters, and the empty fourth field shows up as two tabs in a row):

Direct - House	eBay House Advertiser	537121661		160 x 600	eBay US Publisher	537121625	eBay.com	537224178	160x600_MyeBay_US	538146889	2015-11-18	"8,455,844"	0	0	0	0.000000	USD	0.000000	0.000000	0.000000
Direct - House	eBay House Advertiser	537121661		160 x 600	eBay US Publisher	537121625	eBay.com	537224178	160x600_Search_SLR	538146895	2015-11-18	"20,175,240"	30	0	0	0.000000	USD	0.000000	0.000000	0.000000
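
One way to sanity-check the result is to confirm that a converted line still has as many fields as its source line. The snippet below is only an illustrative sketch: the class name FieldCountCheck and the helper countTabFields are hypothetical names, not part of the answer's code, and the sample string is a shortened line in the same shape as the output above.

```java
public class FieldCountCheck {
    // Counts tab-separated fields. The limit of -1 makes split() keep
    // trailing empty fields instead of silently dropping them.
    static int countTabFields(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) {
        // Shortened sample: one empty field (two tabs in a row) and a
        // quoted number that keeps its commas.
        String converted = "537121661\t\t160 x 600\t\"8,455,844\"\t0";
        System.out.println(countTabFields(converted)); // prints 5
    }
}
```

Each full sample line has 20 commas outside the quoted number, so after conversion it should report 21 fields.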

Modify the code above and its logic to suit any further requirements.
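
One such further requirement is a line with more than one quoted value. The answer's loop protects everything between the first and the last double quote, so commas sitting between two quoted values would be left unchanged. Instead of storing index pairs in a list, the sketch below swaps in a simpler technique: it scans the line once and tracks whether it is currently inside quotes. The class name QuoteAwareCsvToTab and the method commasToTabs are hypothetical, and the sketch assumes that quotes are balanced on each line and that a quoted value never spans several lines.

```java
public class QuoteAwareCsvToTab {
    // Replaces every comma that lies outside double quotes with a tab.
    // Assumption: quotes are balanced within the line.
    static String commasToTabs(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // entering or leaving a quoted value; the quote itself is kept
                inQuotes = !inQuotes;
                out.append(c);
            } else if (c == ',' && !inQuotes) {
                // field separator: convert it
                out.append('\t');
            } else {
                // ordinary character, or a comma inside quotes: copy as-is
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two quoted numbers and one empty field in the same line.
        String line = "a,\"1,234\",,\"5,678\",b";
        System.out.println(commasToTabs(line));
    }
}
```

To use it in the program above, the body of the read loop could call commasToTabs(line) and add the result to processedLines. Like the original code, this keeps the double quotes around the number in the output.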

Regarding java - Converting a comma-separated CSV file to tab-separated with Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33929134/
