java - Landing on the start or end of a line when splitting a CSV file in Java

Reposted. Author: 行者123. Updated: 2023-12-02 01:08:06

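I am using this code to split and process a CSV file. The problem is that each chunk starts at an arbitrary byte position, which may be at the beginning, in the middle, or at the end of a line!

How can I set start_loc to the start or end of a line, so that every chunk is a complete piece of CSV and no data is lost?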

public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();

    CSVReader reader = new CSVReader(new FileReader("x_tran.csv"));
    String[] columnsNames = reader.readNext();
    reader.close();
    FileInputStream fileInputStream = new FileInputStream("x_tran.csv");
    FileChannel channel = fileInputStream.getChannel();
    long remaining_size = channel.size(); //get the total number of bytes in the file
    long chunk_size = remaining_size / 4; //file_size/threads

    //Max allocation size allowed is ~2GB
    if (chunk_size > (Integer.MAX_VALUE - 5))
    {
        chunk_size = (Integer.MAX_VALUE - 5);
    }

    //thread pool
    ExecutorService executor = Executors.newFixedThreadPool(4);

    long start_loc = 0;//file pointer
    int i = 0; //loop counter
    boolean first = true;
    while (remaining_size >= chunk_size)
    {
        //launches a new thread
        executor.execute(new FileRead(start_loc, toIntExact(chunk_size), channel, i, String.join(",", columnsNames), first));
        remaining_size = remaining_size - chunk_size;
        start_loc = start_loc + chunk_size;
        i++;
        first = false;
    }

    //load the last remaining piece
    executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i, String.join(",", columnsNames), first));

    //Tear Down
    executor.shutdown();

    //Wait for all threads to finish
    while (!executor.isTerminated())
    {
        //wait for infinity time
    }
    System.out.println("Finished all threads");
    fileInputStream.close();

    long finish = System.currentTimeMillis();
    System.out.println( "Time elapsed: " + (finish - start) );
}

Best answer

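You can read the file once and then have each thread process the lines whose index, modulo the number of threads, equals that thread's number (for example, the first thread handles lines 0, 4, 8, and so on).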

package ...;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CsvParallelReader {

    private static final int THREAD_NUMBER = 4;

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_NUMBER);

        try {
            // Read the whole file once; every task shares the same list of lines.
            List<String> lines = Files.readAllLines(Path.of("yourfile.csv"));

            for (int i = 0; i < THREAD_NUMBER; i++) {
                Runnable readTask = new ReadTask(i, lines);
                executor.submit(readTask);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Stop accepting new tasks so the JVM can exit once the submitted ones finish.
            executor.shutdown();
        }
    }

    private static class ReadTask implements Runnable {

        private final List<String> lines;
        private final int start;

        public ReadTask(int start, List<String> lines) {
            this.start = start;
            this.lines = lines;
        }

        @Override
        public void run() {
            // Visit lines start, start + THREAD_NUMBER, start + 2 * THREAD_NUMBER, ...
            // If the file has a header row, it is line 0 and lands in the task with start == 0.
            for (int i = start; i < lines.size(); i += THREAD_NUMBER) {
                // do something with this line of data
            }
        }
    }
}
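
If the file is too large to hold in memory with readAllLines, another option is to keep the byte-range approach from the question and move each boundary forward to the next line break. The following is only a minimal sketch of that idea, not part of the original answer: the class ChunkBoundaries and the method lineAlignedBoundaries are hypothetical names, and it assumes that lines end with '\n' and that no quoted CSV field contains an embedded newline.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper, not from the original post.
class ChunkBoundaries {

    /**
     * Returns parts + 1 byte offsets; chunk k covers [offsets[k], offsets[k + 1]).
     * Each interior offset is placed just after a '\n' (or at end of file),
     * so no chunk starts or ends in the middle of a line.
     * Assumes '\n' line endings and no newlines inside quoted fields.
     */
    static long[] lineAlignedBoundaries(Path file, int parts) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            long[] offsets = new long[parts + 1];
            offsets[0] = 0;
            for (int k = 1; k < parts; k++) {
                // Start from the naive split point, but never move backwards.
                long guess = Math.max(size / parts * k, offsets[k - 1]);
                raf.seek(guess);
                int b;
                // Skip forward until a newline has been consumed or the file ends.
                while ((b = raf.read()) != -1 && b != '\n') {
                    // keep scanning
                }
                offsets[k] = raf.getFilePointer();
            }
            offsets[parts] = size;
            return offsets;
        }
    }
}
```

With these offsets, start_loc for task k would be offsets[k] and its chunk size offsets[k + 1] - offsets[k]. Some chunks may come out empty when a single line spans more than one naive split point. The header row is part of the first chunk only.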

Regarding "java - Landing on the start or end of a line when splitting a CSV file in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57712132/
