
java - Splitting a text file into blocks in Java using multiple threads

Reposted. Author: 行者123. Updated: 2023-11-30 10:42:45

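I have split a 50 GB text file into pieces according to the formula (total file size / split size). Right now the split is done sequentially in a single thread. How can I change this code so that the split runs on multiple threads, i.e. threads working in parallel split the file and store the pieces in a folder? I don't want to read the file, because that would use more CPU. My main goal is to reduce CPU utilization and finish splitting the file quickly. I have 8 CPU cores.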

Any suggestions? Thanks in advance.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecMap {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        String filePath = "/home/xm/Downloads/wikipedia_50GB/wikipedia_50GB/file21";
        File file = new File(filePath);
        long splitFileSize = 64L * 1024 * 1024;  // 64 MiB per split
        long fileSize = file.length();
        System.out.println(fileSize);
        // Round up so the trailing partial chunk is counted as a split too
        int mappers = (int) ((fileSize + splitFileSize - 1) / splitFileSize);
        System.out.println(mappers);
        mapSplit(filePath, splitFileSize);
    }

    // Copies the input into consecutive files of at most splitLen bytes.
    // A pool of size 1 means the work is still done sequentially by one thread.
    private static void mapSplit(String filePath, long splitLen)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        Future<?> task = executor.submit(() -> {
            long startTime = System.currentTimeMillis();
            System.out.println(startTime);
            String name = Thread.currentThread().getName();
            File outDir = new File("/home/xm/Desktop/split/");
            outDir.mkdirs();  // make sure the output folder exists
            int count = 1;
            try (InputStream in = new BufferedInputStream(new FileInputStream(filePath))) {
                int data = in.read();
                while (data != -1) {
                    System.out.println("task started: " + name + " ====Time " + System.currentTimeMillis());
                    File outFile = new File(outDir, "Mapper " + count + ".txt");
                    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile))) {
                        long written = 0;
                        // Byte-at-a-time copy: every byte goes through its own read()/write() call
                        while (data != -1 && written < splitLen) {
                            out.write(data);
                            written++;
                            data = in.read();
                        }
                    }
                    count++;
                    System.out.println("task finished: " + name);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            long endTime = System.currentTimeMillis();
            System.out.println(endTime);
            long msec = endTime - startTime;
            System.out.println("Difference in milliseconds: " + msec);
            System.out.println("Difference in seconds: " + msec / 1000);
        });
        task.get();           // wait for the split to finish (rethrows failures from the task)
        executor.shutdown();  // release the worker thread once the task is done
    }
}
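The mappers formula in the question, fileSize / splitFileSize, uses integer division, so a trailing partial piece is not counted. The sketch below rounds the count up and lists the [start, end) byte range of every split; those ranges are also what each worker would receive in a parallel version. ChunkPlan and chunkBounds are hypothetical names, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlan {

    /**
     * Computes [start, end) byte offsets for splitting a file of fileSize bytes
     * into pieces of at most splitSize bytes. The count is rounded up so the
     * trailing partial piece is not dropped.
     */
    public static List<long[]> chunkBounds(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        long chunks = (fileSize + splitSize - 1) / splitSize;  // ceil(fileSize / splitSize)
        List<long[]> bounds = new ArrayList<>();
        for (long i = 0; i < chunks; i++) {
            long start = i * splitSize;
            long end = Math.min(start + splitSize, fileSize);  // last chunk may be shorter
            bounds.add(new long[] {start, end});
        }
        return bounds;
    }

    public static void main(String[] args) {
        long splitSize = 64L * 1024 * 1024;        // 64 MiB, as in the question
        long fileSize = 50L * 1024 * 1024 * 1024;  // assumption: exactly 50 GiB
        System.out.println(chunkBounds(fileSize, splitSize).size());  // prints 800
    }
}
```

If "50GB" instead means 50 × 10^9 bytes, integer division gives 745 mappers and leaves the last 3,896,320 bytes uncounted, while rounding up gives 746.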

Best answer

You can use RandomAccessFile and its seek method to jump to a specific position in the file.
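A minimal sketch of the seek-based access described above, using a hypothetical helper readRange: it opens the file with RandomAccessFile, calls seek(offset) to move the file pointer, and then reads only the requested bytes, so nothing before the offset has to be read first.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class RangeReader {

    /** Reads length bytes starting at offset, jumping there directly with seek(). */
    public static byte[] readRange(String path, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);           // move the file pointer; earlier bytes are skipped, not read
            byte[] buf = new byte[length];
            raf.readFully(buf);         // throws EOFException if the file ends before length bytes
            return buf;
        }
    }
}
```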

That way, you can give each executor task a start position and an end position, so that each task processes only its own slice of the file.
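One way to sketch the start/end idea with an ExecutorService, under these assumptions: ParallelSplitter, copyRange and split are hypothetical names; each task opens its own RandomAccessFile so threads never share a file pointer; and pieces are cut at fixed byte offsets, which means a line of text (or a multi-byte character) can end up divided across two output files. The output names follow the question's "Mapper N.txt" pattern.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSplitter {

    /** Copies bytes [start, end) of the input file into its own output file. */
    static void copyRange(String in, Path out, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(in, "r");  // one handle per task
             OutputStream os = new BufferedOutputStream(new FileOutputStream(out.toFile()))) {
            raf.seek(start);                                        // jump straight to this chunk
            byte[] buf = new byte[1 << 16];                         // 64 KiB copy buffer
            long remaining = end - start;
            while (remaining > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) {
                    break;                                          // input ended early
                }
                os.write(buf, 0, n);
                remaining -= n;
            }
        }
    }

    /** Splits inputPath into pieces of splitSize bytes using `threads` workers; returns the output paths. */
    public static List<Path> split(String inputPath, Path outDir, long splitSize, int threads)
            throws Exception {
        long fileSize = new File(inputPath).length();
        long chunks = (fileSize + splitSize - 1) / splitSize;      // round up
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        List<Path> outputs = new ArrayList<>();
        try {
            for (long i = 0; i < chunks; i++) {
                long start = i * splitSize;
                long end = Math.min(start + splitSize, fileSize);
                Path out = outDir.resolve("Mapper " + (i + 1) + ".txt");
                outputs.add(out);
                // Returning null makes this a Callable, so copyRange may throw IOException
                futures.add(pool.submit(() -> { copyRange(inputPath, out, start, end); return null; }));
            }
            for (Future<?> f : futures) {
                f.get();  // waits, and rethrows a worker's IOException wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return outputs;
    }
}
```

For the question's setup this would be called roughly as split("/home/xm/Downloads/wikipedia_50GB/wikipedia_50GB/file21", Paths.get("/home/xm/Desktop/split"), 64L * 1024 * 1024, 8), with the output folder already created. Copying through a 64 KiB buffer instead of one byte at a time also cuts the per-byte method-call overhead of the original loop.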

However, as already mentioned, your bottleneck will be disk I/O, not the CPU.
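Since disk I/O is the likely limit, one option worth trying (a swapped-in technique, not something the answer mentions) is FileChannel.transferTo, which may let the operating system move the bytes without copying them through a Java buffer; whether it actually does depends on the JDK version, OS and file system. ChannelRangeCopy and copyRange are hypothetical names; the method copies one [start, end) range and could stand in for the RandomAccessFile copy of each chunk.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelRangeCopy {

    /** Copies bytes [start, end) of `in` into `out` using FileChannel.transferTo. */
    public static void copyRange(Path in, Path out, long start, long end) throws IOException {
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = start;
            // transferTo may move fewer bytes than requested, so loop until the range is done
            while (position < end) {
                long n = src.transferTo(position, end - position, dst);
                if (n <= 0) {
                    break;  // nothing more to transfer (e.g. position is past end of file)
                }
                position += n;
            }
        }
    }
}
```

On a single spinning disk, several threads reading different offsets force the head to seek back and forth, so a small thread count (even one) may finish sooner than eight; on an SSD, more threads are more likely to help. Timing a few thread counts on the actual hardware is the safest way to choose.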

Regarding java - Splitting a text file into blocks in Java using multiple threads, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37990138/
