
java - How many file readers can read the same file at the same time?

Reposted. Author: 行者123. Updated: 2023-11-29 03:20:27

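I have a huge 25GB CSV file. I know the file contains roughly 500 million records.

I want to do some basic analysis on the data. Nothing fancy.

I don't want to use Hadoop/Pig, at least not yet.

I wrote a Java program to do the analysis concurrently. Here is what I am doing.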

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

class MainClass {
    public static void main(String[] args) throws IOException, InterruptedException {
        long start = 1;
        long increment = 10000000;
        OpenFileAndDoStuff[] a = new OpenFileAndDoStuff[50];
        for (int i = 0; i < 50; i++) {
            a[i] = new OpenFileAndDoStuff("path/to/50GB/file.csv", start, start + increment - 1);
            a[i].start();
            start += increment;
        }
        // wait for every worker thread to finish
        for (OpenFileAndDoStuff obj : a) {
            obj.join();
        }
        //do aggregation
    }
}

class OpenFileAndDoStuff extends Thread {
    volatile HashMap<Integer, Integer> stuff = new HashMap<>();
    BufferedReader _br;
    long _end;

    // FileNotFoundException is a subclass of IOException, so declaring IOException covers both
    OpenFileAndDoStuff(String filename, long startline, long endline) throws IOException {
        _br = new BufferedReader(new FileReader(filename));
        long counter = 0;
        //move the bufferedReader pointer to the startline specified
        while (counter++ < startline)
            _br.readLine();
        this._end = endline;
    }

    void doStuff() {
        //read from buffered reader until end of file or until the specified endline is reached and do stuff
    }

    @Override
    public void run() {
        doStuff();
    }

    public HashMap<Integer, Integer> getStuff() {
        return stuff;
    }
}

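My thinking was that this way I could open 50 BufferedReaders, each reading a 10-million-line chunk in parallel, and aggregate the results once they have all finished.

However, the problem I am facing is that even though I ask for 50 threads to start, only two start at a time and only two can read from the file at a time.

Is there a way to have all 50 open the file and read from it at the same time? Why am I limited to two readers at a time?

The file is on a Windows 8 machine, and Java runs on the same machine.

Any ideas?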

Best answer

Here is a similar post: Concurrent reading of a File (java preffered)

The most important question here is what is the bottleneck in your case?

If the bottleneck is your disk IO, then there isn't much you can do on the software side. Parallelizing the computation will only make things worse, because reading different parts of the file simultaneously will degrade disk performance.

If the bottleneck is processing power, and you have multiple CPU cores, then you can take advantage of them by starting multiple threads that work on different parts of the file. You can safely create several InputStreams or Readers to read different parts of the file in parallel (as long as you don't go over your operating system's limit for the number of open files). You could separate the work into tasks and run them in parallel.
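
As a rough illustration of splitting the work into tasks, here is a minimal sketch, not the code from the referenced post. It assumes the file is divided into byte ranges rather than line ranges, so each task can jump straight to its offset instead of reading every preceding line. The class name ChunkedLineCounter, the helper names, and the choice of counting lines as the "stuff" are all hypothetical; each task handles exactly the lines that begin inside its own range.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: counts lines, one task per byte range.
class ChunkedLineCounter {

    // Counts the lines whose first byte lies in the byte range [start, end).
    static long countLinesInRange(Path file, long start, long end) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Begin one byte early so we can tell whether 'start' is itself a line start.
            long pos = Math.max(0, start - 1);
            ch.position(pos);
            InputStream in = new BufferedInputStream(Channels.newInputStream(ch), 1 << 16);
            int b;
            if (start > 0) {
                // Skip the rest of the line that belongs to the previous range.
                while ((b = in.read()) != -1) {
                    pos++;
                    if (b == '\n') break;
                }
                if (b == -1) return 0; // reached end of file while skipping
            }
            long lines = 0;
            while (pos < end) {
                b = in.read();
                if (b == -1) break; // no further line starts before end of file
                pos++;
                lines++; // a line starts inside our range; real per-line work would go here
                // Consume the rest of this line, even if it runs past 'end'.
                while (b != '\n' && (b = in.read()) != -1) {
                    pos++;
                }
            }
            return lines;
        }
    }

    // Splits the file into up to 'workers' byte ranges and sums the per-range counts.
    static long countLines(Path file, int workers) throws Exception {
        long size = Files.size(file);
        long chunk = (size + workers - 1) / workers; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                long start = i * chunk;
                long end = Math.min(size, start + chunk);
                if (start >= end) break; // nothing left to assign
                Callable<Long> task = () -> countLinesInRange(file, start, end);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java ChunkedLineCounter <file>");
            return;
        }
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(countLines(Paths.get(args[0]), workers));
    }
}
```

Sizing the pool to the number of CPU cores is only a starting assumption. If the disk is the bottleneck, as described above, adding workers is unlikely to help and may hurt.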

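For an example of reading a single file in parallel with FileInputStream, see the referenced post; according to these benchmarks, this should be considerably faster than using BufferedReader: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly#FileReaderandBufferedReader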
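
For readers unfamiliar with the raw-stream style, the following is a minimal sketch of reading with FileInputStream into a byte array. It is not taken from the linked benchmarks, and the class name RawByteScan, the 64 KiB buffer size, and the newline count are assumptions chosen for illustration. It skips character decoding and per-line String creation, which is one plausible reason this style can be cheaper than BufferedReader.readLine().

```java
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical example class: scans a file block by block without decoding it to Strings.
class RawByteScan {

    // Counts '\n' bytes in the file by reading it in large blocks.
    static long countNewlines(String path) throws IOException {
        byte[] buf = new byte[1 << 16]; // 64 KiB; an arbitrary size, worth tuning
        long count = 0;
        try (FileInputStream in = new FileInputStream(path)) {
            int n;
            // read(byte[]) returns the number of bytes read, or -1 at end of file
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java RawByteScan <file>");
            return;
        }
        long startNanos = System.nanoTime();
        long lines = countNewlines(args[0]);
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        System.out.println(lines + " newlines in " + elapsedMs + " ms");
    }
}
```

Timing a plain pass like this, with no per-record processing, also gives a rough idea of how fast the disk can deliver the file, which helps decide whether IO or CPU is the bottleneck.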

Regarding "java - How many file readers can read the same file at the same time?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23964343/
