gpt4 book ai didi

java - 为什么 java DirectoryStream 执行这么慢?

转载 作者:塔克拉玛干 更新时间:2023-11-02 08:09:26 26 4
gpt4 key购买 nike

我已经对 Streams 进行了一些测试,特别是对 nio-package 的 DirectoryStreams 进行了测试。我只是尝试获取按上次修改日期和大小排序的目录中所有文件的列表。

旧 File.listFiles() 的 JavaDoc 声明了一个 Note到 Files 中的方法:

Note that the Files class defines the newDirectoryStream method to open a directory and iterate over the names of the files in the directory. This may use less resources when working with very large directories.

我多次运行下面的代码(下面的前三次):

第一次运行:

Run time of Arrays.sort: 1516
Run time of Stream.sorted as Array: 2912
Run time of Stream.sorted as List: 2875

第二次运行:

Run time of Arrays.sort: 1557
Run time of Stream.sorted as Array: 2978
Run time of Stream.sorted as List: 2937

第三轮:

Run time of Arrays.sort: 1563
Run time of Stream.sorted as Array: 2919
Run time of Stream.sorted as List: 2896

我的问题是:为什么流的表现如此糟糕?

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

// This sorts from old to young and from big to small
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
int sorter = 0;
try {
FileTime lm1 = Files.getLastModifiedTime(o1);
FileTime lm2 = Files.getLastModifiedTime(o2);
if (lm2.compareTo(lm1) == 0) {
Long s1 = Files.size(o1);
Long s2 = Files.size(o2);
sorter = s2.compareTo(s1);
} else {
sorter = lm1.compareTo(lm2);
}
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
return sorter;
};

public String[] getSortedFileListAsArray(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).toArray(String[]::new);
}

public List<String> getSortedFileListAsList(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).collect(Collectors.
toList());
}

public String[] sortByDateAndSize(File[] fileList) {
Arrays.sort(fileList, (File o1, File o2) -> {
int r = Long.compare(o1.lastModified(), o2.lastModified());
if (r != 0) {
return r;
}
return Long.compare(o1.length(), o2.length());
});
String[] fileNames = new String[fileList.length];
for (int i = 0; i < fileNames.length; i++) {
fileNames[i] = fileList[i].getName();
}
return fileNames;
}

public static void main(String[] args) throws IOException {
// File (io package)
File f = new File("C:\\Windows\\system32");
// Path (nio package)
Path dir = Paths.get("C:\\Windows\\system32");

FileSorter fs = new FileSorter();

long before = System.currentTimeMillis();
String[] names = fs.sortByDateAndSize(f.listFiles());
long after = System.currentTimeMillis();
System.out.println("Run time of Arrays.sort: " + ((after - before)));

long before2 = System.currentTimeMillis();
String[] names2 = fs.getSortedFileListAsArray(dir);
long after2 = System.currentTimeMillis();
System.out.
println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

long before3 = System.currentTimeMillis();
List<String> names3 = fs.getSortedFileListAsList(dir);
long after3 = System.currentTimeMillis();
System.out.
println("Run time of Stream.sorted as List: " + ((after3 - before3)));
}
}

更新

应用 Peter 的代码后,我得到了以下结果:

Run time of Arrays.sort: 1615
Run time of Stream.sorted as Array: 3116
Run time of Stream.sorted as List: 3059
Run time of Stream.sorted as List with caching: 378

更新2

在对 Peter 的解决方案做了一些研究之后,我可以说,用 for ex 读取文件属性。 Files.getLastModified 必须是一个沉重的紧缩。仅将 Comparator 中的那部分更改为:

  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
File f1 = o1.toFile();
File f2 = o2.toFile();
long lm1 = f1.lastModified();
long lm2 = f2.lastModified();
int cmp = Long.compare(lm2, lm1);
if (cmp == 0) {
cmp = Long.compare(f2.length(), f1.length());
}
return cmp;
};

在我的电脑上获得更好的结果:

Run time of Arrays.sort: 1968
Run time of Stream.sorted as Array: 1999
Run time of Stream.sorted as List: 1975
Run time of Stream.sorted as List with caching: 488

但是如您所见,缓存对象是最好的方法。正如 jtahlborn 提到的,它是一种稳定的排序。

更新 3(我找到的最佳解决方案)

经过更多的研究,我发现 Files.lastModified 和 Files.size 方法都在同一件事上做了大量工作:属性。所以我做了三个版本的 PathInfo 类来测试:

  1. Peters 版本如下所述
  2. 旧式文件版本,我在构造函数中执行一次 Path.toFile() 并使用 f.lastModified 和 f.length 从该文件中获取所有值
  3. Peters 解决方案的一个版本,但现在我使用 Files.readAttributes(path,BasicFileAttributes.class) 读取了一个 Attribute 对象并对此做了一些事情。

将其全部放入一个循环中,每次执行 100 次,我得出了这些结果:

After doing all hundred times
Mean performance of Peters solution: 432.26
Mean performance of old File solution: 343.11
Mean performance of read attribute object once solution: 255.66

最佳解决方案的 PathInfo 构造函数中的代码:

public PathInfo(Path path) {
try {
// read the whole attributes once
BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
fileName = path.getFileName().toString();
modified = bfa.lastModifiedTime().toMillis();
size = bfa.size();
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}

我的结果:从不读取属性两次,并且在对象中缓存是性能爆发。

最佳答案

Files.list() 是一个 O(N) 操作,而排序是 O(N log N)。更重要的是排序中的操作。鉴于比较不做同样的事情,这是最可能的解释。 C:/Windows/System32 下有很多具有相同修改日期的文件,这意味着将经常检查大小。

为了表明大部分时间不花在 FIles.list(dir) 流中,我优化了比较,因此每个文件只获取一次关于文件的数据。

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

// This sorts from old to young and from big to small
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
int sorter = 0;
try {
FileTime lm1 = Files.getLastModifiedTime(o1);
FileTime lm2 = Files.getLastModifiedTime(o2);
if (lm2.compareTo(lm1) == 0) {
Long s1 = Files.size(o1);
Long s2 = Files.size(o2);
sorter = s2.compareTo(s1);
} else {
sorter = lm1.compareTo(lm2);
}
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
return sorter;
};

public String[] getSortedFileListAsArray(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).toArray(String[]::new);
}

public List<String> getSortedFileListAsList(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).collect(Collectors.
toList());
}

public String[] sortByDateAndSize(File[] fileList) {
Arrays.sort(fileList, (File o1, File o2) -> {
int r = Long.compare(o1.lastModified(), o2.lastModified());
if (r != 0) {
return r;
}
return Long.compare(o1.length(), o2.length());
});
String[] fileNames = new String[fileList.length];
for (int i = 0; i < fileNames.length; i++) {
fileNames[i] = fileList[i].getName();
}
return fileNames;
}

public List<String> getSortedFile(Path dir) throws IOException {
return Files.list(dir).map(PathInfo::new).sorted().map(p -> p.getFileName()).collect(Collectors.toList());
}

static class PathInfo implements Comparable<PathInfo> {
private final String fileName;
private final long modified;
private final long size;

public PathInfo(Path path) {
try {
fileName = path.getFileName().toString();
modified = Files.getLastModifiedTime(path).toMillis();
size = Files.size(path);
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}

@Override
public int compareTo(PathInfo o) {
int cmp = Long.compare(modified, o.modified);
if (cmp == 0)
cmp = Long.compare(size, o.size);
return cmp;
}

public String getFileName() {
return fileName;
}
}

public static void main(String[] args) throws IOException {
// File (io package)
File f = new File("C:\\Windows\\system32");
// Path (nio package)
Path dir = Paths.get("C:\\Windows\\system32");

FileSorter fs = new FileSorter();

long before = System.currentTimeMillis();
String[] names = fs.sortByDateAndSize(f.listFiles());
long after = System.currentTimeMillis();
System.out.println("Run time of Arrays.sort: " + ((after - before)));

long before2 = System.currentTimeMillis();
String[] names2 = fs.getSortedFileListAsArray(dir);
long after2 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

long before3 = System.currentTimeMillis();
List<String> names3 = fs.getSortedFileListAsList(dir);
long after3 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as List: " + ((after3 - before3)));
long before4 = System.currentTimeMillis();
List<String> names4 = fs.getSortedFile(dir);
long after4 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as List with caching: " + ((after4 - before4)));
}
}

这是在我的笔记本电脑上打印的。

Run time of Arrays.sort: 1980
Run time of Stream.sorted as Array: 1295
Run time of Stream.sorted as List: 1228
Run time of Stream.sorted as List with caching: 185

可以看到,大约85%的时间花在了反复获取文件的修改日期和大小上。

关于java - 为什么 java DirectoryStream 执行这么慢?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/30920479/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com