Update: After further tests, this post now serves mainly as documentation.
TL;DR: Using POSIX_FADV_DONTNEED isn't worth it. Not using it gives the best speed, and on AMD64 the flag doesn't even seem to be respected.
Environment: Raspberry Pi 4 with a 5 TB spinning disk (ext4 file system) attached via USB 3.0.
Using 1 thread with this config gives the best speed, probably because of the spinning disk.
When calculating the SHA256 sum of all files in a directory tree (to check multiple restic repositories without having to enter the encryption password for each one), the disk read speed displayed in nmon and by node_exporter is almost twice as high when using POSIX_FADV_DONTNEED. This advice tells the kernel that the data is no longer needed, so it does not have to be kept in the cache. That makes sense here because these files are read only once; otherwise they would pollute the system's cache and slow it down, because other data would then miss in the cache.
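For reference, Python exposes this call directly as `os.posix_fadvise` on Unix, so the hint can also be given without ctypes. The following is only a minimal sketch, assuming Linux; the helper name `drop_from_cache` is hypothetical and not part of the script in this post.

```python
import os


def drop_from_cache(path):
    """Advise the kernel that cached pages of `path` are no longer needed.

    Returns True if the advice was issued and False if this platform has
    no os.posix_fadvise. The call is only a hint; the kernel may ignore it.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 with length=0 covers the file from start to end.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```

Calling `drop_from_cache("/path/to/file")` after a file has been fully processed would be one way to use it; whether that helps or hurts is exactly what the measurements in this post are about.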
Without POSIX_FADV_DONTNEED, the read speed is between 60 and 90 MB/s. With POSIX_FADV_DONTNEED, it is between 155 and 175 MB/s, so about twice as fast. These values are shown in nmon and by Prometheus node_exporter in combination with VictoriaMetrics. However, the time command gives completely different results. Between runs I executed 'sync; echo 3 > /proc/sys/vm/drop_caches'.
With posix_fadvise(fd, 0, bytesRead), the run took 37 s and the slow disk speed was displayed. With posix_fadvise(fd, 0, 0), about twice the disk speed was displayed, but the run actually took 1m8s.
When using a stub that returns immediately,
def posix_fadvise(fd, offset, length):
    return
only 29 s were needed, so the fastest result was achieved by not using POSIX_FADV_DONTNEED at all.
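The three variants compared above can be put behind one switch so that each run differs only in the advice given. This is a rough sketch, not the exact script used for the timings: it uses `os.posix_fadvise` instead of ctypes, assumes the advice is issued after every chunk, and the names `sha256_with_advice`, `time_mode`, and the mode strings are made up for illustration.

```python
import hashlib
import os
import time

MODES = ("none", "read_so_far", "whole_file")


def sha256_with_advice(path, mode="none", chunk_size=65536):
    """Hash `path`; `mode` selects how POSIX_FADV_DONTNEED is applied.

    "none":        never call posix_fadvise (the no-op stub)
    "read_so_far": posix_fadvise(fd, 0, bytes_read) after each chunk
    "whole_file":  posix_fadvise(fd, 0, 0) after each chunk
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    digest = hashlib.sha256()
    bytes_read = 0
    with open(path, "rb") as f:
        fd = f.fileno()
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            bytes_read += len(data)
            digest.update(data)
            if mode != "none" and hasattr(os, "posix_fadvise"):
                # length 0 means "up to the end of the file".
                length = bytes_read if mode == "read_so_far" else 0
                os.posix_fadvise(fd, 0, length, os.POSIX_FADV_DONTNEED)
    return digest.hexdigest()


def time_mode(path, mode):
    """Return (elapsed wall-clock seconds, checksum) for one run."""
    start = time.perf_counter()
    checksum = sha256_with_advice(path, mode)
    return time.perf_counter() - start, checksum
```

Measuring wall-clock time like this (or with the time command) is what matters here, since the displayed disk throughput turned out to be a poor indicator. The caches still need to be dropped between runs for the numbers to be comparable.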
So the Raspberry Pi displays a wrong disk speed, whereas a more accurate speed is shown on AMD64. On the Raspberry Pi, VictoriaMetrics shows that the cache size doesn't grow when using POSIX_FADV_DONTNEED, so the flag is respected.
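The cache size was observed through VictoriaMetrics here. As a quick alternative for spot checks, the same figure can be read from the `Cached:` line of /proc/meminfo before and after a run. This is a small sketch assuming Linux; the function names are hypothetical.

```python
def parse_cached_kib(meminfo_text):
    """Return the 'Cached:' value in KiB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        # 'SwapCached:' does not match because of the exact prefix.
        if line.startswith("Cached:"):
            return int(line.split()[1])
    return None


def read_cached_kib(path="/proc/meminfo"):
    """Read the current page cache size in KiB, or None if unavailable."""
    try:
        with open(path) as f:
            return parse_cached_kib(f.read())
    except OSError:
        return None
```

Comparing `read_cached_kib()` before and after hashing a large directory gives a rough idea of whether the advice had any effect on the cache, although other activity on the system changes the value too.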
EDIT: On a hosted VM with an SSD and much higher performance, even with 4 threads, using POSIX_FADV_DONTNEED makes it reproducibly about 5 times slower. Between runs I did # echo 3 > /proc/sys/vm/drop_caches. Very strange.
EDIT2: On a physical host with a spinning disk connected via USB 3.0 and 1 thread, reading all the files takes 1m25s when using POSIX_FADV_DONTNEED. After clearing the cache with # echo 3 > /proc/sys/vm/drop_caches and not using POSIX_FADV_DONTNEED, calculating the checksums takes only 12 seconds. So a factor of 7 (!) difference.
Update 20.09.2023: With VictoriaMetrics I can see that POSIX_FADV_DONTNEED doesn't seem to be respected with regard to the cache on AMD64 (on the RPi it is, see above): the cache keeps growing despite the flag being set.
There is no noticeable difference in speed (measured with the time command) between using posix_fadvise(fd, 0, bytesRead) (real 1m53,630s user 0m48,819s sys 0m5,627s) and returning immediately from def posix_fadvise(fd, offset, length): (real 1m52,675s user 0m51,346s sys 0m6,928s). Using posix_fadvise(fd, 0, 0) takes real 2m31,398s user 1m2,004s sys 0m16,178s.
import os
import hashlib
import concurrent.futures
import sys
import ctypes

# Constants for posix_fadvise
POSIX_FADV_DONTNEED = 4

base_directory = '/home/pi/5TB'
num_threads = 1  # Adjust the number of threads as needed

# Load libc once instead of on every call
libc = ctypes.CDLL("libc.so.6")


# Define posix_fadvise function
def posix_fadvise(fd, offset, length):
    #return  # uncomment to skip the advice entirely (fastest variant in the timings above)
    ret = libc.posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED)
    if ret != 0:
        raise OSError(f"posix_fadvise failed with error code {ret}")


def calculate_sha256(file_path):
    try:
        # Calculate the SHA256 checksum of the file
        sha256_hash = hashlib.sha256()
        bytesRead = 0  # Initialize the counter for bytes read
        with open(file_path, 'rb') as f:
            fd = f.fileno()  # Get file descriptor
            # Variant: advise for the whole file (length 0 means up to the end)
            #posix_fadvise(fd, 0, 0)
            while True:
                data = f.read(65536)  # Read in 64KB chunks
                if not data:
                    break
                bytesRead += len(data)
                sha256_hash.update(data)
                # Advise the kernel that the bytes read so far are no longer needed
                posix_fadvise(fd, 0, bytesRead)
        checksum = sha256_hash.hexdigest()
        # Check if the checksum matches the filename
        filename = os.path.basename(file_path)
        if checksum != filename:
            sys.stderr.write(f"Error: Checksum mismatch for file '{file_path}'\n")
        return file_path
    except Exception as e:
        sys.stderr.write(f"Error processing file '{file_path}': {str(e)}\n")
        return None


def process_files_in_directory(directory):
    # Only regular files directly inside this directory; os.walk handles the recursion
    files = [os.path.join(directory, filename) for filename in os.listdir(directory) if os.path.isfile(os.path.join(directory, filename))]
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        for file in executor.map(calculate_sha256, files):
            if file is not None:
                results.append(file)
    return results


if __name__ == "__main__":
    checked_count = 0
    for root, _, _ in os.walk(base_directory):
        checked_files = process_files_in_directory(root)
        checked_count += len(checked_files)
        if checked_count % 100 == 0:
            sys.stdout.write(f"Checked {checked_count} files...\n")
            sys.stdout.flush()  # Flush the stdout buffer to write immediately
    sys.stdout.write(f"Checked {checked_count} files in total.\n")
I think you are doing it backwards. Instead of preventing the data from going into the page cache, you can simply mmap the file and work within the page cache directly, then use MADV_FREE on it when you are done. This way you are still using a single copy of the data, but it is the page cache's copy rather than your local copy.
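A rough sketch of the mmap idea in Python (3.8 or later for `mmap.madvise`), with one deliberate swap: according to the Linux madvise(2) man page, MADV_FREE applies only to private anonymous pages, so this sketch uses MADV_DONTNEED on the file-backed mapping instead. The function name `sha256_via_mmap` is hypothetical, and no timings for this variant exist in this post.

```python
import hashlib
import mmap
import os


def sha256_via_mmap(path):
    """Hash a file through a read-only memory mapping of its contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            # Empty files cannot be mapped; the hash of no data is still valid.
            return digest.hexdigest()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # hashlib reads straight from the mapping, without an extra copy.
            digest.update(mm)
            # Assumption: MADV_DONTNEED stands in for the suggested MADV_FREE.
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
                mm.madvise(mmap.MADV_DONTNEED)
    return digest.hexdigest()
```

On a 32-bit system, mapping very large files this way may fail for lack of address space, so a chunked mapping would be needed there.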
By calling posix_fadvise with offset == 0 and length == 0 you're telling the kernel that you don't need any byte of the entire file, yet you immediately proceed to read from it again. If the kernel has been reading ahead because the bytes are spinning past the read head anyway, that could explain the performance hit. Probably you want to set length to the number of bytes you've already read.
Thanks for the hint. I used it to do some additional measurements and have updated the whole post. The best speeds are reached without POSIX_FADV_DONTNEED, on AMD64 as well as on the Raspberry Pi.