
java - How do I use the filesystem cache in Java or Python?

Reposted. Author: 太空狗. Updated: 2023-10-30 01:22:03

A recent blog post on the Elasticsearch website discusses the features of their new 1.4 beta release.

I am curious about how they make use of the filesystem cache:

Recent releases have added support for doc values. Essentially, doc values provide the same function as in-memory fielddata, but they are written to disk at index time. The benefit that they provide is that they consume very little heap space. Doc values are read from disk, instead of from memory. While disk access is slow, doc values benefit from the kernel’s filesystem cache. The filesystem cache, unlike the JVM heap, is not constrained by the 32GB limit. By shifting fielddata from the heap to the filesystem cache, you can use smaller heaps which means faster garbage collections and thus more stable nodes.

Before this release, doc values were significantly slower than in-memory fielddata. The changes in this release have improved the performance significantly, making them almost as fast as in-memory fielddata.

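Does this mean that we can manipulate the behavior of the filesystem cache, rather than passively waiting for the OS to do its work? If so, how can we make use of the filesystem cache in normal application development? For example, if I am writing a Python or Java program, how would I do it?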

Best answer

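The filesystem cache is an implementation detail of the operating system's inner workings and is transparent to the end user. It is not something that needs to be adjusted or changed. Lucene already uses the filesystem cache when it manages index segments. Every time something is indexed into Lucene (via Elasticsearch), those documents are written to segments. The segments are first written to the filesystem cache and then, after some time (for example, when the translog, a way of keeping track of the documents being indexed, is full), the contents of the cache are written to an actual file. While the documents to be indexed are still in the filesystem cache, they can already be accessed.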
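The write path described above can be imitated from ordinary Java code. The following is a minimal sketch, not Lucene's or Elasticsearch's actual code: the class name `PageCacheWriteDemo` and its methods are hypothetical. It assumes a typical OS where `FileChannel.write` returns once the bytes are in the kernel's cache, and where `FileChannel.force(true)` asks the OS to push them to the storage device. The program cannot observe which layer serves the read; it only shows that the data is readable before any explicit flush.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
class PageCacheWriteDemo {

    /** Writes text, reads it back before any explicit flush, then forces it to disk. */
    static String writeReadThenForce(Path file, String text) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {

            // 1. Write: on typical systems the bytes land in the OS cache first.
            ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (out.hasRemaining()) {
                ch.write(out);
            }

            // 2. Read back right away: the data is already visible,
            //    even though we have not asked for a flush yet.
            ByteBuffer in = ByteBuffer.allocate((int) ch.size());
            long pos = 0;
            while (in.hasRemaining()) {
                int n = ch.read(in, pos);
                if (n < 0) {
                    break;
                }
                pos += n;
            }

            // 3. Explicit flush: ask the OS to write content and metadata
            //    to the storage device (loosely comparable to a commit).
            ch.force(true);

            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the round trip against a temporary file. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("segment", ".bin");
            try {
                return writeReadThenForce(tmp, "hello page cache");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The only cache-related control exercised here is the decision of when to call `force`; everything else is left to the operating system, which matches the point made in the answer.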

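The improvement in the doc values implementation means that this feature can now take advantage of the filesystem cache: doc values are read from disk, placed in the cache, and accessed from there, instead of taking up heap space.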

How this filesystem cache is accessed is described in this excellent blog post:

In our previous approaches, we were relying on using a syscall to copy the data between the file system cache and our local Java heap. How about directly accessing the file system cache? This is what mmap does!

Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache.
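The two read paths contrasted in the quote can be put side by side. This is an illustrative sketch only: `CopyVsMmapDemo` and its method names are hypothetical, and summing bytes is just a stand-in for real work on index data. `sumByCopy` uses `FileChannel.read`, which copies bytes into a heap buffer; `sumByMmap` uses `FileChannel.map` and reads the mapped region in place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative only.
class CopyVsMmapDemo {

    /** Earlier approach: each read() copies bytes into a buffer on the Java heap. */
    static long sumByCopy(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer heap = ByteBuffer.allocate(8192);
            long sum = 0;
            while (ch.read(heap) != -1) {
                heap.flip();
                while (heap.hasRemaining()) {
                    sum += heap.get() & 0xFF;
                }
                heap.clear();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** mmap approach: the file is mapped into the address space and read in place. */
    static long sumByMmap(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes four bytes to a temporary file and sums them through both paths. */
    static long[] demo() {
        try {
            Path tmp = Files.createTempFile("index", ".bin");
            // deleteOnExit: some platforms refuse to delete a file that is still mapped.
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, (byte) 250});
            return new long[] {sumByCopy(tmp), sumByMmap(tmp)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sums = demo();
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

Both methods return the same sum. The difference lies in where the bytes live while they are being read: a heap buffer in the first case, memory managed by the OS in the second.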

As for how to actually use mmap in a Java program, I think this is the class and method to do so.
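The link target is not preserved in this copy. Assuming it refers to `java.nio.channels.FileChannel` and its `map` method, which returns a `MappedByteBuffer`, a small read-write use might look like the sketch below. The class `MappedCounters`, the file layout (a sequence of 4-byte integer slots), and the method names are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: a file treated as an array of int counters.
class MappedCounters {

    /** Increments the int stored in slot `index` and returns the new value. */
    static int incrementSlot(Path file, int index) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the whole existing file; READ_WRITE needs a channel opened for both.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());

            int offset = index * Integer.BYTES;
            int updated = buf.getInt(offset) + 1; // absolute read, like array access
            buf.putInt(offset, updated);          // absolute write into the mapping

            // Ask the OS to write the changed pages back to the storage device.
            buf.force();
            return updated;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a file with four zeroed slots and increments slot 2 three times. */
    static int demo() {
        try {
            Path tmp = Files.createTempFile("counters", ".bin");
            // deleteOnExit: some platforms refuse to delete a file that is still mapped.
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[4 * Integer.BYTES]);
            incrementSlot(tmp, 2);
            incrementSlot(tmp, 2);
            return incrementSlot(tmp, 2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each call opens a fresh mapping, so the growing counter shows that changes made through one mapping are visible to the next. Which pages stay cached, and for how long, remains the operating system's decision.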

Regarding "java - How do I use the filesystem cache in Java or Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26622471/
