
node.js - Does MongoDB's filemd5 have the ability to set readPreference?

Reposted. Author: 可可西里. Updated: 2023-11-01 10:41:04

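I have a file storage service built in Node/Meteor that uses GridFS and is replicated across multiple containers. What I'm currently trying to find out is whether this code is actually aware of read/write consistency: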

db.command({
  filemd5: someFileId,
  root: 'fs'
}, function callback(err, results) {
  ...
})

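I'm uploading files in chunks, and this command runs after all the chunks have been merged into a single file. I have a feeling it is using a secondary member (I have several files whose md5 is that of an empty file: d41d8cd98f00b204e9800998ecf8427e). Is there any documentation on this, or some other setting?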
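The value d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of zero bytes, so it suggests the hash was computed over no chunk data at all. A quick check with Node's built-in `crypto` module (the helper name `md5Hex` is illustrative only):

```javascript
const crypto = require('crypto');

// Hex-encoded MD5 of a Buffer or string.
function md5Hex(data) {
  return crypto.createHash('md5').update(data).digest('hex');
}

// MD5 of zero bytes: the digest reported for the "empty" files.
const EMPTY_MD5 = md5Hex(Buffer.alloc(0));
console.log(EMPTY_MD5); // d41d8cd98f00b204e9800998ecf8427e
```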

These 2 parameters are the only options described in the documentation: https://docs.mongodb.com/manual/reference/command/filemd5/
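Read preference is not a field of the filemd5 command document; in the Node.js driver it is passed in the options argument of `Db#command`. A minimal sketch, assuming the callback-style `db.command(command, options, callback)` signature of the 2.x/3.x driver; the helper `filemd5OnPrimary` is hypothetical:

```javascript
// Hypothetical helper: run filemd5 against the primary so it sees chunks
// that were just written there.
// Assumes the callback-style Node.js driver API: db.command(cmd, options, cb).
function filemd5OnPrimary(db, fileId, root, callback) {
  // Only filemd5 and root belong in the command document itself.
  const command = { filemd5: fileId, root: root || 'fs' };
  // readPreference is a driver-level option, not part of the command.
  db.command(command, { readPreference: 'primary' }, callback);
}
```

With a real `Db` handle this would be called as `filemd5OnPrimary(db, fileId, 'fs', (err, results) => { ... })`, where `results.md5` and `results.numChunks` are the fields the command returns.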

Update
The exact code that merges the chunks is in a third-party package:

cursor = files.find(
  {
    'metadata._Resumable.resumableIdentifier': file.metadata._Resumable.resumableIdentifier
    length:
      $ne: 0
  },
  {
    fields:
      length: 1
      metadata: 1
    sort:
      'metadata._Resumable.resumableChunkNumber': 1
  }
)

https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L26

Then, on lines 111-119, it first runs filemd5 and then runs an update on the file:

@db.command md5Command, (err, results) ->
  if err
    lock.releaseLock()
    return callback err
  # Update the size and md5 to the file data
  files.update { _id: fileId }, { $set: { length: file.metadata._Resumable.resumableTotalSize, md5: results.md5 }},
    (err, res) =>
      lock.releaseLock()
      callback err

https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L111-L119

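After the last chunk is written, cursor = files.find() kicks off all of the merging, so if the read preference is secondaryPreferred, might the chunks not be there yet? Should that code be refactored to use only the primary?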
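Independently of the read preference, the merge code could refuse an MD5 that describes zero chunks when the upload declared a non-zero size. A hypothetical guard (`checkMd5Result` is not part of meteor-file-collection) that might run before the `files.update` call, assuming the filemd5 reply carries `md5` and `numChunks`:

```javascript
// MD5 of zero bytes; a non-empty upload should never produce it.
const EMPTY_MD5 = 'd41d8cd98f00b204e9800998ecf8427e';

// Hypothetical guard. Returns an Error if a filemd5 result looks like it
// was computed over missing chunks (e.g. a lagging secondary), else null.
// `results` is the filemd5 reply ({ md5, numChunks });
// `expectedLength` is the total size the client declared for the upload.
function checkMd5Result(results, expectedLength) {
  const sawNothing = results.numChunks === 0 || results.md5 === EMPTY_MD5;
  if (expectedLength > 0 && sawNothing) {
    return new Error('filemd5 saw no chunk data; possible stale read');
  }
  return null;
}
```

A caller could retry the command against the primary, or fail the upload, when this returns an error instead of storing the empty digest.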

Best answer

GridFS creates 2 collections: files and chunks.

A typical files entry looks like this:

{
"_id" : ObjectId("58cfbc8b6900bb31c7b1b8d9"),
"length" : 4,
"chunkSize" : 261120,
"uploadDate" : ISODate("2017-03-20T11:27:07.812Z"),
"md5" : "d3b07384d113edec49eaa6238ad5ff00",
"filename" : "foo.txt"
}
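The example document is internally consistent if the file's content is "foo" followed by a newline (an assumption; the answer does not show the contents): that is 4 bytes, its MD5 is d3b07384d113edec49eaa6238ad5ff00, and it fits in a single 261120-byte chunk. A small check with Node's `crypto` module:

```javascript
const crypto = require('crypto');

// Assumed content of foo.txt: "foo" plus a trailing newline.
const content = Buffer.from('foo\n');
const chunkSize = 261120; // 255 KiB, as in the example document

const filesDoc = {
  length: content.length,
  chunkSize: chunkSize,
  numChunks: Math.ceil(content.length / chunkSize),
  md5: crypto.createHash('md5').update(content).digest('hex'),
  filename: 'foo.txt',
};
console.log(filesDoc.length, filesDoc.numChunks, filesDoc.md5);
// 4 1 d3b07384d113edec49eaa6238ad5ff00
```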

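The filemd5 administrative command returns the MD5 hash of the file together with the number of chunks (numChunks); the hash is computed from the file's documents in the chunks collection, and it is the value that ends up in the md5 field of the corresponding files document.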
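To illustrate why a read that misses the chunks yields the empty-file digest, here is an in-memory model of a filemd5-style computation: hash one file's chunk payloads in ascending `n` order and report the count. This is a sketch of the idea, not the server's implementation; `modelFilemd5` and the plain-object chunk shape `{ files_id, n, data }` are assumptions.

```javascript
const crypto = require('crypto');

// In-memory model of a filemd5-style computation.
// `chunks` mimics documents from the GridFS chunks collection:
// { files_id, n, data: Buffer }.
function modelFilemd5(chunks, fileId) {
  const own = chunks
    .filter((c) => c.files_id === fileId) // only this file's chunks
    .sort((a, b) => a.n - b.n); // hash in chunk order
  const hash = crypto.createHash('md5');
  for (const c of own) hash.update(c.data);
  return { numChunks: own.length, md5: hash.digest('hex') };
}
```

In this model, a reader that sees none of the file's chunks (for example, a secondary that has not replicated them yet) gets `numChunks: 0` and d41d8cd98f00b204e9800998ecf8427e.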

files.md5
An MD5 hash of the complete file returned by the filemd5 command. This value has the String type.

source: GridFS docs

It should represent the hash of the complete file, or at least of the file as it was originally saved.

What is the ‘md5’ field of a files collection document and how is it used?
‘md5’ holds an MD5 checksum that is computed from the original contents of a user file. Historically, GridFS did not use acknowledged writes, so this checksum was necessary to ensure that writes went through properly. With acknowledged writes, the MD5 checksum is still useful to ensure that files in GridFS have not been corrupted. A third party directly accessing the 'files' and ‘chunks’ collections under GridFS could, inadvertently or maliciously, make changes to documents that would make them unusable by GridFS. Comparing the MD5 in the files collection document to a re-computed MD5 allows detecting such errors and corruption. However, drivers now assume that the stored file is not corrupted, and applications that want to use the MD5 value to check for corruption must do so themselves.

source: GridFS spec
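The spec leaves corruption checks to applications. A minimal sketch of that comparison: recompute the MD5 from a file's chunks in `n` order and compare it with the stored `md5`. The function name `isIntact` is hypothetical, and the extra byte-count comparison is an addition of this sketch, not something the spec requires.

```javascript
const crypto = require('crypto');

// Recompute the MD5 from a file's chunks (ordered by n) and compare it
// with the md5 stored in its files document. Also checks total length.
// `filesDoc`: { length, md5 }; `chunks`: [{ n, data: Buffer }].
function isIntact(filesDoc, chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  const hash = crypto.createHash('md5');
  let length = 0;
  for (const c of ordered) {
    hash.update(c.data);
    length += c.data.length;
  }
  return length === filesDoc.length && hash.digest('hex') === filesDoc.md5;
}
```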

If the file is written in a way that does not go through the driver's mongoc_gridfs_file_save (for example, streaming), the md5 field is not updated.

Actually, further reading the spec:

Why store the MD5 checksum instead of creating the hash as-needed? The MD5 checksum must be computed when a file is initially uploaded to GridFS, as this is the only time we are guaranteed to have the entire uncorrupted file. Computing it on-the-fly as a file is read from GridFS would ensure that our reads were successful, but guarantees nothing about the state of the file in the system. A successful check against the stored MD5 checksum guarantees that the stored file matches the original and no corruption has occurred.

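That's what we do. Only mongoc_gridfs_file_save computes the file's md5 sum and stores it. Any other entry point, such as streaming, expects the user to have already created all the supported mongoc_gridfs_file_opt_t and computed the md5 correctly.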

Source: JIRA issue
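When an entry point does not compute the checksum for you, the application can compute it while it splits the upload into chunks, which is the only moment the whole uncorrupted file is guaranteed to be at hand. A sketch in Node.js; `toGridFSChunks` is hypothetical, and the default chunk size of 261120 bytes is taken from the example files document.

```javascript
const crypto = require('crypto');

// Hypothetical helper: split incoming pieces (e.g. from a stream) into
// fixed-size chunks while updating the MD5 incrementally, so the checksum
// covers exactly the bytes that were uploaded.
function toGridFSChunks(pieces, chunkSize = 261120) {
  const hash = crypto.createHash('md5');
  const chunks = [];
  let pending = Buffer.alloc(0);
  let length = 0;
  for (const piece of pieces) {
    hash.update(piece);
    length += piece.length;
    pending = Buffer.concat([pending, piece]);
    // Emit every full chunk; keep the remainder for the next piece.
    while (pending.length >= chunkSize) {
      chunks.push({ n: chunks.length, data: pending.subarray(0, chunkSize) });
      pending = pending.subarray(chunkSize);
    }
  }
  // The last chunk may be shorter than chunkSize.
  if (pending.length > 0) chunks.push({ n: chunks.length, data: pending });
  return { length, md5: hash.digest('hex'), chunks };
}
```

Hashing piece by piece gives the same digest as hashing the whole buffer at once, so the stored md5 matches what filemd5 would report over the same chunks.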

Regarding "node.js - Does MongoDB's filemd5 have the ability to set readPreference", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42900479/
