Azure DocumentDB bulk insert using a stored procedure

Reposted. Author: 行者123. Updated: 2023-12-02 17:36:07

Hi, I am using 16 collections to insert around 3-4 million JSON objects, each 5-10k in size. I am using a stored procedure to insert these documents. I have 22 capacity units.

function bulkImport(docs) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();

    // The count of imported docs, also used as current doc index.
    var count = 0;

    // Validate input.
    if (!docs) throw new Error("The array is undefined or null.");

    var docsLength = docs.length;
    if (docsLength == 0) {
        getContext().getResponse().setBody(0);
    }

    // Call the CRUD API to create a document.
    tryCreateOrUpdate(docs[count], callback);

    // Note that there are 2 exit conditions:
    // 1) The createDocument request was not accepted.
    //    In this case the callback will not be called, we just call setBody and we are done.
    // 2) The callback was called docs.length times.
    //    In this case all documents were created and we don't need to call tryCreate anymore. Just call setBody and we are done.
    function tryCreateOrUpdate(doc, callback) {
        var isAccepted = true;
        var isFound = collection.queryDocuments(collectionLink, 'SELECT * FROM root r WHERE r.id = "' + doc.id + '"', function (err, feed, options) {
            if (err) throw err;
            if (!feed || !feed.length) {
                isAccepted = collection.createDocument(collectionLink, doc, callback);
            }
            else {
                // The metadata document.
                var existingDoc = feed[0];
                isAccepted = collection.replaceDocument(existingDoc._self, doc, callback);
            }
        });

        // If the request was accepted, callback will be called.
        // Otherwise report current count back to the client,
        // which will call the script again with remaining set of docs.
        // This condition will happen when this stored procedure has been running too long
        // and is about to get cancelled by the server. This will allow the calling client
        // to resume this batch from the point we got to before isAccepted was set to false
        if (!isFound && !isAccepted) getContext().getResponse().setBody(count);
    }

    // This is called when collection.createDocument is done and the document has been persisted.
    function callback(err, doc, options) {
        if (err) throw err;

        // One more document has been inserted, increment the count.
        count++;

        if (count >= docsLength) {
            // If we have created all documents, we are done. Just set the response.
            getContext().getResponse().setBody(count);
        } else {
            // Create next document.
            tryCreateOrUpdate(docs[count], callback);
        }
    }
}
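
The lookup in tryCreateOrUpdate builds its query by concatenating doc.id into the SQL text, so an id that contains a double quote would yield a malformed query. A minimal sketch of an alternative follows, assuming the server-side queryDocuments call accepts a query-spec object of the form { query, parameters } in place of a string (worth confirming against the SDK version in use). buildIdQuery is a hypothetical helper name and is not part of the original code.

```javascript
// Hypothetical helper: builds a parameterized lookup instead of
// concatenating the id into the query text.
function buildIdQuery(id) {
  return {
    query: "SELECT * FROM root r WHERE r.id = @id",
    parameters: [{ name: "@id", value: id }]
  };
}

// An id containing a double quote is carried as a parameter value,
// so the query text itself never changes.
var spec = buildIdQuery('job-"42"');
console.log(JSON.stringify(spec));

// Inside the stored procedure the spec would take the place of the query
// string (assumption; left as a comment because it needs the server runtime):
// collection.queryDocuments(collectionLink, buildIdQuery(doc.id), function (err, feed, options) { ... });
```

The design point is only that the id travels as data rather than as part of the SQL text; the rest of the procedure would stay as posted.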

My C# code looks like this:

public async Task<int> Add(List<JobDTO> entities)
{
    int currentCount = 0;
    int documentCount = entities.Count;

    while (currentCount < documentCount)
    {
        string argsJson = JsonConvert.SerializeObject(entities.Skip(currentCount).ToArray());
        var args = new dynamic[] { JsonConvert.DeserializeObject<dynamic[]>(argsJson) };

        // 6. execute the batch.
        StoredProcedureResponse<int> scriptResult = await DocumentDBRepository.Client.ExecuteStoredProcedureAsync<int>(sproc.SelfLink, args);

        // 7. Prepare for next batch.
        int currentlyInserted = scriptResult.Response;

        currentCount += currentlyInserted;
    }

    return currentCount;
}
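
The stored procedure and the client loop together form a resume protocol: each call reports how many documents it processed, and the client resends the rest. A minimal JavaScript simulation of that contract follows. makeFakeSproc is a hypothetical in-memory stand-in for the real stored procedure (it is not a DocumentDB API), and the per-call budget of 3 is an arbitrary assumption chosen to force several round trips.

```javascript
// Hypothetical stand-in for the stored procedure: it upserts at most
// `budget` documents per call and returns how many it processed.
function makeFakeSproc(store, budget) {
  return function fakeBulkImport(docs) {
    const accepted = docs.slice(0, budget);
    for (const doc of accepted) {
      store.set(doc.id, doc); // create or replace by id
    }
    return accepted.length;
  };
}

// Client-side resume loop: resend the unprocessed tail until done.
function importAll(docs, sproc) {
  let done = 0;
  while (done < docs.length) {
    const processed = sproc(docs.slice(done));
    if (processed <= 0) {
      // Without this guard a zero count would resend the same batch forever.
      throw new Error("No progress at offset " + done);
    }
    done += processed;
  }
  return done;
}

const store = new Map();
const docs = Array.from({ length: 10 }, (_, i) => ({ id: "doc" + i }));
const total = importAll(docs, makeFakeSproc(store, 3));
console.log(total, store.size);
```

Comparing the number of stored documents with the number sent, as the last line does, is one simple way to notice documents that went missing without an error. The zero-count guard is an addition for illustration and is not in the posted loop.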

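The problem I am facing is that, out of the 400k documents I try to insert, some are occasionally missing, and no error is reported.

The application is a worker role deployed to the cloud. If I increase the number of threads or instances inserting into DocumentDB, the number of missing documents is much higher.

How can I find out what the problem is? Thanks in advance.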

Best answer

When I tried this code, I got an error at docs.length stating that length is undefined.

function bulkImport(docs) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();

    // The count of imported docs, also used as current doc index.
    var count = 0;

    // Validate input.
    if (!docs) throw new Error("The array is undefined or null.");

    var docsLength = docs.length; // length is undefined
}

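After much testing (I could not find anything about this in the Azure documentation), I realized that I could not pass an array through as suggested. The parameter had to be an object. I had to modify the batch code like this to get it to run.

I also found that I could not simply pass an array of documents in the DocumentDB Script Explorer (input box) either, even though the placeholder help text says you can.

This code worked for me: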

// pseudo object for reference only
docObject = {
    "items": [{doc}, {doc}, {doc}]
}

function bulkImport(docObject) {
    var context = getContext();
    var collection = context.getCollection();
    var collectionLink = collection.getSelfLink();
    var count = 0;

    // Check input
    if (!docObject.items || !docObject.items.length) throw new Error("invalid document input parameter or undefined.");
    var docs = docObject.items;
    var docsLength = docs.length;
    if (docsLength == 0) {
        context.getResponse().setBody(0);
    }

    // Call the function to create a document.
    tryCreateOrUpdate(docs[count], callback);

    // Obviously I have truncated this function. The above code should help you understand what has to change.
}
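
The input check at the top of the procedure can also be written to tolerate more than one payload shape. A minimal sketch follows, assuming the parameter may arrive as a bare array, as a wrapper object with an items array (the form that worked above), or as a JSON string of either. extractDocs is a hypothetical helper; whether a given client or the Script Explorer really sends a string is an assumption and is not confirmed by the answer.

```javascript
// Hypothetical helper: returns the array of documents regardless of
// which of the three assumed shapes the parameter arrived in.
function extractDocs(input) {
  var value = input;
  if (typeof value === "string") {
    value = JSON.parse(value); // throws on malformed JSON
  }
  if (Array.isArray(value)) {
    return value;
  }
  if (value && Array.isArray(value.items)) {
    return value.items;
  }
  throw new Error("Expected an array of documents or an object with an 'items' array.");
}

var fromArray = extractDocs([{ id: "a" }]);
var fromWrapper = extractDocs({ items: [{ id: "a" }, { id: "b" }] });
var fromString = extractDocs('{"items":[{"id":"c"}]}');
console.log(fromArray.length, fromWrapper.length, fromString.length);
```

Failing loudly on an unrecognized shape keeps a bad payload from being reported as zero imported documents.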

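Hopefully the Azure documentation will catch up, or become easier to find if I simply missed it.

I will also be filing a bug report for the Script Explorer in the hope that the Azurites will update it.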

This post on Azure DocumentDB bulk insert using a stored procedure is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/28769507/
