RavenDb: performance expectations when querying several million documents

Reposted. Author: 行者123. Updated: 2023-12-05 00:35:06

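I was able to load a couple of million documents with the embedded version of RavenDb, which was pretty slick.

Now I am trying to query those items, and the performance is not what I expected. I was hoping for near-instantaneous results; instead a query takes upwards of 18 seconds on a fairly powerful machine.

Below you will find my naive code.

Note: I have now resolved this, and the final code is at the bottom of the post. The takeaway is that you need indexes, they have to be of the right type, and RavenDB has to be made aware of them. I am very pleased with the performance and with the quality of the records returned by the query engine.

Thanks,
Stephen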

using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
{
    using (IDocumentSession session = store.OpenSession())
    {
        var q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).ToList();
    }
}


[Serializable]
public class Product
{
    public decimal ProductId { get; set; }
    ....
    public string INFO2 { get; set; }
}

Edit

I added this class:
public class InfoIndex_Search : AbstractIndexCreationTask<Product>
{
    public InfoIndex_Search()
    {
        Map = products =>
            from p in products
            select new { Info2Index = p.INFO2 };

        Index(x => x.INFO2, FieldIndexing.Analyzed);
    }
}

and changed the calling method to look like this:
using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
{
    // Tell Raven to create our indexes.
    IndexCreation.CreateIndexes(Assembly.GetExecutingAssembly(), store);

    // Time the query (the stopwatch was not declared in the original snippet).
    Stopwatch watch = Stopwatch.StartNew();
    List<Product> q = null;
    using (IDocumentSession session = store.OpenSession())
    {
        q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).ToList();
        watch.Stop();
    }
}

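But the search still takes 18 seconds. What am I missing? On another note, there are quite a few new files in the C:\temp\ravendata\Indexes\InfoIndex%2fSearch folder, though not nearly as many as when I inserted the data, and they seem to have all but disappeared after running this code a few times while trying to query. Should IndexCreation.CreateIndexes(Assembly.GetExecutingAssembly(), store); be called before the insert, and only then?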

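Edit 1

With this code the query runs almost instantly, but it seems you can only run it once, which raises the question: where should this be run, and what is the proper initialization procedure?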
store.DatabaseCommands.PutIndex("ProdcustByInfo2", new IndexDefinitionBuilder<Product>
{
    Map = products => from product in products
                      select new { product.INFO2 },
    Indexes =
    {
        { x => x.INFO2, FieldIndexing.Analyzed }
    }
});

EDIT2: Working example
static void Main()
{
    Stopwatch watch = Stopwatch.StartNew();

    int q = 0;
    using (var store = new EmbeddableDocumentStore { DataDirectory = @"C:\temp\ravendata" }.Initialize())
    {
        // Create the index only if it does not exist yet, so the program can be run repeatedly.
        if (store.DatabaseCommands.GetIndex("ProdcustByInfo2") == null)
        {
            store.DatabaseCommands.PutIndex("ProdcustByInfo2", new IndexDefinitionBuilder<Product>
            {
                Map = products => from product in products
                                  select new { product.INFO2 },
                Indexes = { { x => x.INFO2, FieldIndexing.Analyzed } }
            });
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to create index {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);

        // Count the documents already stored.
        watch = Stopwatch.StartNew();
        using (IDocumentSession session = store.OpenSession())
        {
            q = session.Query<Product>().Count();
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to query for products values {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        Console.WriteLine("Total number of products loaded: {0}{1}", q, System.Environment.NewLine);

        // Populate the database only when it is empty.
        if (q == 0)
        {
            watch = Stopwatch.StartNew();
            var productsList = Parsers.GetProducts().ToList();
            watch.Stop();
            Console.WriteLine("Time elapsed to parse: {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
            Console.WriteLine("Total number of items parsed: {0}{1}", productsList.Count, System.Environment.NewLine);

            watch = Stopwatch.StartNew();
            productsList.RemoveAll(_ => _ == null);
            watch.Stop();
            Console.WriteLine("Time elapsed to remove null values {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
            Console.WriteLine("Total number of items loaded: {0}{1}", productsList.Count, System.Environment.NewLine);

            // Store the products in batches of 128, using a fresh session per batch.
            watch = Stopwatch.StartNew();
            int batch = 0;
            var session = store.OpenSession();
            foreach (var product in productsList)
            {
                batch++;
                session.Store(product);
                if (batch % 128 == 0)
                {
                    session.SaveChanges();
                    session.Dispose();
                    session = store.OpenSession();
                }
            }
            session.SaveChanges();
            session.Dispose();
            watch.Stop();
            Console.WriteLine("Time elapsed to populate db from collection {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        }

        // Time the prefix query against INFO2.
        watch = Stopwatch.StartNew();
        using (IDocumentSession session = store.OpenSession())
        {
            q = session.Query<Product>().Where(x => x.INFO2.StartsWith("SYS")).Count();
        }
        watch.Stop();
        Console.WriteLine("Time elapsed to query for term {0}{1}", watch.ElapsedMilliseconds, System.Environment.NewLine);
        Console.WriteLine("Total number of items found: {0}{1}", q, System.Environment.NewLine);
    }
    Console.ReadLine();
}

Best answer

First, do you have an index covering INFO2?

Second, see Daniel Lang's blog post "Searching on string properties in RavenDB":

http://daniellang.net/searching-on-string-properties-in-ravendb/

If it helps, here is how I created an index:

public class LogMessageCreatedTime : AbstractIndexCreationTask<LogMessage>
{
    public LogMessageCreatedTime()
    {
        Map = messages => from message in messages
                          select new { MessageCreatedTime = message.MessageCreatedTime };
    }
}

And here is how I add it at run time:
private static DocumentStore GetDatabase()
{
    DocumentStore documentStore = new DocumentStore();

    try
    {
        documentStore.ConnectionStringName = "RavenDb";
        documentStore.Initialize();

        // Tell Raven to create our indexes.
        IndexCreation.CreateIndexes(typeof(DataAccessFactory).Assembly, documentStore);
    }
    catch
    {
        documentStore.Dispose();
        throw;
    }

    return documentStore;
}

In my case I did not have to query the index explicitly; it was simply used when I queried normally.
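If you would rather not rely on RavenDB choosing the index for you, the client also lets you name the index in the query itself. Below is a minimal sketch, assuming the RavenDB 1.x/2.x client that the question uses. The names Products_ByInfo2, ExplicitIndexQuery and CountByPrefix are made up for illustration, and Product is cut down to the one property that matters here. Note that the map keeps the property name INFO2 for the indexed field, so a predicate on x.INFO2 has a field to match; the first index in the question projects the value as Info2Index, which may be why it did not help.

```csharp
using System;
using System.Linq;
using Raven.Client;
using Raven.Client.Embedded;
using Raven.Client.Indexes;
using Raven.Client.Linq;

// Reduced version of the question's Product class (illustration only).
public class Product
{
    public string Id { get; set; }
    public string INFO2 { get; set; }
}

// Hypothetical index. By convention the class name becomes the index name
// "Products/ByInfo2" (the underscore is turned into a slash).
public class Products_ByInfo2 : AbstractIndexCreationTask<Product>
{
    public Products_ByInfo2()
    {
        // Keep the field name identical to the property that the query filters on.
        Map = products => from p in products
                          select new { p.INFO2 };
    }
}

public static class ExplicitIndexQuery
{
    // Counts products whose INFO2 starts with the given prefix,
    // querying the named index instead of letting RavenDB pick one.
    public static int CountByPrefix(IDocumentStore store, string prefix)
    {
        using (IDocumentSession session = store.OpenSession())
        {
            return session.Query<Product, Products_ByInfo2>()
                // Blocks until indexing has caught up. Handy for a demo or a test,
                // usually not something you want in production code.
                .Customize(x => x.WaitForNonStaleResults())
                .Where(x => x.INFO2.StartsWith(prefix))
                .Count();
        }
    }
}
```

The index still has to be registered once at startup, for example with IndexCreation.CreateIndexes as shown above, before ExplicitIndexQuery.CountByPrefix(store, "SYS") can use it.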

Regarding RavenDb performance expectations when querying several million documents, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9841775/
