python - How fast is Whoosh?

Reposted. Author: 太空狗. Updated: 2023-10-29 22:30:08

Whoosh is a fast, feature-rich full-text indexing and search library implemented in pure Python (official website).

However, I cannot find any speed or performance comparison against other search engines, especially the Lucene-based ones (pyLucene, Lupyne, ...).

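I am used to pyLucene, which is known for its speed but is very un-Pythonic and awkward to work with (it is a direct wrapper around Java Lucene). There is a Pythonic wrapper for pyLucene, Lupyne, but it is inconvenient when you need Lucene's core functionality.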

Any performance pointers comparing Whoosh with the others would be appreciated.

Best answer

Whoosh vs. Xappy/Xapian

There are benchmarks for testing Python search backed by Whoosh and by Xappy/Xapian here.

The Whoosh author used these benchmarks to test Whoosh against Xappy/Xapian (ref):

How the benchmark works

N documents are generated, the search word is a random word and 10 chars long, plus 10 extra fields with 100 chars of random stuff each (just to pump up the size of the document).
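The document layout described above can be sketched with the standard library alone. This is a minimal illustration, not the benchmark's actual generator: the field names (`word`, `extra0` ... `extra9`), the lowercase-letter alphabet, and the `make_documents` helper are all assumptions chosen for the example. The default parameters mirror the ones listed in the results further down.

```python
import random
import string


def random_text(length, rng):
    """Return `length` random lowercase ASCII letters (alphabet is an assumption)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_documents(doc_count=3000, word_len=10,
                   extra_field_count=10, extra_field_len=100, seed=0):
    """Build test documents: one short search word plus padding fields.

    Field names are hypothetical; the original benchmark may name them differently.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    docs = []
    for _ in range(doc_count):
        doc = {"word": random_text(word_len, rng)}
        for i in range(extra_field_count):
            # Padding fields only exist to inflate the document size.
            doc["extra%d" % i] = random_text(extra_field_len, rng)
        docs.append(doc)
    return docs


docs = make_documents(doc_count=5)
print(len(docs), "documents,", len(docs[0]), "fields each")
```
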

For indexing, all fields are indexed and stored.

For searching, all words are searched in random order and all stored fields are retrieved.

For whoosh, we used the multiprocessing writer for building the index - this explains why it is faster for indexing than xappy (because it used all 4 cores, not just 1).

For searching, xappy/xapian is faster (there was no parallel processing used). But you see that the speed difference between xappy and whoosh is maybe not as big as you expected.
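For readers who want to try something similar, the indexing and searching phases might look roughly like the sketch below on the Whoosh side. It is not the benchmark's real code. It assumes a recent Whoosh 2.x release in which `ix.writer(procs=...)` selects the multiprocessing writer, and it assumes each document is a dict of lowercase text fields that includes a `word` field. The `timed` and `bench_whoosh` helpers are hypothetical names introduced here.

```python
import random
import tempfile
import time


def timed(label, func, count):
    """Run func(), print elapsed time and items per second, return the rate."""
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    print("%s takes %.1fs (%.1f/s)" % (label, elapsed, rate))
    return rate


def bench_whoosh(docs, procs=1):
    """Index `docs` (a list of dicts with a "word" key), then search every word."""
    # Imported lazily so `timed` stays usable without Whoosh installed.
    from whoosh import index
    from whoosh.fields import Schema, TEXT
    from whoosh.query import Term

    # Every field is both indexed and stored, as in the description above.
    schema = Schema(**{name: TEXT(stored=True) for name in sorted(docs[0])})
    ix = index.create_in(tempfile.mkdtemp(), schema)

    def do_index():
        writer = ix.writer(procs=procs)  # procs > 1 uses several processes
        for doc in docs:
            writer.add_document(**doc)
        writer.commit()

    words = [doc["word"] for doc in docs]
    random.shuffle(words)  # search in random order

    def do_search():
        with ix.searcher() as searcher:
            for word in words:
                for hit in searcher.search(Term("word", word)):
                    hit.fields()  # retrieve all stored fields of the hit

    timed("Indexing", do_index, len(docs))
    timed("Searching", do_search, len(words))
```

Calling `bench_whoosh(docs, procs=4)` on a list of generated documents would approximate the four-core indexing setup mentioned above, while `procs=1` gives a single-process baseline that is fairer to compare with xappy.
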

Index size is about 12 MB

# Phenom II X4 840, 8GB RAM, HDD
# Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
# [GCC 4.6.1] on linux2

Params:
DOC_COUNT: 3000 WORD_LEN: 10
EXTRA_FIELD_COUNT: 10 EXTRA_FIELD_LEN: 100

Benchmarking: xappy 0.5 / xapian 1.2.5
Indexing takes 2.8s (1068.9/s)
Searching takes 0.5s (6635.8/s)

Benchmarking: whoosh 2.3.2
Indexing takes 0.8s (3575.6/s)
Searching takes 0.8s (3714.8/s)
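
To put the figures above side by side, the short calculation below computes the throughput ratios from the reported per-second rates. The numbers are copied from the output above; the dictionary layout and variable names are just for illustration. Whoosh indexes roughly 3.3 times faster here (with four cores against one), and Xapian searches roughly 1.8 times faster.

```python
# Per-second rates copied from the benchmark output above.
results = {
    "xappy 0.5 / xapian 1.2.5": {"indexing": 1068.9, "searching": 6635.8},
    "whoosh 2.3.2": {"indexing": 3575.6, "searching": 3714.8},
}

xapian = results["xappy 0.5 / xapian 1.2.5"]
whoosh = results["whoosh 2.3.2"]

# Whoosh used the multiprocessing writer (4 cores), so this ratio is not per-core.
index_ratio = whoosh["indexing"] / xapian["indexing"]
# Searching was single-process on both sides.
search_ratio = xapian["searching"] / whoosh["searching"]

print("Whoosh indexing throughput vs. Xapian: %.1fx" % index_ratio)
print("Xapian searching throughput vs. Whoosh: %.1fx" % search_ratio)
```
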

Regarding "python - How fast is Whoosh?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29102906/
