
Python: Whoosh seems to return incorrect results

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:15:12

This code comes straight from Whoosh's quickstart docs:

import os.path
from whoosh.index import create_in
from whoosh.fields import Schema, STORED, ID, KEYWORD, TEXT
from whoosh.index import open_dir
from whoosh.query import *
from whoosh.qparser import QueryParser

# establish schema to be used in the index
schema = Schema(title=TEXT(stored=True), content=TEXT,
                path=ID(stored=True), tags=KEYWORD, icon=STORED)

# create index directory
if not os.path.exists("index"):
    os.mkdir("index")

# create the index using the schema specified above
ix = create_in("index", schema)

# instantiate the writer object
writer = ix.writer()

# add the docs to the index
writer.add_document(title=u"My document", content=u"This is my document!",
                    path=u"/a", tags=u"first short", icon=u"/icons/star.png")
writer.add_document(title=u"Second try", content=u"This is the second example.",
                    path=u"/b", tags=u"second short", icon=u"/icons/sheep.png")
writer.add_document(title=u"Third time's the charm", content=u"Examples are many.",
                    path=u"/c", tags=u"short", icon=u"/icons/book.png")

# commit those changes
writer.commit()

# identify searcher
with ix.searcher() as searcher:

    # specify parser
    parser = QueryParser("content", ix.schema)

    # specify query -- try also "second"
    myquery = parser.parse("is")

    # search for results
    results = searcher.search(myquery)

    # identify the number of matching documents
    # (print as a function call works on both Python 2 and Python 3)
    print(len(results))

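I pass a single value to the parser.parse() call: the verb "is". Yet when I run the code, the result has length zero rather than the expected length of two. If I replace "is" with "second", I get one result, as expected. So why does the search for "is" produce no matches?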

Edit

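As @Philippe pointed out, the default Whoosh indexer removes stop words, which explains the behavior above. If you want to keep stop words, you can specify which analyzer to use for a given field of the index, and pass that analyzer an argument that disables stop-word removal; for example: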

from whoosh import analysis  # needed for analysis.StandardAnalyzer below

schema = Schema(title=TEXT(stored=True, analyzer=analysis.StandardAnalyzer(stoplist=None)))
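
To make the mechanism concrete, here is a minimal pure-Python sketch of stop-word filtering applied both when indexing and when parsing a query. It does not use Whoosh and is not how Whoosh is implemented; the names analyze, build_index and search are hypothetical helpers, and STOP_WORDS is a small illustrative subset rather than Whoosh's actual list. Only the three document texts are taken from the question.

```python
import re

# Illustrative subset of an English stop list (an assumption, not Whoosh's real list).
STOP_WORDS = frozenset(["a", "an", "are", "is", "the", "this"])


def analyze(text, stoplist=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    if stoplist is None:
        return tokens  # stop-word removal disabled
    return [t for t in tokens if t not in stoplist]


def build_index(docs, stoplist=STOP_WORDS):
    """Map each surviving term to the set of document paths that contain it."""
    index = {}
    for path, content in docs.items():
        for term in analyze(content, stoplist):
            index.setdefault(term, set()).add(path)
    return index


def search(index, query, stoplist=STOP_WORDS):
    """Return the paths that match every query term left after analysis."""
    terms = analyze(query, stoplist)
    if not terms:
        return set()  # the whole query consisted of stop words
    return set.intersection(*[index.get(t, set()) for t in terms])


# Document contents copied from the question.
DOCS = {
    "/a": "This is my document!",
    "/b": "This is the second example.",
    "/c": "Examples are many.",
}

with_stops = build_index(DOCS)                    # stop words removed
without_stops = build_index(DOCS, stoplist=None)  # stop words kept

print(len(search(with_stops, "is")))                    # 0
print(len(search(with_stops, "second")))                # 1
print(len(search(without_stops, "is", stoplist=None)))  # 2
```

With the stop list active, "is" never reaches the index and is also stripped from the query, so nothing can match. Note that the question's parser searches the content field, so in the Whoosh code it is presumably that field, not only title, whose analyzer needs stoplist=None.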

Best answer

For "Python: Whoosh seems to return incorrect results", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25087290/
