
Solr Suggester: distributed search (SolrCloud) duplicate results

Reposted. Author: 行者123. Updated: 2023-12-01 15:39:38

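I have two shards, and I'm trying to implement a suggester using distributed search across them (Solr 4.10.1). It seems that the suggester queries each shard and concatenates the result sets, leaving duplicates. In my solrconfig.xml file I have the following: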

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">titleSuggester</str>
    <!-- parameter names are case-sensitive: lookupImpl, dictionaryImpl -->
    <str name="lookupImpl">AnalyzingLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title_sug</str>
    <str name="weightField">rank</str>
    <str name="suggestAnalyzerFieldType">shingleSuggest</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>


<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Querying http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&shards.qt=/suggest&shards=shard1,shard2&suggest.q=an&wt=json&indent=true returns:

{
  "responseHeader": {
    "status": 0,
    "QTime": 12
  },
  "suggest": {
    "titleSuggester": {
      "an": {
        "numFound": 10,
        "suggestions": [
          {"term": "an", "weight": 149, "payload": ""},
          {"term": "an", "weight": 142, "payload": ""},
          {"term": "an american", "weight": 6, "payload": ""},
          {"term": "an affair", "weight": 4, "payload": ""},
          {"term": "an 18th century", "weight": 2, "payload": ""},
          {"term": "an 18th", "weight": 2, "payload": ""},
          {"term": "an american hymn", "weight": 2, "payload": ""},
          {"term": "an 18th century drawing room", "weight": 2, "payload": ""},
          {"term": "an 18th century drawing", "weight": 2, "payload": ""},
          {"term": "an american hymn (main", "weight": 2, "payload": ""}
        ]
      }
    }
  }
}

As shown above, the term "an" is returned twice, once from each shard. If I use distrib=false (http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&distrib=false&suggest.q=an&wt=json&indent=true), I get, as expected, results without duplicates:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1
  },
  "suggest": {
    "titleSuggester": {
      "an": {
        "numFound": 10,
        "suggestions": [
          {"term": "an", "weight": 149, "payload": ""},
          {"term": "an 18th", "weight": 2, "payload": ""},
          {"term": "an 18th century", "weight": 2, "payload": ""},
          {"term": "an 18th century drawing", "weight": 2, "payload": ""},
          {"term": "an 18th century drawing room", "weight": 2, "payload": ""},
          {"term": "an absolution take", "weight": 1, "payload": ""},
          {"term": "an absolution take her", "weight": 1, "payload": ""},
          {"term": "an absolution take her to", "weight": 1, "payload": ""},
          {"term": "an absolution take her to sea,", "weight": 1, "payload": ""},
          {"term": "an affair", "weight": 4, "payload": ""}
        ]
      }
    }
  }
}

Is there a way to get rid of the duplicate results?
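For illustration, one client-side workaround (not from the original thread) is to filter the merged list after the response arrives. The minimal Python sketch below assumes the JSON layout shown above (`suggest`, then the dictionary name, then the query, then `suggestions`) and keeps only one entry per term; the function name `dedupe_suggestions` is hypothetical, and keeping the higher of the per-shard weights is an arbitrary choice.

```python
from __future__ import annotations

import json


def dedupe_suggestions(response_json: str, dictionary: str, query: str) -> list[dict]:
    """Drop duplicate suggestion terms, keeping the entry with the highest weight.

    Assumes the response layout shown in the question:
    suggest -> <dictionary> -> <query> -> suggestions.
    """
    data = json.loads(response_json)
    suggestions = data["suggest"][dictionary][query]["suggestions"]
    best: dict[str, dict] = {}
    for s in suggestions:
        term = s["term"]
        # Keep the first occurrence unless a later entry has a higher weight
        if term not in best or s["weight"] > best[term]["weight"]:
            best[term] = s
    # Re-sort by weight, highest first (sorted() is stable, so ties keep their order)
    return sorted(best.values(), key=lambda s: s["weight"], reverse=True)


# Trimmed copy of the distributed response from the question
sample = """{"suggest": {"titleSuggester": {"an": {"suggestions": [
  {"term": "an", "weight": 149, "payload": ""},
  {"term": "an", "weight": 142, "payload": ""},
  {"term": "an american", "weight": 6, "payload": ""},
  {"term": "an affair", "weight": 4, "payload": ""}]}}}}"""

for s in dedupe_suggestions(sample, "titleSuggester", "an"):
    print(s["term"], s["weight"])
```

Because duplicates are removed after the shard results have been concatenated, the list can end up shorter than `suggest.count`; requesting a somewhat larger count and truncating on the client is one way to compensate.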

Best answer

You can use Solr's grouping feature; add the following to your query:

&group=true&group.field=term&group.main=true

This returns only one document for each identical term, in the same format as a regular query (because of group.main=true).

See http://wiki.apache.org/solr/FieldCollapsing for more information.
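To try the answer's suggestion against the distributed request from the question, the grouping parameters can be appended to the same query string. The minimal Python sketch below only builds the URL; it does not send the request or check whether the `/suggest` handler actually applies grouping. Host, collection, and shard names are copied from the question.

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/collection1/suggest"

params = {
    # Parameters from the original distributed query
    "suggest.dictionary": "titleSuggester",
    "shards.qt": "/suggest",
    "shards": "shard1,shard2",
    "suggest.q": "an",
    "wt": "json",
    "indent": "true",
    # Grouping parameters proposed in the answer
    "group": "true",
    "group.field": "term",
    "group.main": "true",
}

# urlencode percent-encodes "/" and "," (e.g. %2Fsuggest); the server decodes them
url = f"{base}?{urlencode(params)}"
print(url)
```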

For the question "Solr Suggester: distributed search (SolrCloud) duplicate results", see the similar question on Stack Overflow: https://stackoverflow.com/questions/27786271/
