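I'm trying to understand how docFreq is calculated. Is it computed per index, per mapping, or per field?
I got the results below from a query with explain set to true.
When the hit is in the ListedName.standard mapping, docFreq is lower, as shown here: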
{
"value" : 16.316673,
"description" : """weight(ListedName.standard:"eagle pointe" in 48) [PerFieldSimilarity], result of:""",
"details" : [
{
"value" : 16.316673,
"description" : "score(doc=48,freq=1.0 = phraseFreq=1.0\n), product of:",
"details" : [
{
"value" : 3.0,
"description" : "boost",
"details" : [ ]
},
{
"value" : 5.4388914,
"description" : "idf(), sum of:",
"details" : [
{
"value" : 1.7870536,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 35.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 211.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 3.651838,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 5.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 211.0,
"description" : "docCount",
"details" : [ ]
}
]
}
]
},
{
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:",
"details" : [
{
"value" : 1.0,
"description" : "phraseFreq=1.0",
"details" : [ ]
},
{
"value" : 0.0,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.0,
"description" : "parameter b (norms omitted for field)",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 1.1640041,
"description" : """weight(Line1:"eagle pointe" in 148) [PerFieldSimilarity], result of:""",
"details" : [
{
"value" : 1.1640041,
"description" : "score(doc=148,freq=1.0 = phraseFreq=1.0\n), product of:",
"details" : [
{
"value" : 3.0,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.38800138,
"description" : "idf(), sum of:",
"details" : [
{
"value" : 0.18813552,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 171.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 206.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 0.19986586,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 169.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 206.0,
"description" : "docCount",
"details" : [ ]
}
]
}
]
},
{
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:",
"details" : [
{
"value" : 1.0,
"description" : "phraseFreq=1.0",
"details" : [ ]
},
{
"value" : 0.0,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.0,
"description" : "parameter b (norms omitted for field)",
"details" : [ ]
}
]
}
]
}
]
}
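As a sanity check, the numbers in the explain output above can be reproduced by hand from the formulas printed in the description strings. The sketch below is only that arithmetic in Python, not Elasticsearch or Lucene code; the function names (bm25_idf, tf_norm, phrase_score) are made up for illustration, and it assumes the log in the description is the natural logarithm.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """idf as printed in the explain output:
    log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(freq, k1):
    """tfNorm as printed in the explain output: (freq * (k1 + 1)) / (freq + k1)."""
    return (freq * (k1 + 1)) / (freq + k1)

def phrase_score(boost, term_stats, freq, k1):
    """boost * (sum of per-term idf) * tfNorm, mirroring the explain tree."""
    idf_sum = sum(bm25_idf(df, dc) for df, dc in term_stats)
    return boost * idf_sum * tf_norm(freq, k1)

# (docFreq, docCount) pairs copied from the explain output
listed_name_stats = [(35, 211), (5, 211)]    # ListedName.standard
line1_stats = [(171, 206), (169, 206)]       # Line1

# boost = 3.0, phraseFreq = 1.0, k1 = 0.0, all taken from the explain output
listed_name_score = phrase_score(3.0, listed_name_stats, freq=1.0, k1=0.0)
line1_score = phrase_score(3.0, line1_stats, freq=1.0, k1=0.0)

print(listed_name_score)  # compare with 16.316673 in the explain output
print(line1_score)        # compare with 1.1640041 in the explain output
```

Both results agree with the explain values up to floating-point rounding, so the whole gap between the two scores comes from the different docFreq and docCount values of the two fields.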
Best answer
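It should depend on how the scoring model (see Similarity) is defined. The similarity algorithm can be set per index or per field. Quoting the Elasticsearch documentation:
"Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF."
Given the explain line weight(<field>:"eagle pointe" in 48) [PerFieldSimilarity], docFreq appears to be restricted to the number of documents that contain the term in that field. However, I found no further documentation on this, and I am not sure of the logic behind it: the behavior should depend on the Similarity class definition itself, not on whether a custom similarity is set on a particular field. It is also unclear to me whether docFreq is supposed to be consistent across fields (if it is, this might be a bug).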
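For what it's worth, in Lucene a term is a field name paired with its text, so term statistics are kept per field. That matches the explain output in the question, where docCount also differs between the two fields (211 versus 206). Elasticsearch also gathers these statistics per shard by default, so the counts can differ between shards unless the search uses dfs_query_then_fetch. The toy sketch below illustrates per-field counting only; it is not Lucene code, the three documents are invented, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict

def field_statistics(docs):
    """Count docFreq per (field, term) and docCount per field.

    docFreq:  number of documents containing the term in that field.
    docCount: number of documents with at least one term in that field.
    """
    doc_freq = defaultdict(int)
    doc_count = defaultdict(int)
    for doc in docs:
        for field, text in doc.items():
            # Naive analysis: lowercase and split on whitespace (an assumption).
            terms = set(text.lower().split())
            if not terms:
                continue
            doc_count[field] += 1
            for term in terms:
                doc_freq[(field, term)] += 1
    return doc_freq, doc_count

# Hypothetical documents; the third one has no ListedName value at all.
docs = [
    {"ListedName": "Eagle Pointe", "Line1": "12 Eagle Pointe Drive"},
    {"ListedName": "Harbor View", "Line1": "4 Eagle Street"},
    {"Line1": "9 Pointe Road"},
]

doc_freq, doc_count = field_statistics(docs)
for field in ("ListedName", "Line1"):
    print(field, doc_freq[(field, "eagle")], doc_freq[(field, "pointe")], doc_count[field])
```

The same term ends up with a different docFreq in each field, and each field has its own docCount, which is the pattern visible in the two weight(...) blocks of the question.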
Source: the Stack Overflow question "Elasticsearch: how is docFreq calculated": https://stackoverflow.com/questions/57024432/