
nlp - What is zone hashing in natural language processing?

Reposted. Author: 行者123. Updated: 2023-12-04 18:27:14

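Has anyone in the NLP field heard of the term "zone hashing"? As far as I understand it, zone hashing is the process of iterating through a document and extracting sentences. The accumulated sentences are then hashed, and the process continues with the next n sentences...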

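I could not find any reference to this on Google, so I wonder whether it goes by a different name. It should be related to measuring text similarity/closeness.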

Perhaps it refers to locality sensitive hashing?
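To make the process described in the question concrete, here is a minimal Python sketch of hashing consecutive groups of n sentences. It only illustrates the asker's description and is not an established "zone hashing" algorithm; the function name `hash_sentence_windows`, the naive regex sentence splitter, and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib
import re


def hash_sentence_windows(text, n=3):
    """Split text into sentences and hash each consecutive group of n sentences."""
    # Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    digests = []
    for start in range(0, len(sentences), n):
        window = " ".join(sentences[start:start + n])
        digests.append(hashlib.sha1(window.encode("utf-8")).hexdigest())
    return digests


doc_a = "Zones are spans. Spans share a tag. Hashing is fast. Lookups are cheap."
doc_b = "Zones are spans. Spans share a tag. Hashing is slow. Lookups are cheap."

hashes_a = hash_sentence_windows(doc_a, n=2)
hashes_b = hash_sentence_windows(doc_b, n=2)

# Windows with identical text produce identical digests, so the overlap
# between the two digest sets is a crude measure of shared content.
shared = set(hashes_a) & set(hashes_b)
print(len(hashes_a), len(shared))
```

Note that an exact hash such as SHA-1 only detects windows that match character for character; a single changed word yields a completely different digest, which is why this scheme measures exact overlap rather than closeness.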

Best answer

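As far as I know, "zone hashing" is not an established concept in NLP as a discipline. It is just a simple idea used in some NLP-related algorithms. The only user of it that I know of is the Sphinx search server, where "zone hashing" simply means "hashing of objects called zones", and a "zone" is described as follows: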

Zones can be formally defined as follows. Everything between an opening and a matching closing tag is called a span, and the aggregate of all spans corresponding sharing the same tag name is called a zone. For instance, everything between the occurrences of <H1> and </H1> in the document field belongs to H1 zone.

Zone indexing, enabled by index_zones directive, is an optional extension of the HTML stripper. So it will also require that the stripper is enabled (with html_strip = 1). The value of the index_zones should be a comma-separated list of those tag names and wildcards (ending with a star) that should be indexed as zones.

Zones can nest and overlap arbitrarily. The only requirement is that every opening tag has a matching tag. You can also have an arbitrary number of both zones (as in unique zone names, such as H1) and spans (all the occurrences of those H1 tags) in a document. Once indexed, zones can then be used for matching with the ZONE operator, see Section 5.3, “Extended query syntax”.

Hashing of these structures is used in the conventional sense, to speed up search and lookup. I am not aware of any "deeper" meaning.
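As a rough illustration of spans, zones, and hashing them for lookup, the following Python sketch collects the text of each span and groups the spans by tag name in a dictionary (a hash table). This is not Sphinx's implementation; the class `ZoneCollector` and its `zone_tags` parameter are hypothetical names, and the sketch relies only on the standard-library `html.parser` module.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ZoneCollector(HTMLParser):
    """Collect the text of every span, grouped by tag name (the zone)."""

    def __init__(self, zone_tags):
        super().__init__()
        self.zone_tags = set(zone_tags)
        self.zones = defaultdict(list)  # zone name -> list of span texts
        self.open_spans = []            # [tag, text pieces] for each open span

    def handle_starttag(self, tag, attrs):
        if tag in self.zone_tags:
            self.open_spans.append([tag, []])

    def handle_data(self, data):
        # Text belongs to every span that is currently open (zones can nest).
        for _, pieces in self.open_spans:
            pieces.append(data)

    def handle_endtag(self, tag):
        # Close the most recently opened span with this tag name, if any.
        for i in range(len(self.open_spans) - 1, -1, -1):
            if self.open_spans[i][0] == tag:
                _, pieces = self.open_spans.pop(i)
                self.zones[tag].append("".join(pieces).strip())
                break


html = "<h1>Intro</h1><p>Some text</p><h1>Details <b>here</b></h1>"
collector = ZoneCollector(zone_tags=["h1", "b"])
collector.feed(html)
collector.close()

zones = dict(collector.zones)
print(zones)  # {'h1': ['Intro', 'Details here'], 'b': ['here']}
```

Looking up `zones["h1"]` is then a constant-time hash lookup that returns all spans of the H1 zone, which is the ordinary "hashing to speed up lookup" sense described above. `HTMLParser` lowercases tag names, so the zone keys here are lowercase.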

Perhaps it refers to locality sensitive hashing?

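Locality sensitive hashing is a probabilistic method for multi-dimensional data. I do not see any deeper connection to zone hashing beyond the fact that both use hash functions.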
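For contrast, here is a minimal sketch of one common locality sensitive hashing scheme, random hyperplane hashing for cosine similarity. The answer does not name a specific LSH variant, so this choice is an assumption, and the helper names `make_hyperplanes`, `lsh_signature`, and `hamming` are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(dim=3, n_bits=16)

base = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]       # same direction as base
near = [1.1, 2.0, 2.9]         # small angle to base
opposite = [-1.0, -2.0, -3.0]  # opposite direction

sig_base = lsh_signature(base, planes)
print(hamming(sig_base, lsh_signature(scaled, planes)))    # same direction: 0
print(hamming(sig_base, lsh_signature(near, planes)))      # expected to be small
print(hamming(sig_base, lsh_signature(opposite, planes)))  # opposite: all bits differ
```

Unlike an ordinary hash table lookup, the signatures here are designed so that similar inputs collide on most bits, and the Hamming distance between signatures approximates the angle between the vectors. That probabilistic similarity guarantee is what zone hashing, as described above, does not have.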

Regarding "nlp - What is zone hashing in natural language processing?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18473958/
