
hadoop - Hive ngram stop word list?

Reposted. Author: 可可西里. Updated: 2023-11-01 14:32:11

Although it is listed as one of the example use cases, I have not found an example of filtering junk words (and, or, etc.) out of Hive n-grams.

SELECT explode(context_ngrams(sentences(lower(description)), array("criminal", null), 10)) AS x FROM mapped_discussions;

{"ngram":["justice"],"estfrequency":274.0}
{"ngram":["behavior"],"estfrequency":121.0}
{"ngram":["law"],"estfrequency":92.0}
{"ngram":["activity"],"estfrequency":69.0}
{"ngram":["acts"],"estfrequency":41.0}
{"ngram":["procedure"],"estfrequency":35.0}
{"ngram":["and"],"estfrequency":29.0}
{"ngram":["or"],"estfrequency":27.0}
{"ngram":["case"],"estfrequency":26.0}
{"ngram":["cases"],"estfrequency":25.0}

Any ideas? Thanks!

Best answer

There is an excellent article on this topic: http://bigdatabloggin.blogspot.com/2012/08/trending-topics-in-hive-ngrams.html
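For illustration, here is a minimal sketch of one way to drop stop words from the result. It is not taken from the linked article. It wraps the question's query in a subquery and filters the exploded rows afterwards. The stop word list and the candidate count of 100 are assumptions chosen for the example. The larger count is there so that enough rows remain once the stop words are removed.

```sql
-- Sketch only: the stop word list and the value 100 are illustrative assumptions.
SELECT t.x.ngram[0]     AS word,
       t.x.estfrequency AS freq
FROM (
  -- Same query as in the question, but asking for more candidates (100 instead of 10)
  SELECT explode(context_ngrams(sentences(lower(description)), array("criminal", null), 100)) AS x
  FROM mapped_discussions
) t
-- Each ngram holds the single word that filled the null slot; drop it if it is a stop word
WHERE t.x.ngram[0] NOT IN ('and', 'or', 'the', 'of', 'a', 'in', 'to')
ORDER BY freq DESC
LIMIT 10;
```

Because the filter runs after context_ngrams, the estimated frequencies of the remaining words are unchanged. For a longer list, the stop words could instead be kept in a separate table and excluded with a join, at the cost of one more table to maintain.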

Regarding "hadoop - Hive ngram stop word list?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11972932/
