
dictionary - Why is this haskey() condition always false?

Reposted. Author: 行者123. Updated: 2023-12-04 00:59:17

I start with a dictionary of word counts:

julia> import Iterators: partition
julia> import StatsBase: countmap
julia> s = split("the lazy fox jumps over the brown dog");
julia> vocab_counter = countmap(s)
Dict{SubString{String},Int64} with 7 entries:
  "brown" => 1
  "lazy"  => 1
  "jumps" => 1
  "the"   => 2
  "fox"   => 1
  "over"  => 1
  "dog"   => 1

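Next I want to count the n-grams of each word and store the result in a nested dictionary. The outer key is the n-gram, the inner key is the word, and the innermost value is the count of that n-gram for the given word.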
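The post calls ngram(word, 2) without showing its definition. The sketch below is one plausible version, written only as an assumption so the later snippets can be followed: a hypothetical helper that returns every window of n consecutive characters as a tuple, which is consistent with the ('t','h')-style keys shown in the output.

```julia
# Hypothetical helper (not from the original post): returns each run of `n`
# consecutive characters of `word` as a tuple of Char.
function ngram(word::AbstractString, n::Integer)
    chars = collect(word)                      # Vector{Char}, safe for non-ASCII text
    return [Tuple(chars[i:i+n-1]) for i in 1:length(chars)-n+1]
end

# Each bigram of "the" becomes a possible outer key of the nested dictionary.
for ng in ngram("the", 2)
    println(ng)
end
```

A word shorter than n yields an empty collection, so the inner loop simply does nothing for it.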

This is what I tried:

ngram_word_counter = Dict{Tuple,Dict}()
for (word, count) in vocab_counter
    for ng in ngram(word, 2) # bigrams.
        if ! haskey(ngram_word_counter, ng)
            ngram_word_counter[ng] = Dict{String,Int64}()
            ngram_word_counter[ng][word] = 0
        end
        ngram_word_counter[ng][word] += 1
    end
end

This gives me the data structure I need:

julia> ngram_word_counter
Dict{Tuple,Dict} with 20 entries:
  ('b','r') => Dict("brown"=>1)
  ('t','h') => Dict("the"=>1)
  ('o','w') => Dict("brown"=>1)
  ('z','y') => Dict("lazy"=>1)
  ('o','g') => Dict("dog"=>1)
  ('u','m') => Dict("jumps"=>1)
  ('o','x') => Dict("fox"=>1)
  ('e','r') => Dict("over"=>1)
  ('a','z') => Dict("lazy"=>1)
  ('p','s') => Dict("jumps"=>1)
  ('h','e') => Dict("the"=>1)
  ('d','o') => Dict("dog"=>1)
  ('w','n') => Dict("brown"=>1)
  ('m','p') => Dict("jumps"=>1)
  ('l','a') => Dict("lazy"=>1)
  ('o','v') => Dict("over"=>1)
  ('v','e') => Dict("over"=>1)
  ('r','o') => Dict("brown"=>1)
  ('f','o') => Dict("fox"=>1)
  ('j','u') => Dict("jumps"=>1)

But note that these values are wrong:

('t','h') => Dict("the"=>1)
('h','e') => Dict("the"=>1)

They should be:

('t','h') => Dict("the"=>2)
('h','e') => Dict("the"=>2)

because the word "the" occurs twice.

On closer inspection, it seems that haskey(ngram_word_counter, ng) is always false =(

julia> ngram_word_counter = Dict{Tuple,Dict}()
for (word, count) in vocab_counter
    for ng in ngram(word, 2) # bigrams.
        println(haskey(ngram_word_counter, ng))
    end
end

[Output]:

false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false
false

Why is this haskey() condition always false?

Best answer

TL;DR: it should be ngram_word_counter[ng][word] += count rather than ngram_word_counter[ng][word] += 1

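Adding only 1 ignores the extra contributions of a word that occurs more than once. The number of occurrences of each word is stored as the value in vocab_counter, and the for loop binds that value to the variable count. The increment should therefore be count.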
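The fix can be checked end to end with a small sketch. It rests on two assumptions that are not in the original post: count_words is a Base-only stand-in for StatsBase.countmap, used so the snippet has no dependencies, and ngram is a hypothetical helper, since its definition is not shown. The dictionary is also given concrete key and value types here instead of Dict{Tuple,Dict}.

```julia
# Stand-in for StatsBase.countmap (assumption: used only to avoid a dependency).
function count_words(words)
    counts = Dict{String,Int}()
    for w in words
        key = String(w)                        # split() yields SubString values
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

# Hypothetical helper: every run of `n` consecutive characters as a tuple.
function ngram(word::AbstractString, n::Integer)
    chars = collect(word)
    return [Tuple(chars[i:i+n-1]) for i in 1:length(chars)-n+1]
end

vocab_counter = count_words(split("the lazy fox jumps over the brown dog"))

ngram_word_counter = Dict{Tuple{Char,Char},Dict{String,Int}}()
for (word, count) in vocab_counter
    for ng in ngram(word, 2) # bigrams.
        if !haskey(ngram_word_counter, ng)
            ngram_word_counter[ng] = Dict{String,Int}()
        end
        inner = ngram_word_counter[ng]
        # Add the word's frequency, not 1: each word is visited only once.
        inner[word] = get(inner, word, 0) + count
    end
end

println(ngram_word_counter[('t', 'h')])
println(ngram_word_counter[('h', 'e')])
```

Using get(inner, word, 0) also covers the case where a bigram already has an entry from a different word, which the original initialisation inside the haskey branch would not handle.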

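The debugging check added afterwards is not valid: its loop never inserts anything into ngram_word_counter, so the dictionary stays empty and haskey() returns false on every call. In general, bugs in debugging code only obscure the real problem. The intended check was probably: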

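julia> ngram_word_counter = Dict{Tuple,Int}() # Int values suffice for a membership check; assigning 1 into a Dict{Tuple,Dict} would throw
for (word, count) in vocab_counter
    for ng in ngram(word, 2) # bigrams.
        println(haskey(ngram_word_counter, ng))
        ngram_word_counter[ng] = 1 # record the key so later lookups can find it
    end
end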
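With the sentence from the question, a check of this kind still prints false every time, because each of the 20 bigrams belongs to exactly one word (the dictionary shown earlier has 20 entries for 20 bigrams). To see haskey() return true, the sketch below uses two made-up words that share bigrams. The ngram helper is again a hypothetical definition, since the post does not show the original.

```julia
# Hypothetical helper: every run of `n` consecutive characters as a tuple.
function ngram(word::AbstractString, n::Integer)
    chars = collect(word)
    return [Tuple(chars[i:i+n-1]) for i in 1:length(chars)-n+1]
end

seen = Dict{Tuple{Char,Char},Int}()
results = Bool[]
for word in ["the", "then"]                    # illustrative words, not from the post
    for ng in ngram(word, 2)
        push!(results, haskey(seen, ng))       # check before inserting the key
        seen[ng] = get(seen, ng, 0) + 1        # now the key exists for later lookups
    end
end

println(results)
```

The first two lookups happen before any key exists, the next two hit the bigrams already stored for "the", and the last one is a bigram that only "then" contains.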

Regarding "dictionary - Why is this haskey() condition always false?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43115581/
