gpt4 book ai didi

python - 如何为隐马尔可夫模型找到最可能的隐藏状态序列

转载 作者:太空狗 更新时间:2023-10-29 18:12:37 24 4
gpt4 key购买 nike

Viterbi algorithm在隐马尔可夫模型中找到最可能的隐藏状态序列。我目前正在使用 hhquark 提供的以下精彩代码.

import numpy as np


def viterbi_path(prior, transmat, obslik, scaled=True, ret_loglik=False):
'''Finds the most-probable (Viterbi) path through the HMM state trellis
Notation:
Z[t] := Observation at time t
Q[t] := Hidden state at time t
Inputs:
prior: np.array(num_hid)
prior[i] := Pr(Q[0] == i)
transmat: np.ndarray((num_hid,num_hid))
transmat[i,j] := Pr(Q[t+1] == j | Q[t] == i)
obslik: np.ndarray((num_hid,num_obs))
obslik[i,t] := Pr(Z[t] | Q[t] == i)
scaled: bool
whether or not to normalize the probability trellis along the way
doing so prevents underflow by repeated multiplications of probabilities
ret_loglik: bool
whether or not to return the log-likelihood of the best path
Outputs:
path: np.array(num_obs)
path[t] := Q[t]
'''
num_hid = obslik.shape[0] # number of hidden states
num_obs = obslik.shape[1] # number of observations (not observation *states*)

# trellis_prob[i,t] := Pr((best sequence of length t-1 goes to state i), Z[1:(t+1)])
trellis_prob = np.zeros((num_hid,num_obs))
# trellis_state[i,t] := best predecessor state given that we ended up in state i at t
trellis_state = np.zeros((num_hid,num_obs), dtype=int) # int because its elements will be used as indicies
path = np.zeros(num_obs, dtype=int) # int because its elements will be used as indicies

trellis_prob[:,0] = prior * obslik[:,0] # element-wise mult
if scaled:
scale = np.ones(num_obs) # only instantiated if necessary to save memory
scale[0] = 1.0 / np.sum(trellis_prob[:,0])
trellis_prob[:,0] *= scale[0]

trellis_state[:,0] = 0 # arbitrary value since t == 0 has no predecessor
for t in xrange(1, num_obs):
for j in xrange(num_hid):
trans_probs = trellis_prob[:,t-1] * transmat[:,j] # element-wise mult
trellis_state[j,t] = trans_probs.argmax()
trellis_prob[j,t] = trans_probs[trellis_state[j,t]] # max of trans_probs
trellis_prob[j,t] *= obslik[j,t]
if scaled:
scale[t] = 1.0 / np.sum(trellis_prob[:,t])
trellis_prob[:,t] *= scale[t]

path[-1] = trellis_prob[:,-1].argmax()
for t in range(num_obs-2, -1, -1):
path[t] = trellis_state[(path[t+1]), t+1]

if not ret_loglik:
return path
else:
if scaled:
loglik = -np.sum(np.log(scale))
else:
p = trellis_prob[path[-1],-1]
loglik = np.log(p)
return path, loglik


if __name__=='__main__':
# Assume there are 3 observation states, 2 hidden states, and 5 observations
priors = np.array([0.5, 0.5])
transmat = np.array([
[0.75, 0.25],
[0.32, 0.68]])
emmat = np.array([
[0.8, 0.1, 0.1],
[0.1, 0.2, 0.7]])
observations = np.array([0, 1, 2, 1, 0], dtype=int)
obslik = np.array([emmat[:,z] for z in observations]).T
print viterbi_path(priors, transmat, obslik) #=> [0 1 1 1 0]
print viterbi_path(priors, transmat, obslik, scaled=False) #=> [0 1 1 1 0]
print viterbi_path(priors, transmat, obslik, ret_loglik=True) #=> (array([0, 1, 1, 1, 0]), -7.776472586614755)
print viterbi_path(priors, transmat, obslik, scaled=False, ret_loglik=True) #=> (array([0, 1, 1, 1, 0]), -8.0120386579275227)

然而,我真正需要的不仅仅是最可能的序列,而是隐藏状态的前k个最可能的序列。

How can this code be modified to give the top k most likely sequences?

最佳答案

从另一个角度来看,Viterbi 算法计算非循环加权图中的最短路径,其节点是(隐藏状态,时间)对。您可以使用 Yen 的算法找到前 k 个最短路径,这些路径转换为前 k 个最有可能的序列。这是 Yen 算法在 NetworkX 中的一个实现。 .

要设置图表,我们从源节点和汇节点开始。对于所有状态 i,使用权重 log(prior[i] * obslik[i, 0]) 构建从源节点到节点 (i, 0) 的弧。对于所有状态 i、所有状态 j 和所有时间 t > 0,从节点 (i, t-1) 到 (j, t) 制作弧,权重为 log(transmat[i, j] * obslik[j, t] ).设T为最后一次,从(i,T)到sink的权重为0的弧。从source到sink的每条路径都与一系列隐藏状态一一对应,并且长度为路径是该序列的对数似然。

关于python - 如何为隐马尔可夫模型找到最可能的隐藏状态序列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57238883/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com