python - Computing a log-likelihood ratio using values from a dictionary

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:02:13

First, I extracted some text from a file using the following code:

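from collections import Counter

def n_gram_opcodes(source, n):
    # Read the whole file; the with-block closes it afterwards
    with open(source) as f:
        text = f.read()
    OPCODES = set(["add","call","cmp","mov","jnz","jmp","jz","lea","pop","push",
                   "retn","sub","test","xor"])

    source_words = text.split()
    # Keep only the tokens that are known opcode mnemonics
    opcodes = [w for w in source_words if w in OPCODES]

    # Count n-grams by zipping n staggered views of the opcode list
    return Counter(zip(*[opcodes[i:] for i in range(n)]))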

The code also counts how often each opcode n-gram occurs in the file. The counts are stored in a dictionary-like Counter, as follows:

Counter({('mov', 'mov', 'mov'): 18, ('xor', 'mov', 'mov'): 6, ('mov', 'mov', 'pop'): 3, ('mov', 'mov', 'push'): 3, ('pop', 'mov', 'mov'): 3, ('mov', 'call', 'cmp'): 3, ('push', 'pop', 'mov'): 3, ('mov', 'add', 'mov'): 3, ('call', 'mov', 'call'): 3, ('mov', 'mov', 'xor'): 3, ('cmp', 'mov', 'cmp'): 2, ('pop', 'mov', 'add'): 2, ('mov', 'pop', 'mov'): 2, ('mov', 'cmp', 'sub'): 2, ('mov', 'mov', 'sub'): 2, ('mov', 'mov', 'call'): 2})
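To see how counts like these arise, the sliding-window idiom used in n_gram_opcodes can be tried on an in-memory list instead of a file. This is only a minimal sketch: the helper name n_grams and the sample opcode sequence are made up for illustration and are not part of the original code.

```python
from collections import Counter

def n_grams(tokens, n):
    # Zip n staggered views of the list: tokens[0:], tokens[1:], ...
    # zip stops at the shortest view, so only complete n-grams are produced.
    return Counter(zip(*[tokens[i:] for i in range(n)]))

# Hypothetical opcode sequence, used only for illustration.
sample = ["mov", "mov", "mov", "push", "mov", "mov", "mov"]
trigrams = n_grams(sample, 3)

print(trigrams[("mov", "mov", "mov")])  # 2
print(sum(trigrams.values()))           # 5 trigrams in total
```

A list of 7 tokens yields 7 - 3 + 1 = 5 overlapping trigrams, which is why the counts of neighbouring trigrams are not independent of each other.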

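Using the dictionary above, I want to take the values (the occurrence frequencies) and use them in the log-likelihood formula below. My question is how to modify the code so that it can take the values from any dictionary (such as the one above) and use them with the code below. The final result should return numbers and be plotted as a graph using matplotlib.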

import math
# The placeholder value for 0 counts
epsilon = 0.0001
def opcode_llr(opcode, freq_table_before, freq_table_after):

    '''
    Args:
        opcode: A single opcode mnemonic, e.g., 'mov'

        freq_table_before: The frequency table for opcode trigrams *before*
                           extraction.

        freq_table_after: The frequency table for opcode trigrams *after*
                          extraction.

    The keys for both tables are tuples of string. So, each is of the form

        {
            ('mov', 'mov', 'mov'): 5.0,
            ('mov', 'jmp', 'mov'): 7.0,
            ...
        }

    '''
    t_b = len(freq_table_before) or epsilon
    t_a = len(freq_table_after) or epsilon

    # Compute the opcode counts when occurring in positions 0, 1, 2
    opcode_counts = [epsilon, epsilon, epsilon]
    for triplet in freq_table_after.keys():
        for i, comp in enumerate(triplet):
            if comp == opcode:
                opcode_counts[i] += 1

    f1 = opcode_counts[0]
    f2 = opcode_counts[1]
    f3 = opcode_counts[2]

    return (f1 + f2 + f3) * math.log(float(t_b) / t_a)

Best answer

Here is a general way to compute LLRs from a Counter.
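Written as a formula, the quantity computed by the function that follows is the log-odds of each key. The symbols c_x (the count of key x) and n (the sum of all counts) are notation introduced here for illustration; they correspond to y and n in the code.

```latex
\mathrm{LLR}(x) \;=\; \log c_x - \log (n - c_x)
              \;=\; \log \frac{c_x / n}{1 - c_x / n}
```

The value is positive when x accounts for more than half of the samples and negative otherwise. It is undefined when c_x = n, that is, when the counter holds a single key, because the second logarithm would be taken at zero.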

from collections import Counter
import random
import math

def CntToLLR(cnt):
    n = sum(cnt.values())        # total number of samples
    LLR = {}                     # dict to store LLRs (same keys as counter)
    for x, y in cnt.items():     # x is the key, and y the count
        LLR[x] = math.log(y) - math.log(n - y)
    return LLR

# populate a counter with random values
cnt = Counter([random.randrange(10) for x in range(100)])

llrs = CntToLLR(cnt)

# You can convert the dictionary to a list of (key, value) pairs
# (in Python 3, items() replaces Python 2's iteritems())
llrs = list(llrs.items())
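The question also asks for opcode trigram counts as input and a matplotlib plot as output. The following is a rough sketch of how that might look, not the answer's own code: counter_to_llr and plot_llrs are hypothetical names, the output file name llr.png is an assumption, and the plotting helper assumes the keys are tuples of strings as in the question.

```python
import math
from collections import Counter

def counter_to_llr(cnt):
    """Log-odds log(c / (n - c)) per key (hypothetical variant of CntToLLR)."""
    n = sum(cnt.values())
    # Skip empty keys and keys holding every sample: log(0) is undefined.
    return {k: math.log(c) - math.log(n - c)
            for k, c in cnt.items() if 0 < c < n}

def plot_llrs(llrs, path="llr.png"):
    # matplotlib is imported here so the numeric part runs without it.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend that writes to a file
    import matplotlib.pyplot as plt

    # Sort from highest to lowest LLR so the bars are easy to compare.
    items = sorted(llrs.items(), key=lambda kv: kv[1], reverse=True)
    labels = [" ".join(k) for k, _ in items]  # assumes tuple-of-string keys
    values = [v for _, v in items]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(range(len(values)), values)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_ylabel("log-likelihood ratio")
    fig.tight_layout()
    fig.savefig(path)

# A few trigram counts copied from the question's example output.
trigram_counts = Counter({
    ("mov", "mov", "mov"): 18,
    ("xor", "mov", "mov"): 6,
    ("mov", "mov", "pop"): 3,
})
llrs = counter_to_llr(trigram_counts)
```

With these three counts n is 27, so the value for ('mov', 'mov', 'mov') is log(18/9) = log 2. Calling plot_llrs(llrs) would then write a bar chart to the given path, provided matplotlib is installed.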

For more on python - computing a log-likelihood ratio using values from a dictionary, see a similar question we found on Stack Overflow: https://stackoverflow.com/questions/32845730/
