
python - How do I access a row of a matrix created with nested defaultdicts in Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:53:02

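I am building a word co-occurrence matrix in Python, using nested defaultdicts to hold it. I have successfully created the matrix and stored the word counts, but I am now having trouble retrieving a vector (a row of the matrix) from the nested defaultdict.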

This is the line I use to initialize the matrix:

matrix = collections.defaultdict(lambda: collections.defaultdict(int))

These are the lines I use to put the word counts into the matrix:

matrix[target_word_id][collocated_word_id] += 1

matrix[collocated_word_id][target_word_id] += 1

This is how I try to access the row of the matrix that corresponds to a given word id:

vector1 = matrix[word1_id]

When I print vector1 to test my work, this is the output I get:

defaultdict(<class 'int'>, {})

The full code for the class is here. I call these functions from a separate main class:

import collections
import pprint


class Create_vector():

    def build_vocab(self, corpus):
        vocab = collections.defaultdict(int)
        i = 1

        for line in corpus:
            token = line.strip()
            if token not in vocab:
                vocab[token] = i
                i += 1

        return vocab

    def build_cooccurrence(self, corpus, vocab, window):

        matrix = collections.defaultdict(lambda: collections.defaultdict(int))

        for x, line in enumerate(corpus):

            if x % 100000 == 0:
                print('Building cooccurrence matrix: on line %i', x)
            tokens = line.strip()
            token_ids = [vocab[token] for token in tokens]

            for i, target_word_id in enumerate(token_ids):

                collocated_word_ids = token_ids[min(0, target_word_id - window): target_word_id]

                for j, collocated_word_id in enumerate(collocated_word_ids):

                    matrix[target_word_id][collocated_word_id] += 1

                    matrix[collocated_word_id][target_word_id] += 1

        return matrix

    def get_vector(self, matrix, vocab, weight, word1, word2):

        if weight == 'FREQ':

            if word1 in vocab:
                word1_id = vocab[word1]
                vector1 = matrix[word1_id]
                pprint.pprint(vector1)

The main class is here:

import nltk
import sys
from nltk.corpus import stopwords
import create_vector
import pprint
import string


def main():
    brown_words = list(nltk.corpus.brown.words())
    window = int(sys.argv[1])
    weight = sys.argv[2]
    brown_words_lower = [word.lower() for word in brown_words]
    brown_words_only = [w for w in brown_words_lower if w not in string.punctuation]
    stops = set(stopwords.words('english'))
    brown_words_filtered = [w for w in brown_words_only if w not in stops]

    vector = create_vector.Create_vector()

    vocab = vector.build_vocab(brown_words_filtered)
    cooccurrence = vector.build_cooccurrence(brown_words_filtered, vocab, window)

    for line in text:
        words = line.split(',')
        word1 = words[0]
        word2 = words[1]
        vector1, vector2 = vector.get_vector(cooccurrence, vocab, weight, word1, word2)

The command to run it is: python3.4 main.py 2 FREQ

Best answer

Cannot reproduce:

>>> import collections
>>> matrix = collections.defaultdict(lambda: collections.defaultdict(int))
>>> matrix[2][3] += 1
>>> matrix[3][2] += 1
>>> vector1 = matrix[2]
>>> vector1
defaultdict(<type 'int'>, {3: 1})
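For comparison, here is a small sketch of what the same structure does when it is indexed with a key that was never inserted. The keys 99 and 7 are arbitrary values chosen for illustration. Indexing does not raise KeyError: the default factory silently creates and stores an empty row, which prints much like the empty defaultdict shown in the question.

```python
import collections

matrix = collections.defaultdict(lambda: collections.defaultdict(int))
matrix[2][3] += 1
matrix[3][2] += 1

# Indexing a key that was never inserted does not raise KeyError:
# the default factory builds an empty row and stores it under that key.
missing_row = matrix[99]
is_empty = len(missing_row) == 0
was_inserted = 99 in matrix

# A membership test or .get() inspects the matrix without creating a row.
absent_before = 7 in matrix
row = matrix.get(7)
absent_after = 7 in matrix

print(is_empty, was_inserted, absent_before, row, absent_after)
```

Testing `word1_id in matrix` before indexing is one way to tell a genuinely empty row apart from an id that was never stored.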

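Are you sure the value in word1_id equals one that was actually inserted into the matrix? Could you post the complete code rather than just snippets of it?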
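One way the ids could end up mismatched, judging only from the code in the question: main() passes a flat list of words as the corpus, so inside build_cooccurrence each `line` is a single word, and iterating over `line.strip()` walks its characters rather than words. The slice there is also bounded by the word id instead of the position `i`, and uses `min` where a lower clamp would need `max`. The sketch below assumes a flat token list and a window of preceding positions. The standalone functions and the toy corpus are hypothetical, and this may not be what the asker intended.

```python
import collections


def build_vocab(tokens):
    """Map each distinct token to an integer id, starting at 1."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 1
    return vocab


def build_cooccurrence(tokens, vocab, window):
    """Count symmetric co-occurrences within `window` preceding positions."""
    matrix = collections.defaultdict(lambda: collections.defaultdict(int))
    token_ids = [vocab[token] for token in tokens]
    for i, target_id in enumerate(token_ids):
        # Slice by position i, not by word id, and clamp the start at 0.
        for context_id in token_ids[max(0, i - window):i]:
            matrix[target_id][context_id] += 1
            matrix[context_id][target_id] += 1
    return matrix


tokens = ["cat", "sat", "mat", "cat"]  # hypothetical toy corpus
vocab = build_vocab(tokens)
matrix = build_cooccurrence(tokens, vocab, window=2)
row = dict(matrix[vocab["cat"]])
print(row)
```

Using a plain dict for the vocabulary means an unknown token raises KeyError instead of quietly receiving id 0, which makes this kind of mismatch easier to spot.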

Regarding "python - How do I access a row of a matrix created with nested defaultdicts in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28849697/
