
TensorFlow vocabulary processor

Reposted. Author: 行者123. Updated: 2023-12-03 14:40:38

I'm following the WildML blog post on text classification with TensorFlow, and I don't understand what max_document_length is for in this statement:

```python
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
```

Also, how can I extract the vocabulary from vocab_processor?
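
To illustrate what `max_document_length` controls: as far as I understand `VocabularyProcessor`, every document is turned into a row of word ids of exactly `max_document_length` entries. Shorter documents are padded with id 0, longer ones are truncated, and id 0 also stands for words that are not in the vocabulary. The sketch below re-implements that idea in plain Python so it runs without TensorFlow. `build_vocabulary` and `to_fixed_length_ids` are hypothetical helpers, not library functions, and splitting on whitespace is a simplification (the library's own tokenizer may split text differently).

```python
from typing import Dict, List


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Assign ids in order of first appearance; id 0 is reserved for padding/unknown."""
    vocab = {"<UNK>": 0}
    for doc in documents:
        for token in doc.split():  # simplification: plain whitespace tokenizer
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_fixed_length_ids(doc: str, vocab: Dict[str, int], max_document_length: int) -> List[int]:
    """Map a document to exactly max_document_length ids (pad with 0, truncate extra words)."""
    ids = [0] * max_document_length
    for idx, token in enumerate(doc.split()):
        if idx >= max_document_length:
            break  # words beyond max_document_length are dropped
        ids[idx] = vocab.get(token, 0)  # unknown words map to id 0
    return ids


if __name__ == "__main__":
    docs = ["This is a cat", "This must be boy", "This is a a dog"]
    vocab = build_vocabulary(docs)
    # A limit of 3 truncates every document to its first three words.
    for doc in docs:
        print(to_fixed_length_ids(doc, vocab, max_document_length=3))
    # A limit of 6 pads with zeros; the unseen word "bird" also becomes 0.
    print(to_fixed_length_ids("This is a bird", vocab, max_document_length=6))
```

Fixing the row length this way is what lets all documents be stacked into one rectangular array that a model with a fixed input size can consume.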

Best answer

I've figured out how to extract the vocabulary from the VocabularyProcessor object. This works well for me.

```python
# Requires TensorFlow 1.x: tf.contrib (including learn.preprocessing) was removed in TensorFlow 2.x.
import operator  # used by the commented-out itemgetter alternative below

import numpy as np
from tensorflow.contrib import learn

x_text = ['This is a cat', 'This must be boy', 'This is a a dog']
# Length of the longest document, counted in space-separated words.
max_document_length = max([len(x.split(" ")) for x in x_text])

## Create the VocabularyProcessor object, setting the maximum document length.
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

## Fit the vocabulary and transform each document into a fixed-length array of word ids.
x = np.array(list(vocab_processor.fit_transform(x_text)))

## Extract the word -> id mapping from the object (_mapping is a private attribute).
vocab_dict = vocab_processor.vocabulary_._mapping

## Sort the vocabulary dictionary by value (id).
## Both statements perform the same task.
# sorted_vocab = sorted(vocab_dict.items(), key=operator.itemgetter(1))
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

## Treat the ids as indices into a list and build the list of words in ascending id order:
## the word with id i goes at index i of the list.
vocabulary = list(list(zip(*sorted_vocab))[0])

print(vocabulary)
print(x)
```
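
Assuming the ids in the word -> id mapping run from 0 to len(mapping) - 1 with no gaps (they appear to be assigned sequentially as words are added), the sort-and-zip step can also be written as a direct index assignment, and the resulting list then turns a row of ids back into words. This is a plain-Python sketch on a hand-written, hypothetical mapping shaped like `vocabulary_._mapping`; it does not call TensorFlow, and `ids_to_vocabulary_list` and `decode_row` are hypothetical names.

```python
from typing import Dict, List


def ids_to_vocabulary_list(word_to_id: Dict[str, int]) -> List[str]:
    """Build a list where index i holds the word with id i (assumes ids 0..n-1, no gaps)."""
    vocabulary = [""] * len(word_to_id)
    for word, idx in word_to_id.items():
        vocabulary[idx] = word
    return vocabulary


def decode_row(row: List[int], vocabulary: List[str], pad_id: int = 0) -> str:
    """Turn one fixed-length id row back into text, skipping pad_id.

    Because id 0 doubles as the unknown-word id, unknown words are skipped too.
    """
    return " ".join(vocabulary[i] for i in row if i != pad_id)


if __name__ == "__main__":
    # Hypothetical example mapping, shaped like a word -> id dict.
    word_to_id = {"<UNK>": 0, "This": 1, "is": 2, "a": 3, "cat": 4, "dog": 5}
    vocabulary = ids_to_vocabulary_list(word_to_id)
    print(vocabulary)
    print(decode_row([1, 2, 3, 5, 0], vocabulary))

    # Same result as the sort-and-zip idiom from the answer:
    sorted_vocab = sorted(word_to_id.items(), key=lambda kv: kv[1])
    assert vocabulary == list(list(zip(*sorted_vocab))[0])
```

The direct assignment avoids the O(n log n) sort, but it silently leaves empty strings if the ids ever have gaps, so the sort-based version is the safer choice when that assumption is in doubt.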

For a similar question about the TensorFlow vocabulary processor on Stack Overflow, see: https://stackoverflow.com/questions/40661684/
