TensorFlow VocabularyProcessor

Reposted by: 行者123. Updated: 2023-12-03 14:40:38

I am following the WildML blog post on text classification with TensorFlow. I don't understand the purpose of max_document_length in this statement:

vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

Also, how can I extract the vocabulary from vocab_processor?

Best answer

I figured out how to extract the vocabulary from the VocabularyProcessor object. The following works well for me.

## Note: tensorflow.contrib exists only in TensorFlow 1.x; it was removed in 2.0.
import numpy as np
from tensorflow.contrib import learn

x_text = ['This is a cat','This must be boy', 'This is a a dog']
max_document_length = max([len(x.split(" ")) for x in x_text])

## Create the VocabularyProcessor object, setting the max length of the documents.
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

## Transform the documents using the vocabulary.
x = np.array(list(vocab_processor.fit_transform(x_text)))    

## Extract word:id mapping from the object.
vocab_dict = vocab_processor.vocabulary_._mapping

## Sort the vocabulary dictionary by value (the id).
## The commented-out line is equivalent but requires `import operator`.
#sorted_vocab = sorted(vocab_dict.items(), key=operator.itemgetter(1))
sorted_vocab = sorted(vocab_dict.items(), key=lambda item: item[1])

## Treat each id as a list index and build the list of words in ascending id order,
## so the word with id i sits at index i of the list.
vocabulary = [word for word, _ in sorted_vocab]

print(vocabulary)
print(x)
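
On the max_document_length part of the question: as far as I understand VocabularyProcessor, every document is mapped to a row of exactly max_document_length ids. Longer documents are truncated and shorter ones are padded with 0, the id reserved for unknown words. The sketch below imitates that behavior in plain Python so it runs without TensorFlow. The function name fit_transform_sketch is hypothetical, and whitespace tokenization is a simplifying assumption; this is not the library's implementation.

```python
def fit_transform_sketch(docs, max_document_length):
    """Hypothetical stand-in for VocabularyProcessor.fit_transform (not the real API)."""
    vocab = {"<UNK>": 0}  # assumption: id 0 is reserved and doubles as the padding value
    rows = []
    for doc in docs:
        tokens = doc.split()  # simplification: split on whitespace only
        # Assign the next free id to each word the first time it is seen.
        ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
        ids = ids[:max_document_length]                 # truncate long documents
        ids += [0] * (max_document_length - len(ids))   # pad short documents with 0
        rows.append(ids)
    return rows, vocab


x_text = ['This is a cat', 'This must be boy', 'This is a a dog']

rows, vocab = fit_transform_sketch(x_text, 5)
print(rows)        # [[1, 2, 3, 4, 0], [1, 5, 6, 7, 0], [1, 2, 3, 3, 8]]

short_rows, _ = fit_transform_sketch(x_text, 3)
print(short_rows)  # [[1, 2, 3], [1, 5, 6], [1, 2, 3]]
```

With a length of 5 the two four-word sentences gain a trailing 0; with a length of 3 every sentence loses its last words. This is why the answer sets max_document_length to the length of the longest document: no input is truncated.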

For more on the TensorFlow VocabularyProcessor, see the similar question on Stack Overflow: https://stackoverflow.com/questions/40661684/
