python - Pandas NLTK tokenizing "unhashable type: 'list'"


Reposted by: 太空宇宙 · Updated: 2023-11-03 16:19:15


Following this example: Twitter data mining with Python and Gephi: Case synthetic biology

CSV to: df['Country', 'Responses']

'Country'
Italy
Italy
France
Germany

'Responses' 
"Loren ipsum..."
"Loren ipsum..."
"Loren ipsum..."
"Loren ipsum..."

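1. Tokenize the text in 'Responses'
2. Remove the 100 most common words (based on the Brown corpus)
3. Identify the 100 most frequent words remaining

I can get through steps 1 and 2, but I get an error on step 3: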

TypeError: unhashable type: 'list'

I believe this happens because I'm working in a DataFrame and made this (probably erroneous) modification:

Original example:

#divide to words
tokenizer = RegexpTokenizer(r'\w+')
words = tokenizer.tokenize(tweets)

My code:

#divide to words
tokenizer = RegexpTokenizer(r'\w+')
df['tokenized_sents'] = df['Responses'].apply(nltk.word_tokenize)

My full code:

df = pd.read_csv('CountryResponses.csv', encoding='utf-8', skiprows=0, error_bad_lines=False)

tokenizer = RegexpTokenizer(r'\w+')
df['tokenized_sents'] = df['Responses'].apply(nltk.word_tokenize)

words =  df['tokenized_sents']

#remove 100 most common words based on Brown corpus
fdist = FreqDist(brown.words())
mostcommon = fdist.most_common(100)
mclist = []
for i in range(len(mostcommon)):
    mclist.append(mostcommon[i][0])
words = [w for w in words if w not in mclist]

Out: ['the',
 ',',
 '.',
 'of',
 'and',
...]

#keep only most common words
fdist = FreqDist(words)
mostcommon = fdist.most_common(100)
mclist = []
for i in range(len(mostcommon)):
    mclist.append(mostcommon[i][0])
words = [w for w in words if w not in mclist]

TypeError: unhashable type: 'list'

There are many questions about unhashable lists, but none that I think are quite the same. Any suggestions? Thanks.

Traceback

TypeError                                 Traceback (most recent call last)
<ipython-input-164-a0d17b850b10> in <module>()
  1 #keep only most common words
----> 2 fdist = FreqDist(words)
  3 mostcommon = fdist.most_common(100)
  4 mclist = []
  5 for i in range(len(mostcommon)):

/home/*******/anaconda3/envs/*******/lib/python3.5/site-packages/nltk/probability.py in __init__(self, samples)
    104         :type samples: Sequence
    105         """
--> 106         Counter.__init__(self, samples)
    107 
    108     def N(self):

/home/******/anaconda3/envs/******/lib/python3.5/collections/__init__.py in __init__(*args, **kwds)
    521             raise TypeError('expected at most 1 arguments, got %d' % len(args))
    522         super(Counter, self).__init__()
--> 523         self.update(*args, **kwds)
    524 
    525     def __missing__(self, key):

/home/******/anaconda3/envs/******/lib/python3.5/collections/__init__.py in update(*args, **kwds)
    608                     super(Counter, self).update(iterable) # fast path when counter is empty
    609             else:
--> 610                 _count_elements(self, iterable)
    611         if kwds:
    612             self.update(kwds)

TypeError: unhashable type: 'list'

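Best answer

FreqDist takes an iterable of hashable objects (intended to be strings, though any hashable object should work). The error occurs because you are passing in an iterable of lists. As you suggested, this is because of the change you made: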

df['tokenized_sents'] = df['Responses'].apply(nltk.word_tokenize)

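If I understand the Pandas apply function documentation correctly, that line applies nltk.word_tokenize to each element of the series. word_tokenize returns a list of words, so every row of df['tokenized_sents'] holds a list.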
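To see the failure in isolation, here is a minimal sketch. It uses collections.Counter in place of FreqDist (the traceback shows FreqDist delegating to Counter.__init__), and tokenized_rows is a made-up stand-in for df['tokenized_sents']:

```python
from collections import Counter

# Hypothetical stand-in for df['tokenized_sents']: one token list per row.
tokenized_rows = [["Lorem", "ipsum"], ["dolor", "sit", "amet"]]

try:
    # Counter stores each element as a dict key, and a list cannot be a key.
    Counter(tokenized_rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The message mentions: unhashable type: 'list'
print(error_message)
```

Counting works once the elements are the words themselves rather than the per-row lists.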

As a solution, simply concatenate the lists into one before applying FreqDist, like so:

allWords = []
for wordList in words:
    allWords += wordList
FreqDist(allWords)
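An equivalent way to flatten, if you prefer to avoid the explicit loop, is itertools.chain.from_iterable. This is only a sketch: token_lists is a hypothetical stand-in for the tokenized column, and Counter stands in for FreqDist so the snippet runs without NLTK:

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for the series of token lists (one list per row).
token_lists = [["a", "b"], ["b", "c"], []]

# chain.from_iterable walks each inner list in turn, yielding single words.
all_words = list(chain.from_iterable(token_lists))

# The flat list contains only strings, so counting them now works.
counts = Counter(all_words)
print(all_words, counts["b"])
```

Both forms produce the same flat list; which one to use is mostly a matter of taste.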

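Here is a more complete revision that does what you want. If all you need is to identify the second set of 100 words, note that mclist will hold that set after the second pass.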

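import nltk
import pandas as pd
from nltk import FreqDist
from nltk.corpus import brown
from nltk.tokenize import RegexpTokenizer

# note: on pandas >= 1.3, use on_bad_lines='skip' instead of error_bad_lines=False
df = pd.read_csv('CountryResponses.csv', encoding='utf-8', skiprows=0, error_bad_lines=False)

# tokenizer is kept from the question; the apply below uses nltk.word_tokenize
tokenizer = RegexpTokenizer(r'\w+')
df['tokenized_sents'] = df['Responses'].apply(nltk.word_tokenize)

# flatten the per-row token lists into a single list of words
lists =  df['tokenized_sents']
words = []
for wordList in lists:
    words += wordList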

#remove 100 most common words based on Brown corpus
fdist = FreqDist(brown.words())
mostcommon = fdist.most_common(100)
mclist = []
for i in range(len(mostcommon)):
    mclist.append(mostcommon[i][0])
words = [w for w in words if w not in mclist]

Out: ['the',
 ',',
 '.',
 'of',
 'and',
...]

#keep only most common words
fdist = FreqDist(words)
mostcommon = fdist.most_common(100)
mclist = []
for i in range(len(mostcommon)):
    mclist.append(mostcommon[i][0])
# mclist contains second-most common set of 100 words
words = [w for w in words if w in mclist]
# this will keep ALL occurrences of the words in mclist
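The same three steps can also be wrapped in one helper. The sketch below is not the code from the question: top_words_after_filter, rows, and stop are hypothetical names, Counter stands in for FreqDist, and a short hand-written stop list replaces the 100 most common Brown corpus words so that it runs without NLTK data:

```python
from collections import Counter
from itertools import chain


def top_words_after_filter(token_lists, stop_words, n=100):
    """Flatten token lists, drop stop_words, return the n most common words left."""
    stop_set = set(stop_words)  # set lookups avoid rescanning a list per word
    remaining = [w for w in chain.from_iterable(token_lists) if w not in stop_set]
    return [word for word, _ in Counter(remaining).most_common(n)]


# Made-up data: one token list per response.
rows = [
    ["the", "cell", "is", "a", "cell"],
    ["the", "gene", "and", "the", "cell"],
    ["gene"],
]
stop = ["the", "is", "a", "and"]  # stand-in for the Brown-based mclist

top = top_words_after_filter(rows, stop, n=2)
print(top)  # ['cell', 'gene']
```

With the real data you would pass df['tokenized_sents'] as token_lists and the Brown-based mclist as stop_words.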

Regarding python - Pandas NLTK tokenizing "unhashable type: 'list'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38666973/
