python - TypeError: ('expected string or bytes-like object', 'occurred at index 0') when calling process.extract

Reposted by: 行者123 · Updated: 2023-12-01 07:30:26

When I try to use process.extract from the fuzzywuzzy library on columns of a pandas DataFrame, I get the following error message:

TypeError: ('expected string or bytes-like object', 'occurred at index 0')

Background

I have the following sample df:

from fuzzywuzzy import fuzz 
from fuzzywuzzy import process
import pandas as pd
import nltk 

name_list = ['John D Doe', 'Jane L Doe', 'Jack Doe']
text_list = [' Reason for Visit: John D Doe is a Jon has male pattern baldness',
             'Jane is related to John and Jan L Doe is his sister  ',
             'Jack Doe is thier son and jac is five']
df = pd.DataFrame(
    {'Names': name_list,
     'Text': text_list,
     'P_ID': [1,2,3]
    })
#tokenize
df['Token_Names'] = df.apply(lambda row: nltk.word_tokenize(row['Names']), axis=1)
df['Token_Text'] = df.apply(lambda row: nltk.word_tokenize(row['Text']), axis=1)

#df
    Names        Text                         P_ID  Token_Names     Token_Text
0   John D Doe  Reason for Visit: John D Doe    1   [John, D, Doe]  [Reason, for, Visit, :, John, D, Doe, is, a, J...
1   Jane L Doe  Jane is related to John         2   [Jane, L, Doe]  [Jane, is, related, to, John, and
2   Jack Doe    Jack Doe is thier son           3   [Jack, Doe]     [Jack, Doe, is, thier, son, and, jac, is, five]

Problem

I created the following function:

def get_alt_names(token_name, token_text):
    if len(token_name) > 1:
        extract = process.extract(token_name,token_text, limit = 3, scorer = fuzz.ratio)
    return extract

I call it with a lambda and apply:

 #use apply with extract
 df['Alt_Names'] = df.apply(lambda x: get_alt_names(x.Token_Names, x.Token_Text) , axis =1)

But I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-12-6dcc99fa91b0> in <module>()
      1 #use apply with extract
----> 2 df['Alt_Names'] = df.apply(lambda x: get_alt_names(x.Token_Names, x.Token_Text) , axis =1)

C:\Anaconda\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6002                          args=args,
   6003                          kwds=kwds)
-> 6004         return op.get_result()
   6005 
   6006     def applymap(self, func):

C:\Anaconda\lib\site-packages\pandas\core\apply.py in get_result(self)
    140             return self.apply_raw()
    141 
--> 142         return self.apply_standard()
    143 
    144     def apply_empty_result(self):

C:\Anaconda\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    246 
    247         # compute the result using the series generator
--> 248         self.apply_series_generator()
    249 
    250         # wrap results

C:\Anaconda\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    275             try:
    276                 for i, v in enumerate(series_gen):
--> 277                     results[i] = self.f(v)
    278                     keys.append(v.name)
    279             except Exception as e:

<ipython-input-12-6dcc99fa91b0> in <lambda>(x)
      1 #use apply with extract
----> 2 df['Alt_Names'] = df.apply(lambda x: get_alt_names(x.Token_Names, x.Token_Text) , axis =1)

<ipython-input-10-360a3b67e5d2> in get_alt_names(token_name, token_text)
      5     #if len(token_name) inside token_names_unlisted > 1:
      6     if len(token_name) > 1:
----> 7         extract = process.extract(token_name,token_text, limit = 3, scorer = fuzz.ratio)
      8         return extract

C:\Anaconda\lib\site-packages\fuzzywuzzy\process.py in extract(query, choices, processor, scorer, limit)
    166     """
    167     sl = extractWithoutOrder(query, choices, processor, scorer)
--> 168     return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
    169         sorted(sl, key=lambda i: i[1], reverse=True)
    170 

C:\Anaconda\lib\heapq.py in nlargest(n, iterable, key)
    567     # General case, slowest method
    568     it = iter(iterable)
--> 569     result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    570     if not result:
    571         return result

C:\Anaconda\lib\heapq.py in <listcomp>(.0)
    567     # General case, slowest method
    568     it = iter(iterable)
--> 569     result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    570     if not result:
    571         return result

C:\Anaconda\lib\site-packages\fuzzywuzzy\process.py in extractWithoutOrder(query, choices, processor, scorer, score_cutoff)
     76 
     77     # Run the processor on the input query.
---> 78     processed_query = processor(query)
     79 
     80     if len(processed_query) == 0:

C:\Anaconda\lib\site-packages\fuzzywuzzy\utils.py in full_process(s, force_ascii)
     93         s = asciidammit(s)
     94     # Keep only Letters and Numbers (see Unicode docs).
---> 95     string_out = StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)
     96     # Force into lowercase.
     97     string_out = StringProcessor.to_lower_case(string_out)

C:\Anaconda\lib\site-packages\fuzzywuzzy\string_processing.py in replace_non_letters_non_numbers_with_whitespace(cls, a_string)
     24         numbers with a single white space.
     25         """
---> 26         return cls.regex.sub(" ", a_string)
     27 
     28     strip = staticmethod(string.strip)

TypeError: ('expected string or bytes-like object', 'occurred at index 0')

I think this is because my input is a list.
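
The last frame of the traceback is `cls.regex.sub(" ", a_string)`, so that suspicion can be checked without fuzzywuzzy at all. The following is a minimal sketch using only the standard library `re` module. The pattern `\W` and the helper name `try_sub` are assumptions made for illustration; they stand in for fuzzywuzzy's own processor and are not its actual code.

```python
import re

# Stand-in for the pattern applied by fuzzywuzzy's default processor
# (an assumption for illustration, not copied from the library).
non_word = re.compile(r"\W")

# One row of df['Token_Names']: a list of tokens, not a string.
token_name = ["John", "D", "Doe"]


def try_sub(value):
    """Return the cleaned string, or the TypeError message if sub() rejects the input."""
    try:
        return non_word.sub(" ", value)
    except TypeError as exc:
        return "TypeError: " + str(exc)


# Passing the whole list reproduces the error from the traceback.
list_result = try_sub(token_name)

# Passing one token (a plain string) at a time works.
token_results = [try_sub(token) for token in token_name]

print(list_result)
print(token_results)
```

The same reasoning suggests that the query passed to process.extract should be a single string, for example one token per call, rather than the whole token list.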

Desired output

I would like the output to look like the following (perhaps a list of lists?):

 Other_Columns_Here    Alt_Names
0                 [('John', 100), ('Jon', 86), ('Reason', 40)][('D', 100), ('Doe', 50), ('baldness', 22)][('Doe', 100), ('D', 50), ('baldness', 36)]
1                 [('Jane', 100), ('Jan', 86), ('and', 57)] [('L', 100), ('related', 25), ('Jane', 0)][('Doe', 100), ('to', 40), ('and', 33)]
2                 [('Doe', 100), ('to', 40), ('and', 33)] [('Doe', 100), ('son', 33), ('and', 33)]

Question

How do I fix my error?

Best answer

I think you need to change get_alt_names so that it looks more like the following:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import pandas as pd
import nltk

name_list = ['John D Doe', 'Jane L Doe', 'Jack Doe']
text_list = [
    'Reason for Visit: John D Doe is a Jon has male pattern baldness',
    'Jane is related to John and Jan L Doe is his sister  ',
    'Jack Doe is their son and jac is five'
]
df = pd.DataFrame({
        'Names': name_list,
        'Text': text_list,
        'P_ID': [1,2,3]
    })

df['Token_Names'] = df.apply(lambda row: nltk.word_tokenize(row['Names']), axis=1)
df['Token_Text'] = df.apply(lambda row: nltk.word_tokenize(row['Text']), axis=1)

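def get_alt_names(s):
    # s is one row of df; apply(..., axis=1) passes each row as a Series
    token_names = s['Token_Names']
    token_text = s['Token_Text']
    extract = []
    for name in token_names:
        # process.extract needs a single string as the query, so match one token
        # at a time; single-character tokens such as 'D' are skipped
        if len(name) > 1:
            result = process.extract(name, token_text, limit=3, scorer=fuzz.ratio)
            extract.append(result)
    return extract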

df['Alt_Names'] = df.apply(get_alt_names, axis=1)

print(df)

Output

0    [[(John, 100), (Jon, 86), (Reason, 40)], [(Doe...
1    [[(Jane, 100), (Jan, 86), (and, 57)], [(Doe, 1...
2    [[(Jack, 100), (jac, 86), (and, 29)], [(Doe, 1...
Name: Alt_Names, dtype: object

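This code runs, but you may still need to modify it to get the exact result you want. In particular, I'm not sure whether you want 'Alt_Names' to be a list of lists or just a flat list.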
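
If a flat list turns out to be the better fit, one level of nesting can be collapsed after the fact. This is only a sketch: the helper name `flatten` is hypothetical, and the sample values are copied from the question's desired output for illustration rather than produced by running the code above.

```python
from itertools import chain

# One row of 'Alt_Names' in the list-of-lists shape: one inner list per name token.
# Values are illustrative, taken from the question's desired output.
nested = [
    [("John", 100), ("Jon", 86), ("Reason", 40)],
    [("Doe", 100), ("D", 50), ("baldness", 36)],
]


def flatten(list_of_lists):
    """Collapse one level of nesting: [[a, b], [c]] -> [a, b, c]."""
    return list(chain.from_iterable(list_of_lists))


flat = flatten(nested)
print(flat)
```

On the DataFrame itself, something like `df['Alt_Names'].apply(flatten)` should give the flat form for every row, at the cost of no longer knowing which name token each match belongs to.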

A similar question about this TypeError when calling process.extract can be found on Stack Overflow: https://stackoverflow.com/questions/57228272/
