python - Fuzzy matching strings within one column and creating a new DataFrame using fuzzywuzzy

Reposted. Author: 行者123. Last updated: 2023-12-05 06:27:55

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6],
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })

   id      fruits
0   1       apple
1   2      apples
2   3      orange
3   4  apple tree
4   5     oranges
5   6       mango

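I want to find fuzzy matches among the strings in the fruits column and build a new DataFrame like the one shown under "Expected result", keeping only pairs whose ratio_score is above 80.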

How can I do this in Python with the fuzzywuzzy package? Thanks. Note that the ratio_score values in the expected result are made up for illustration.

My attempt:

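from fuzzywuzzy import fuzz
df.loc[:, 'fruits_copy'] = df['fruits']
# This scores each value against its own copy in the same row, so every row gets 100.
df['ratio_score'] = df[['fruits', 'fruits_copy']].apply(lambda row: fuzz.ratio(row['fruits'], row['fruits_copy']), axis=1)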

Expected result:

   id  fruits  matched_id  matched_fruits  ratio_score
0   1   apple           2          apples           95
1   1   apple           4      apple tree           85
2   2  apples           4      apple tree           80
3   3  orange           5         oranges           95
4   6   mango

Related questions:

Fuzzy matching a sorted column with itself using python

Apply fuzzy matching across a dataframe column and save results in a new column

How do I fuzzy match items in a column of an array in python?

Using fuzzywuzzy to create a column of matched results in the data frame

Best answer

My solution is based on this question: Apply fuzzy matching across a dataframe column and save results in a new column

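import pandas as pd
from fuzzywuzzy import fuzz

df.loc[:, 'fruits_copy'] = df['fruits']

# Build every (fruits, fruits_copy) pair as a Series of tuples.
compare = pd.MultiIndex.from_product([df['fruits'],
                                      df['fruits_copy']]).to_series()

def metrics(tup):
    # Score one pair with two fuzzywuzzy scorers.
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare.apply(metrics)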

                        ratio  token
apple      apple          100    100
           apples          91     91
           orange          36     36
           apple tree      67     67
           oranges         33     33
           mango           20     20
apples     apple           91     91
           apples         100    100
           orange          33     33
           apple tree      62     62
           oranges         46     46
           mango           18     18
orange     apple           36     36
           apples          33     33
           orange         100    100
           apple tree      25     25
           oranges         92     92
           mango           55     55
apple tree apple           67     67
           apples          62     62
           orange          25     25
           apple tree     100    100
           oranges         24     24
           mango           13     13
oranges    apple           33     33
           apples          46     46
           orange          92     92
           apple tree      24     24
           oranges        100    100
           mango           50     50
mango      apple           20     20
           apples          18     18
           orange          55     55
           apple tree      13     13
           oranges         50     50
           mango          100    100
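
The table above scores every pair, including each string against itself and both orderings of the same pair, so it still has to be reduced to the layout the question asks for. The following is a minimal sketch of one way to do that, not the answerer's code: it scores each unordered pair once and keeps those above the cutoff. The names `score`, `THRESHOLD`, `rows`, and `matches` are illustrative, and the `difflib` fallback is an assumption added so the sketch still runs without fuzzywuzzy installed; its scores may differ slightly from `fuzz.ratio`.

```python
from itertools import combinations

import pandas as pd

try:
    from fuzzywuzzy import fuzz
    score = fuzz.ratio
except ImportError:
    # Assumption: stand-in scorer built on the standard library,
    # used only when fuzzywuzzy is not available.
    from difflib import SequenceMatcher

    def score(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6],
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })

THRESHOLD = 80  # cutoff taken from the question

rows = []
# combinations() yields each unordered pair once and never pairs a row with itself.
for (id_a, a), (id_b, b) in combinations(zip(df['id'], df['fruits']), 2):
    s = score(a, b)
    if s > THRESHOLD:
        rows.append({'id': id_a, 'fruits': a,
                     'matched_id': id_b, 'matched_fruits': b,
                     'ratio_score': s})

matches = pd.DataFrame(
    rows,
    columns=['id', 'fruits', 'matched_id', 'matched_fruits', 'ratio_score'])
print(matches)
```

With the sample data, only apple/apples and orange/oranges clear the cutoff, consistent with the ratio column above. The scores in the question's expected result were invented, so the output will not match that table row for row, and unmatched values such as mango are simply left out in this sketch.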

For "python - Fuzzy matching strings within one column and creating a new DataFrame using fuzzywuzzy", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54865890/
