
python - How do I do a string comparison between two columns in a Pandas DataFrame?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:07:34

I have a DataFrame df that looks like this:

     a       b
0  Jon     Jon
1  Jon    John
2  Jon  Johnny

I want to compare the two strings in each row and create a new column like this:

  df['compare'] = df['a'] == df['b']


     a       b  compare
0  Jon     Jon     True
1  Jon    John    False
2  Jon  Johnny    False

I would also like to be able to pass columns a and b through this Levenshtein function:

def levenshtein_distance(a, b):
    """Return the Levenshtein edit distance between two strings *a* and *b*."""
    if a == b:
        return 0
    if len(a) < len(b):
        a, b = b, a
    if not a:
        return len(b)
    previous_row = range(len(b) + 1)
    for i, column1 in enumerate(a):
        current_row = [i + 1]
        for j, column2 in enumerate(b):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (column1 != column2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

and add a column like this:

  df['compare'] = levenshtein_distance(df['a'], df['b'])

     a       b  compare
0  Jon     Jon      100
1  Jon    John      .95
2  Jon  Johnny      .87

But when I try it, I get this error:

  ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I format my data or DataFrame so that I can compare the two columns and add the result as a third column?

Best answer

Simply call the function once per row:

df['compare'] = [levenshtein_distance(a, b) for a, b in zip(df['a'], df['b'])]
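To make the row-wise approach concrete, here is a minimal, self-contained sketch built on the question's sample data. It first reproduces the ValueError (the function's `if a == b` test receives a whole Series, which has no single truth value) and then applies the function pair by pair. The `distance` and `similarity` column names and the normalization formula are assumptions added for illustration, not part of the original answer. The question's sample scores (100, .95, .87) look like a similarity measure rather than an edit distance, and the source does not say which one, so the figures below will not match them.

```python
import pandas as pd


def levenshtein_distance(a, b):
    """Return the Levenshtein edit distance between two strings."""
    if a == b:
        return 0
    if len(a) < len(b):
        a, b = b, a
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a):
        current_row = [i + 1]
        for j, char_b in enumerate(b):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (char_a != char_b)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


df = pd.DataFrame({"a": ["Jon", "Jon", "Jon"], "b": ["Jon", "John", "Johnny"]})

# Passing whole columns fails: `a == b` yields a boolean Series, and
# `if` cannot reduce a Series to a single True/False.
try:
    levenshtein_distance(df["a"], df["b"])
except ValueError as exc:
    print("ValueError:", exc)

# Pair the columns up and call the function once per row.
df["distance"] = [levenshtein_distance(a, b) for a, b in zip(df["a"], df["b"])]

# Hypothetical normalization (an assumption, not from the original answer):
# 1.0 means identical, 0.0 means nothing in common.
df["similarity"] = [
    1 - d / max(len(a), len(b), 1)
    for a, b, d in zip(df["a"], df["b"], df["distance"])
]

print(df)
```

With the sample data this gives distances of 0, 1 and 3. The list comprehension is still a Python-level loop, so it is a reasonable choice for small and medium frames, not a vectorized operation.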

Or, if you want a plain equality comparison:

df['compare'] = (df['a'] == df['b'])
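The equality comparison works directly on the columns because `==` between two Series is evaluated element by element and returns one boolean per row. A short sketch on the question's sample data follows. The `compare_loose` column, which ignores case and surrounding whitespace, is a hypothetical variant added for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": ["Jon", "Jon", "Jon"], "b": ["Jon", "John", "Johnny"]})

# Element-wise comparison: one boolean per row.
df["compare"] = df["a"] == df["b"]

# Hypothetical variant (an assumption): normalize case and whitespace first.
df["compare_loose"] = (
    df["a"].str.strip().str.lower() == df["b"].str.strip().str.lower()
)

print(df)
```

Only the first row compares equal here, which matches the True/False/False column shown in the question.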

Regarding "python - How do I do a string comparison between two columns in a Pandas DataFrame?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59075815/
