What distance measure can I use that factors in order? [closed]

Reposted. Author: bug小助手. Updated: 2023-10-24 21:08:57




I have a few lists that contain the same IDs, which are strings. They are as follows:


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

What measure can I use to determine the lists that are closest to each other here in terms of order? Ideally list1 and list2 should be the closest here.


Does the Spearman correlation make sense here?


Comments

What, exactly, do you mean by "closest to each other here in terms of order"? What makes one pair of lists closer than another pair? Should ["1", "5", "4", "3", "2"] and ["2", "1", "3", "4", "5"] be considered closer than ["1", "2", "3", "4", "5"] and ["1", "2", "4", "3", "5"]?

You've tagged this levenshtein-distance, but Levenshtein distance has nothing to do with order. Spearman's correlation coefficient isn't applicable either. You don't have two rank variables to correlate, and even if you did, Spearman's correlation coefficient is a correlation coefficient, not a distance metric.

And is list4 supposed to have 2 "2"s and no "1"? What space are these samples supposed to be drawn from?

Thanks @user2357112 for the observations. As mentioned, list1 = ["1", "2", "3", "4", "5"] and list2 = ["1", "2", "3", "5", "4"] have elements 1, 2 and 3 in the same positions, so three out of five elements are in the same position. That's the order I'm referring to, so ideally they should be deemed very close compared to list3 and list4. Sorry, I wrongly tagged Levenshtein distance. In your opinion, how can I measure how similar the lists are based on element positions, as explained earlier?

So rather than lists being close to each other in the lexicographic order of all lists in whatever the sample space is, you're looking for something that captures some notion of one list's internal order being similar to another list's internal order.
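
One common way to make "similar internal order" concrete is the Kendall tau distance, which counts the pairs of IDs whose relative order differs between two lists. Nobody in this thread proposed it, so treat the following as a minimal sketch under the assumption that both lists contain exactly the same unique IDs; the function name `kendall_tau_distance` is made up for illustration.

```python
from itertools import combinations
from typing import List


def kendall_tau_distance(l1: List[str], l2: List[str]) -> int:
    """Count pairs of IDs that appear in opposite relative order in l1 and l2.

    Assumption: both lists hold the same set of IDs, with no duplicates.
    """
    if len(l1) != len(l2) or len(set(l1)) != len(l1) or set(l1) != set(l2):
        raise ValueError("lists must contain the same unique IDs")

    # Position of every ID in l2, then l1 rewritten as a sequence of l2 positions.
    position_in_l2 = {item: idx for idx, item in enumerate(l2)}
    ranks = [position_in_l2[item] for item in l1]

    # A pair (a, b) taken in l1 order is discordant when l2 orders it the other way.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]

print(kendall_tau_distance(list1, list2))  # 1
print(kendall_tau_distance(list1, list3))  # 6
```

This quadratic version is fine for short lists. Unlike a position-by-position comparison, it only looks at relative order, so it answers a slightly different question from the one the asker later settled on.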

Recommended answers

Edit distance seems to be a good candidate for such a metric.


from typing import List


def calcEditDistance(lhs: List[str], rhs: List[str]) -> int:
    '''
    Dynamic programming
    dp[i][j] = minimum number of operations to convert lhs[0:i] to rhs[0:j]
    '''
    m = len(lhs)
    n = len(rhs)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Converting a prefix of lhs to the empty list takes i deletions.
    for i in range(1, m + 1):
        dp[i][0] = i

    # Converting the empty list to a prefix of rhs takes j insertions.
    for j in range(1, n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if lhs[i - 1] == rhs[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Cheapest of replace, delete, insert, plus one operation.
                dp[i][j] = min(dp[i - 1][j - 1],
                               dp[i - 1][j],
                               dp[i][j - 1]) + 1

    return dp[m][n]


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "2"]

res = calcEditDistance(list1, list2)
print(f"dis[1, 2] = {res}")

res = calcEditDistance(list1, list3)
print(f"dis[1, 3] = {res}")

res = calcEditDistance(list1, list4)
print(f"dis[1, 4] = {res}")

res = calcEditDistance(list2, list3)
print(f"dis[2, 3] = {res}")

res = calcEditDistance(list2, list4)
print(f"dis[2, 4] = {res}")

res = calcEditDistance(list3, list4)
print(f"dis[3, 4] = {res}")

prints


dis[1, 2] = 2
dis[1, 3] = 4
dis[1, 4] = 4
dis[2, 3] = 4
dis[2, 4] = 4
dis[3, 4] = 3

which matches your intuition: list1 and list2 are the closest pair, at distance 2.


Note that in the Python code I use the Levenshtein distance where insert, delete, and replace operations are allowed. You can, of course, use other types of edit distance.
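
As one illustration of "other types of edit distance", the sketch below adds a fourth operation, swapping two adjacent elements, to the same dynamic programme. This variant is usually called the optimal string alignment (restricted Damerau-Levenshtein) distance. It is not part of the answer above, and the name `osa_distance` is an assumption made for this example.

```python
from typing import List


def osa_distance(lhs: List[str], rhs: List[str]) -> int:
    """Edit distance allowing insert, delete, replace and adjacent swap."""
    m, n = len(lhs), len(rhs)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i  # delete every element of lhs[0:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert every element of rhs[0:j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if lhs[i - 1] == rhs[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete
                dp[i][j - 1] + 1,         # insert
                dp[i - 1][j - 1] + cost,  # match or replace
            )
            # Adjacent swap: the last two elements appear in exchanged order.
            if (i > 1 and j > 1
                    and lhs[i - 1] == rhs[j - 2]
                    and lhs[i - 2] == rhs[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)

    return dp[m][n]


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]

print(osa_distance(list1, list2))  # 1
```

With swaps allowed, list1 and list2 are a single operation apart instead of two, which may or may not be the behaviour you want.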



The comments have clarified that you're looking for something where two lists are closer together the more elements they have in the same positions. In that case, just count how many elements they have in different positions:


def distance(l1, l2):
    return sum(1 for i, j in zip(l1, l2) if i != j)
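
This count is the Hamming distance. Note that `zip` stops at the shorter input, so lists of different lengths would be compared only over their common prefix. The sketch below adds a length check and uses the count to pick the closest pair among the four lists from the question; the names `hamming_distance`, `lists` and `closest` are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List


def hamming_distance(l1: List[str], l2: List[str]) -> int:
    """Count the positions at which two equal-length lists differ."""
    if len(l1) != len(l2):
        raise ValueError("lists must have the same length")
    return sum(a != b for a, b in zip(l1, l2))


lists: Dict[str, List[str]] = {
    "list1": ["1", "2", "3", "4", "5"],
    "list2": ["1", "2", "3", "5", "4"],
    "list3": ["1", "5", "4", "3", "2"],
    "list4": ["4", "2", "5", "3", "1"],
}

# Compare every unordered pair of list names and keep the smallest distance.
closest = min(
    combinations(lists, 2),
    key=lambda pair: hamming_distance(lists[pair[0]], lists[pair[1]]),
)
print(closest)  # ('list1', 'list2')
```

Dividing the count by the list length would give a value between 0 and 1 if you need to compare lists of different sizes across experiments.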
