gpt4 book ai didi

javascript - 用于查找两个字符串的最长公共(public)子序列的内存表是否也可以用于查找差异索引?

转载 作者:塔克拉玛干 更新时间:2023-11-03 03:54:21 25 4
gpt4 key购买 nike

我正在编写一个 GitHub 浏览器扩展,它以另一种方式显示一些差异。 GitHub 只是显示预先计算的差异,所以我需要自己重新比较。

UNIX diff util 基于查找 Longest Common Subsequence .我找到了an implementation in javascript-algorithms .但是,这仅显示 LCS 结果,而不显示发生差异的索引。

以维基百科为例,调用上述实现

longestCommonSubsequence('abcdfghjqz', 'abcdefgijkrxyz');

产生一个数组,
(8) ["a", "b", "c", "d", "f", "g", "j", "z"]

但我需要的是能让我弄清楚的东西:
abcd fgh j    z
abcdefg ijkrxyz
+ -+ ++++

我不相信它像维基百科文章中所说的那么简单......

From a longest common subsequence it is only a small step to get diff-like output: if an item is absent in the subsequence but present in the first original sequence, it must have been deleted (as indicated by the '-' marks, below). If it is absent in the subsequence but present in the second original sequence, it must have been inserted (as indicated by the '+' marks).



...因为对于更复杂的字符串(即代码),会有重复的元素需要大量回溯才能确定“真正的”差异开始和结束的位置。

然而,我注意到 DP 实现留下了一个内存表, lcsMatrix ,对于 abcd...示例叶子:

enter image description here

最后一行和最后一列可以用来准确地收集差异在哪里吗?

要生成上表并输出结果,只需添加
  console.table(lcsMatrix);
console.log(longestSequence);

在链接实现结束时。

如果我弄明白了,我会发布一个自我回答。不过,到目前为止,我一直在躲避。

最佳答案

看看下面... https://github.com/jonTrent/PatienceDiff

以您的数据为例...

diff = patienceDiff('abcdfghjqz'.split(''), 'abcdefgijkrxyz'.split(''));

...返回...
{lines: Array(16), lineCountDeleted: 2, lineCountInserted: 6, lineCountMoved: 0}

lineCountDeleted: 2
lineCountInserted: 6
lineCountMoved: 0
lines: Array(16)
0: {line: "a", aIndex: 0, bIndex: 0}
1: {line: "b", aIndex: 1, bIndex: 1}
2: {line: "c", aIndex: 2, bIndex: 2}
3: {line: "d", aIndex: 3, bIndex: 3}
4: {line: "e", aIndex: -1, bIndex: 4}
5: {line: "f", aIndex: 4, bIndex: 5}
6: {line: "g", aIndex: 5, bIndex: 6}
7: {line: "h", aIndex: 6, bIndex: -1}
8: {line: "i", aIndex: -1, bIndex: 7}
9: {line: "j", aIndex: 7, bIndex: 8}
10: {line: "q", aIndex: 8, bIndex: -1}
11: {line: "k", aIndex: -1, bIndex: 9}
12: {line: "r", aIndex: -1, bIndex: 10}
13: {line: "x", aIndex: -1, bIndex: 11}
14: {line: "y", aIndex: -1, bIndex: 12}
15: {line: "z", aIndex: 9, bIndex: 13}
length: 16

请注意,结果指的是“行”,因为该算法是在考虑 github 样式差异的情况下构建的,即逐行比较。但是将样本数据字符串拆分为一个字符“行”的数组允许该算法也可用于字符串......

在哪里 aIndex === -1表示该字符是从第二个字符串添加的,其中 bIndex === -1表示该字符已从第一个字符串中删除。

还包括一个名为 PatientDiffPlus 的版本,它可以识别行/字符的可能移动...(另见 Find difference between two strings in JavaScript)

关于javascript - 用于查找两个字符串的最长公共(public)子序列的内存表是否也可以用于查找差异索引?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57535619/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com