python - Can someone explain splitting list elements based on a comparison with another list?


Reposted by: 太空狗. Updated: 2023-10-30 02:55:18

Suppose I have two lists that contain the same string, but split differently:

sentences = ["This is a sentence", "so is this"]
phrases = ["This is", "a sentence so", "is this"]

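What I want to do is check whether each element of the phrases list is fully represented by a single element of sentences and, if it is not, split that phrases element accordingly. For example, in this case: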

"a sentence so"

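in phrases is partly represented by the first element of sentences and partly by the second, so it should be split between "a sentence" and "so" to create a new element.
"This is" and "is this" in phrases should effectively be ignored, because each of them corresponds entirely to a single element of sentences. Next, suppose I want to count the elements to determine how many there are in each list: the result for sentences should still be 2, but the result for phrases should go from 3 to 4.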

Sentencecount=0
Phrasecount=0
for i in sentences:
    Sentencecount+=1
for n in phrases:
    # code here should check each element against the 'sentences' elements and split accordingly
    Phrasecount += 1

#expected result: phrases = ["This is", "a sentence", "so", "is this"]

Best answer

Well, this was harder (and more fun!) than I expected.

from collections import deque

def align_wordlists(words1, words2):
    # Split every element of the word lists
    # >>> [e.split(" ") for e in ["This is", "a sentence"]]
    # [["This", "is"], ["a", "sentence"]]
    words1_split = [e.split(" ") for e in words1]
    words2_split = [e.split(" ") for e in words2]

    # Assert that the flattened lists are identical
    assert [word for split in words1_split for word in split] == \
            [word for split in words2_split for word in split]

    # Create a queue and two tracking lists
    Q = deque(enumerate(words2_split))
    result = []
    splits = []

    # Keep track of the current sublist in words1
    words1_sublist_id = 0
    words1_sublist_offset = 0

    # Keep iterating until the queue is empty
    while Q:
        sublist_id, sublist = Q.popleft()
        sublist_len = len(sublist)

        words1_sublist_len = len(words1_split[words1_sublist_id])
        words1_remaining_len = words1_sublist_len - words1_sublist_offset

        if sublist_len <= words1_remaining_len:
            # The sublist fits entirely into the current segment in words 1,
            # add sublist untouched to resulting list.
            result.append(" ".join(sublist))

            # Update the sublist tracking
            if (words1_sublist_len - words1_sublist_offset - sublist_len) == 0:
                # The sublist filled the remaining space
                words1_sublist_id += 1
                words1_sublist_offset = 0
            else:
                # The sublist only filled part of the remaining space
                words1_sublist_offset += sublist_len

        else:
            # Only part of the current sublist fits.
            # Split the segment at the point where the left
            # part fits into the current segment of words1.
            # Then add the remaining right list to the front
            # of the queue.
            left = " ".join(sublist[:words1_remaining_len])
            right = sublist[words1_remaining_len:]
            result.append(left)
            Q.appendleft((sublist_id, right))

            # Keep track of splits
            splits.append(sublist_id)

            # update indices
            words1_sublist_id += 1
            words1_sublist_offset = 0

    # Combine splits into sublists to get desired result
    for split in splits:
        if isinstance(result[split], str):
            result[split:split+2] = [[result[split], result[split + 1]]]
        else:
            result[split] = result[split] + [result[split + 1]]
            del result[split + 1]

    return result

Examples

>>> words1 = ["This is a sentence", "so is this"]
>>> words2 = ["This is", "a sentence so", "is this"]
>>> align_wordlists(words1, words2)
['This is', ['a sentence', 'so'], 'is this']

>>> words1 = ["This is a longer", "sentence with", "different splits"]
>>> words2 = ["This is", "a longer sentence", "with different splits"]
>>> align_wordlists(words1, words2)
['This is', ['a longer', 'sentence'], ['with', 'different splits']]

>>> words1 = ["This is a longer", "sentence with", "different splits"]
>>> words2 = ["This is", "a longer sentence with different splits"]
>>> align_wordlists(words1, words2)
['This is', ['a longer', 'sentence with', 'different splits']]
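
The question asks for a flat list and an element count, while the function above returns split phrases as nested lists. A minimal sketch of how the nested output could be flattened, assuming a hypothetical helper named flatten (not part of the original answer) and using the first example's output copied in as a literal:

```python
def flatten(aligned):
    """Flatten one level of nesting so every split part becomes its own item."""
    flat = []
    for item in aligned:
        if isinstance(item, list):
            # A phrase that was split: add each part separately
            flat.extend(item)
        else:
            # A phrase that fit entirely into one sentence
            flat.append(item)
    return flat


# Output of the first example above, copied here as a literal
aligned = ['This is', ['a sentence', 'so'], 'is this']
phrases = flatten(aligned)
print(phrases)       # ['This is', 'a sentence', 'so', 'is this']
print(len(phrases))  # 4
```

With this, the phrase count goes from 3 to 4, as the question expects.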

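Algorithm overview
A high-level description of the algorithm used here.
The problem you describe boils down to this question:
For each phrase in the second word list, which sentence in the first list does it belong to?
To answer that question, the algorithm above takes several steps:
Split the phrases in words1 and words2 into sublists of words. We do this right at the start because it makes it easier to work with the individual words of a phrase later on.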

def align_wordlists(words1, words2):
    # Split every element of the word lists
    # >>> [e.split(" ") for e in ["This is", "a sentence"]]
    # [["This", "is"], ["a", "sentence"]]
    words1_split = [e.split(" ") for e in words1]
    words2_split = [e.split(" ") for e in words2]

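To make sure the algorithm can work, I added an assertion that verifies that the two sentences (i.e. the word lists) are exactly the same once we ignore every split and space: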

    # Assert that the flattened lists are identical
    assert [word for split in words1_split for word in split] == \
            [word for split in words2_split for word in split]
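
To illustrate what the assertion compares, here is a small sketch that flattens both lists into single words. The helper name flatten_words is an assumption made for this illustration; it uses itertools.chain instead of the nested comprehension but produces the same flat list.

```python
from itertools import chain


def flatten_words(phrases):
    """Return all words of all phrases as one flat list."""
    return list(chain.from_iterable(p.split(" ") for p in phrases))


words1 = ["This is a sentence", "so is this"]
words2 = ["This is", "a sentence so", "is this"]

print(flatten_words(words1))
# ['This', 'is', 'a', 'sentence', 'so', 'is', 'this']
print(flatten_words(words1) == flatten_words(words2))  # True

# Lists that differ in a single word would fail the assertion
print(flatten_words(["This is"]) == flatten_words(["This was"]))  # False
```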

To keep track of the phrases that still need to be looked at, we use a deque, which is part of Python's collections module.

    # Create a queue and two tracking lists
    Q = deque(enumerate(words2_split))
    result = []
    splits = []

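We initialize this queue with every phrase of the second word list, paired with its index in that word list. See enumerate.
Because we are comparing the phrases in the second word list with the sentences in the first word list, we have to keep track of where we are in the first word list and how much of it we have already looked at.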

    # Keep track of the current sublist in words1
    words1_sublist_id = 0
    words1_sublist_offset = 0

Since the queue is our "work stack", we execute the following code as long as there are items in the queue:

    # Keep iterating until the queue is empty
    while Q:

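First things first: get the item at the front of the queue. I unpack the tuple that was pushed onto the queue during initialization in step 3 (where the deque is created). sublist_id is the index of the sublist's position in the second word list, and sublist is the actual list of words, i.e. the phrase. We also compute the length of the phrase, which we will need later.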

        sublist_id, sublist = Q.popleft()
        sublist_len = len(sublist)
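
The queue operations used here can be tried in isolation. The following sketch only demonstrates deque, enumerate, popleft and appendleft on the question's data; the slice index 2 is hard-coded for illustration and is not how the algorithm picks its split point.

```python
from collections import deque

words2_split = [["This", "is"], ["a", "sentence", "so"], ["is", "this"]]

# Each queue item is an (index, list of words) tuple
queue = deque(enumerate(words2_split))

sublist_id, sublist = queue.popleft()
print(sublist_id, sublist)  # 0 ['This', 'is']

# Take the next phrase and push its unprocessed tail back onto the front,
# keeping the original index so the split can be traced back later
sublist_id, sublist = queue.popleft()
queue.appendleft((sublist_id, sublist[2:]))
print(queue[0])    # (1, ['so'])
print(len(queue))  # 2
```

Pushing the remainder onto the front rather than the back keeps the phrases in their original order.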

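Now we need to check whether the current phrase fits into the sentence we are looking at. (At the start of the algorithm, words1_sublist_id is 0, so we are looking at the first group of words in the first word list.)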

        words1_sublist_len = len(words1_split[words1_sublist_id])
        words1_remaining_len = words1_sublist_len - words1_sublist_offset

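This amounts to asking: "does it fit?" If the phrase fits into the sentence, the phrase is fully represented by that sentence.
if: the phrase is no longer than what remains of the sentence, i.e. we don't have to split!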

        if sublist_len <= words1_remaining_len:

Because we don't need to split, we can append the phrase to the result list (I join the words of the phrase on a space to turn it into a string).

            # The sublist fits entirely into the current segment in words1,
            # add sublist untouched to resulting list.
            result.append(" ".join(sublist))

Because we have just fitted the phrase into the sentence, we have to update the tracking variables to reflect the progress we made. While doing so, we have to take care to respect the sentence boundaries.

            # Update the sublist tracking
            if (words1_sublist_len - words1_sublist_offset - sublist_len) == 0:
                # The sublist filled the remaining space
                words1_sublist_id += 1
                words1_sublist_offset = 0
            else:
                # The sublist only filled part of the remaining space
                words1_sublist_offset += sublist_len
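
The position update can be isolated into a small pure function to make the boundary handling easier to see. This is only a sketch: the name advance and its parameters are assumptions made for illustration, and the original answer keeps this logic inline.

```python
def advance(sentence_lens, sentence_id, offset, consumed):
    """Return the (sentence_id, offset) reached after consuming `consumed`
    words that are known to fit into the current sentence."""
    remaining = sentence_lens[sentence_id] - offset
    if consumed > remaining:
        raise ValueError("phrase does not fit into the current sentence")
    if consumed == remaining:
        # The phrase ends exactly at the sentence boundary: move on
        return sentence_id + 1, 0
    # The phrase ends inside the sentence: stay and move the offset
    return sentence_id, offset + consumed


# Word counts of "This is a sentence" and "so is this"
sentence_lens = [4, 3]
print(advance(sentence_lens, 0, 0, 2))  # (0, 2)  after "This is"
print(advance(sentence_lens, 0, 2, 2))  # (1, 0)  after "a sentence"
```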

else: the phrase is longer than what remains of the sentence, i.e. the phrase cannot be represented by this sentence alone.

        else:

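In this case we have to split the phrase at the point where it spills over into the next sentence. We determine the "split point" from the number of words remaining in the sentence (for example, if the phrase is 3 words long but the sentence only has 2 words left, we split the phrase after 2 words).

            # Only part of the current sublist fits.
            # Split the segment at the point where the left
            # part fits into the current segment of words1.
            # Then add the remaining right list to the front
            # of the queue.
            left = " ".join(sublist[:words1_remaining_len])
            right = sublist[words1_remaining_len:]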

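(Because the left part of the split is "done", I join it on " " to turn it into a string. The right part is not done yet, so we still care about it being split into individual words.)
After splitting the phrase, we can push the left part onto the result list, because we now know that it is fully represented by the current sentence. We know nothing about the right part, though: it may fit into the next sentence, or it may overflow that sentence as well (see the last example above).
Because we don't know what to do with the right part yet, we have to treat it as a new phrase: that is, we simply add it to the front of our work queue so that it is processed in the next iteration.

            result.append(left)
            Q.appendleft((sublist_id, right))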
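
The split itself is plain list slicing. A minimal sketch, with a hypothetical helper split_phrase that is not part of the original answer, applied to the phrase from the question:

```python
def split_phrase(sublist, remaining_len):
    """Split a phrase (list of words) after `remaining_len` words.

    Returns the finished left part as a string and the unfinished
    right part as a list of words.
    """
    left = " ".join(sublist[:remaining_len])
    right = sublist[remaining_len:]
    return left, right


# "a sentence so" arrives when the first sentence has 2 words left
left, right = split_phrase(["a", "sentence", "so"], 2)
print(left)   # a sentence
print(right)  # ['so']
```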

Our result list will not record the points at which we split, so we keep track of the split points separately.

            # Keep track of splits
            splits.append(sublist_id)

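Again, we have to keep track of our current position in the words1 list. Because we know that we have overflowed the current sentence, we can simply increment the index and reset the offset.

            # update indices
            words1_sublist_id += 1
            words1_sublist_offset = 0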

Once the work queue is empty, we can take care of grouping the phrases that were split into sublists. This is a bit tricky:

    # Combine splits into sublists to get desired result
    for split in splits:

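If the item at the split point is a string, we can infer that we have not split at this position before. We can therefore replace the item at the split point and the item after it with a list containing the two. (We use split+2 rather than split+1 because the end of the range is exclusive.)

        if isinstance(result[split], str):
            result[split:split+2] = [[result[split], result[split + 1]]]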

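However, if the item at the split point is a list, we know that we have already split at this point once (i.e. a phrase overflowed sentences at least twice, see the last example above).
In this case, we append the item that follows the list (at split + 1) to the list and use del to remove the now-appended item.

        else:
            result[split] = result[split] + [result[split + 1]]
            del result[split + 1]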
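
To see both branches at work, the sketch below replays the merge step on the intermediate result of the third example. The function name merge_split is an assumption made for illustration; its body mirrors the two branches of the loop above.

```python
def merge_split(result, split):
    """Group the item at `split` with the item that follows it, in place."""
    if isinstance(result[split], str):
        # First split at this position: replace two strings by one list.
        # The slice end split + 2 is exclusive, so it covers two items.
        result[split:split + 2] = [[result[split], result[split + 1]]]
    else:
        # Already a list: append the following item, then remove it
        result[split] = result[split] + [result[split + 1]]
        del result[split + 1]
    return result


# Flat result and recorded split points for the third example:
# the phrase with index 1 overflowed a sentence twice
result = ["This is", "a longer", "sentence with", "different splits"]
for split in [1, 1]:
    merge_split(result, split)
print(result)
# ['This is', ['a longer', 'sentence with', 'different splits']]
```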

With all of that said and done, return the result!

    return result
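
If only the flat list from the question is needed, a shorter approach is possible. The sketch below is not the answer's algorithm but a different technique, named plainly: it precomputes the word positions at which sentences end and cuts each phrase wherever it crosses one. The function name split_at_boundaries is hypothetical, and the sketch assumes, like the assertion above, that both lists contain the same words in the same order.

```python
from itertools import accumulate


def split_at_boundaries(sentences, phrases):
    """Cut every phrase at the word positions where a sentence ends."""
    # Cumulative word counts, e.g. {4, 7} for sentences of 4 and 3 words
    boundaries = set(accumulate(len(s.split()) for s in sentences))
    result = []
    position = 0  # number of words consumed so far
    for phrase in phrases:
        current = []
        for word in phrase.split():
            current.append(word)
            position += 1
            if position in boundaries:
                # A sentence ends here: close the current piece
                result.append(" ".join(current))
                current = []
        if current:
            # The phrase ended before the sentence did
            result.append(" ".join(current))
    return result


sentences = ["This is a sentence", "so is this"]
phrases = ["This is", "a sentence so", "is this"]
print(split_at_boundaries(sentences, phrases))
# ['This is', 'a sentence', 'so', 'is this']
```

Unlike align_wordlists, this variant does not record which original phrase each piece came from.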

Regarding "python - Can someone explain splitting list elements based on a comparison with another list?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42859487/
