python - 将数字列表拆分为 n 个 block ，使这些 block 具有(接近)相等的总和并保持原始顺序

转载作者：塔克拉玛干更新时间：2023-11-03 06:37:02

这不是标准的分区问题，因为我需要维护列表中元素的顺序。

例如，如果我有一个列表

[1, 6, 2, 3, 4, 1, 7, 6, 4]

我想要两个 block ，那么分割应该给

[[1, 6, 2, 3, 4, 1], [7, 6, 4]]

每边总和为 17。对于三个 block ，结果将是

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

对于 12、12 和 10 的总和。

编辑以获取更多解释

我目前将总和除以 block 数并将其用作目标，然后迭代直到接近该目标。问题是某些数据集可能会搞乱算法，例如试图将以下内容分成 3 份:-

[95, 15, 75, 25, 85, 5]

总和为 300，目标为 100。第一个 block 的总和为 95，第二个总和为 90，第三个总和为 110，5 为“剩余”。将它附加到它应该出现的位置会得到 95、90、115，其中更“合理”的解决方案是 110、100、90。

结束编辑

背景:

我有一个包含不同高度的文本(歌词)的列表，我想将文本分成任意数量的列。目前我根据所有行的总高度计算目标高度，但显然这是一个一致的低估，在某些情况下会导致次优解决方案(最后一列明显更高)。

最佳答案

此方法定义分区边界，将数组划分为大致相等数量的元素，然后反复搜索更好的分区，直到找不到更多为止。它不同于大多数其他已发布的解决方案，因为它希望通过尝试多个不同的分区来找到最佳解决方案。其他解决方案试图通过阵列单次传递创建良好的分区，但我想不出保证最优的单次传递算法。

此处的代码是此算法的有效实现，但可能难以理解，因此在末尾包含一个更具可读性的版本作为附录。

def partition_list(a, k):
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    partition_between = [(i+1)*len(a)/k for i in range(k-1)]
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0

    while True:
        starts = [0]+partition_between
        ends = partition_between+[len(a)]
        partitions = [a[starts[i]:ends[i]] for i in range(k)]
        heights = map(sum, partitions)

        abs_height_diffs = map(lambda x: abs(average_height - x), heights)
        worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
        worst_height_diff = average_height - heights[worst_partition_index]

        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1

        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1

        move = -1 if worst_height_diff < 0 else 1
        bound_to_move = 0 if worst_partition_index == 0\
                        else k-2 if worst_partition_index == k-1\
                        else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                        else worst_partition_index
        direction = -1 if bound_to_move < worst_partition_index else 1
        partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
    print 'Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print 'The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

根据您的用途，可能需要进行一些修改。例如，为了确定是否找到了最佳分区，当分区之间没有高度差时，该算法停止，它没有找到比连续 5 次以上迭代看到的最好的东西更好的东西，或者在 100 次之后总迭代次数作为一个包罗万象的停止点。您可能需要调整这些常量或使用不同的方案。如果您的高度形成了一个复杂的值(value)观景观，知道何时停止可能会遇到试图逃避局部最大值等经典问题。

输出

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]

编辑

添加了新的测试用例 [95, 15, 75, 25, 85, 5]，此方法可以正确处理。

附录

此版本的算法更易于阅读和理解，但由于较少利用内置的 Python 功能，因此有点长。然而，它的执行时间似乎相当，甚至稍快。

#partition list a into k partitions
def partition_list(a, k):
    #check degenerate conditions
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    #create a list of indexes to partition between, using the index on the
    #left of the partition to indicate where to partition
    #to start, roughly partition the array into equal groups of len(a)/k (note
    #that the last group may be a different size) 
    partition_between = []
    for i in range(k-1):
        partition_between.append((i+1)*len(a)/k)
    #the ideal size for all partitions is the total height of the list divided
    #by the number of paritions
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0
    #loop over possible partitionings
    while True:
        #partition the list
        partitions = []
        index = 0
        for div in partition_between:
            #create partitions based on partition_between
            partitions.append(a[index:div])
            index = div
        #append the last partition, which runs from the last partition divider
        #to the end of the list
        partitions.append(a[index:])
        #evaluate the partitioning
        worst_height_diff = 0
        worst_partition_index = -1
        for p in partitions:
            #compare the partition height to the ideal partition height
            height_diff = average_height - sum(p)
            #if it's the worst partition we've seen, update the variables that
            #track that
            if abs(height_diff) > abs(worst_height_diff):
                worst_height_diff = height_diff
                worst_partition_index = partitions.index(p)
        #if the worst partition from this run is still better than anything
        #we saw in previous iterations, update our best-ever variables
        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1
        #decide if we're done: if all our partition heights are ideal, or if
        #we haven't seen improvement in >5 iterations, or we've tried 100
        #different partitionings
        #the criteria to exit are important for getting a good result with
        #complex data, and changing them is a good way to experiment with getting
        #improved results
        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1
        #adjust the partitioning of the worst partition to move it closer to the
        #ideal size. the overall goal is to take the worst partition and adjust
        #its size to try and make its height closer to the ideal. generally, if
        #the worst partition is too big, we want to shrink the worst partition
        #by moving one of its ends into the smaller of the two neighboring
        #partitions. if the worst partition is too small, we want to grow the
        #partition by expanding the partition towards the larger of the two
        #neighboring partitions
        if worst_partition_index == 0:   #the worst partition is the first one
            if worst_height_diff < 0: partition_between[0] -= 1   #partition too big, so make it smaller
            else: partition_between[0] += 1   #partition too small, so make it bigger
        elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
            if worst_height_diff < 0: partition_between[-1] += 1   #partition too small, so make it bigger
            else: partition_between[-1] -= 1   #partition too big, so make it smaller
        else:   #the worst partition is in the middle somewhere
            left_bound = worst_partition_index - 1   #the divider before the partition
            right_bound = worst_partition_index   #the divider after the partition
            if worst_height_diff < 0:   #partition too big, so make it smaller
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):   #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                    partition_between[right_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                    partition_between[left_bound] += 1
            else:   #partition too small, make it bigger
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                    partition_between[left_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                    partition_between[right_bound] += 1

def print_best_partition(a, k):
    #simple function to partition a list and print info
    print '    Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print '    The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

关于python - 将数字列表拆分为 n 个 block ，使这些 block 具有(接近)相等的总和并保持原始顺序，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/56086075/

文章推荐： algorithm - AI单文档搜索

文章推荐： javascript - 如何从对象数组中删除所有重复项？

文章推荐： algorithm - 合并辅助过程的复杂性

文章推荐： javascript - leetcode : render log files, 无法排序

java - 为什么自动装箱的整数和 .getClass() 值 == 相等，而不仅仅是 .equals() 相等？
也许我在 Java 上工作的时间太长而没有真正理解它的一些基础知识。我确实理解 == 用于对象引用相等，而 .equals() 用于对象值相等。比较整数: Integer x = 1, y = 1
javascript - boolean 相等
我是从一道考试题中得出这个答案的，但无法理解该解决方案的工作原理。如果值“x”和“y”相等，则此函数应该返回“true”，否则返回 False。解决方法: function equal_boolea
Mysql utf8mb4 相等
我将带有表情符号的文本存储在 mysql 数据库中。数据库、表和列设置为使用utf8mb4和utf8mb4_unicode_ci。我可以毫无问题地输入单元格值(数据类型是 VARCHAR)。但是
PHP DateTime 相等
如果两个 DateTime 对象具有相同的日、月和年，我该如何比较？问题是他们有不同的小时/分钟/秒。最佳答案对于 DateTime 对象，没有好的方法可以做到这一点。所以你必须做，比方说，不是那
c# - GetHashCode 相等
我一直想知道这个问题，所以我想我会问的。您将看到的大多数地方都使用相同的语义逻辑来覆盖 Equals 和 GetHashCode 以实现成员平等...但是它们通常使用不同的实现: publi
comparison - CoreGraphics 中的浮点比较(相等)
苹果 CoreGraphics.framework , CGGeometry.h : CG_INLINE bool __CGSizeEqualToSize(CGSize size1, CGSize s
python - 由于最新的python版本保留了dict的插入顺序，相等(==)的含义会改变吗？
在最新的python 版本中， dict 保留了插入的顺序。在平等方面是否有任何变化。例如，目前以下工作。既然广告顺序很重要， future 会不会发生这种变化？我问是因为有根本性的变化 - 以前
django 过滤器两个 ManyToManyField 相等
class VideoUserModel(models.Model): user = models.ManyToManyField(get_user_model()) viewlist
coq - 枚举类型的 COQ 相等
我在 COQ 中有一个有限枚举类型(比如 T)，我想检查元素是否相等。这意味着，我需要一个函数 bool beq_T(x:T,y:T) 我设法定义这样一个函数的唯一方法是逐个分析。这会导致很多匹配语
fortran - 测试 float 相等
我在 Windows 7(32 位)下的 MinGW 中使用 gfortran 来编译 Fortran 代码。这是文件 testequal.f 中包含的最少代码: program test
java - jsp。枚举比较/相等
我有以下 jsp 片段: ${campaign.moderated}
python - 如何测试两个稀疏数组是否(几乎)相等？
我想检查两个稀疏数组是否(几乎)相等。而对于 numpy 数组，你可以这样做: import numpy as np a = np.ones(200) np.testing.assert_array_
c# - 自定义对象的值是否与 List<> 相等
我有以下类(class): public class MyDocuments { public DateTime registeredDate; public
c - for循环中两个 float 相等
这个问题已经有答案了: Is floating point math broken? (33 个回答) 已关闭 5 年前。我在这里想做的是，我采用一个精度值(小于 1)并打印 1/n 类型的所有数字
C 检查字符串的最后一个字符是否与 X 相等
我正在为我的arduino写一个草图，我想检查我的字符串的最后一个字符。例如: 如果输入是 cats- 我想看看最后一个字符(在我的例子中是“-”)实际上是否 - 我使用的代码: 串行事件函数 vo
c# - RuntimeMethodInfo 相等 : bug?
让我们开始: using System; public class Program { class A { public virtual void Do() { }
java - 仅根据某些键将两个 HashMap 相等
我只需要根据几个键(不是全部)来确定两个 HashMap 的相等性除了单独访问每个字段并比较相等性之外，还有其他节省时间的方法吗？最佳答案我能想到的一种方法是在您的 HashMap 上存储某种“
java - 测试是否与 double 相等
在Java中，大写的Double可以为null。但是如果我有 double a 和 b 并且我这样做: if (a.equals(b)) 如果其中之一为空，它会崩溃。有没有更好的方法来比较它们？最
mysql - 使用选择选项从另一个数据库插入并且两个数据库表 ID 相等
我正在尝试从我的旧数据库中插入表格数据。 Id 在数据库表和选择特定列中都相等。这是我的数据库。旧数据库:sch -> 旧表:product (id, tag, url) (13, red, aaa
css - 在容器内拆分两个 div 相等
我正在开发一个应用程序，它在我的主视图中有一个侧边栏和两个 div。我试图在容器内平均分割两者的高度。我试过 height = 50% 但效果不太好。

塔克拉玛干

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

python - 将数字列表拆分为 n 个 block ，使这些 block 具有(接近)相等的总和并保持原始顺序

输出

编辑

附录