python - Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order

Reposted by: 太空狗. Updated: 2023-10-29 17:43:56

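This is not the standard partition problem, because I need to maintain the order of the elements in the list.

For example, if I have the list

[1, 6, 2, 3, 4, 1, 7, 6, 4]

and I want two blocks, the split should give

[[1, 6, 2, 3, 4, 1], [7, 6, 4]]

for a sum of 17 on each side. For three blocks, the result would be

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

for sums of 12, 12 and 10.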
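The two requirements in the examples above (block sums and unchanged element order) can be checked mechanically. The helper names `block_sums` and `keeps_order` below are hypothetical, introduced only for this illustration.

```python
def block_sums(blocks):
    """Return the sum of each block."""
    return [sum(block) for block in blocks]


def keeps_order(original, blocks):
    """True if concatenating the blocks reproduces the original list."""
    return [x for block in blocks for x in block] == original


numbers = [1, 6, 2, 3, 4, 1, 7, 6, 4]
two = [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
three = [[1, 6, 2, 3], [4, 1, 7], [6, 4]]

print(block_sums(two), keeps_order(numbers, two))      # [17, 17] True
print(block_sums(three), keeps_order(numbers, three))  # [12, 12, 10] True
```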

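Edit, for more explanation

I currently divide the total sum by the number of blocks and use that as a target, then iterate until I get close to that target. The problem is that some data sets can throw the algorithm off, for example when trying to split the following into 3 blocks:

[95, 15, 75, 25, 85, 5]

The sum is 300 and the target is 100. The first block sums to 95, the second to 90 and the third to 110, with 5 left over. Appending it where it is supposed to go gives 95, 90, 115, whereas a more "reasonable" solution would be 110, 100, 90.

End edit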
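The question does not show the code for this target-based approach, so the following is only a guess at what it might look like. It is a minimal sketch that assumes a block is closed as soon as the next element would push it past the target, and that the last block absorbs whatever is left. The name `greedy_partition` is hypothetical.

```python
def greedy_partition(a, k):
    """Single-pass split: close a block when the next item would exceed the target."""
    target = sum(a) / k
    blocks = [[]]
    for x in a:
        # Start a new block only if blocks remain and the current one would overshoot.
        if len(blocks) < k and blocks[-1] and sum(blocks[-1]) + x > target:
            blocks.append([])
        blocks[-1].append(x)
    return blocks


blocks = greedy_partition([95, 15, 75, 25, 85, 5], 3)
print(blocks)                    # [[95], [15, 75], [25, 85, 5]]
print([sum(b) for b in blocks])  # [95, 90, 115]
```

Under these assumptions the sketch reproduces the 95, 90, 115 split described above, which shows how a single greedy pass can miss the more balanced 110, 100, 90.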

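Background:

I have a list of text lines (song lyrics) of varying heights, and I want to split the text into an arbitrary number of columns. Currently I calculate a target height from the total height of all lines, but this is evidently a consistent underestimate, which in some cases leads to a suboptimal solution (the last column is noticeably taller).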
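If the real concern is the tallest column, one alternative is to minimize the largest block sum directly. This is a different objective from "closest to the average" and is not the method used in the answer below. The sketch binary-searches the smallest height cap that still fits the lines into at most k contiguous blocks, assuming non-negative integer heights. The name `min_max_partition` is hypothetical.

```python
def min_max_partition(a, k):
    """Split a into at most k contiguous blocks, minimizing the largest block sum.

    Assumes a is a non-empty list of non-negative integers.
    """
    def blocks_needed(cap):
        # Greedily fill blocks up to cap and count how many are required.
        count, running = 1, 0
        for x in a:
            if running + x > cap:
                count += 1
                running = 0
            running += x
        return count

    lo, hi = max(a), sum(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if blocks_needed(mid) <= k:
            hi = mid       # cap is feasible, try a smaller one
        else:
            lo = mid + 1   # cap is too small

    # Rebuild the blocks using the smallest feasible cap.
    blocks, running = [[]], 0
    for x in a:
        if blocks[-1] and running + x > lo:
            blocks.append([])
            running = 0
        blocks[-1].append(x)
        running += x
    return blocks


print(min_max_partition([95, 15, 75, 25, 85, 5], 3))  # [[95, 15], [75, 25], [85, 5]]
```

The greedy rebuild may return fewer than k blocks when fewer are enough to stay under the cap, so a caller that needs exactly k columns would have to split a block further.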

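Best answer

This approach defines partition boundaries that divide the array into roughly equal numbers of elements, and then repeatedly searches for better partitionings until it can't find any more. It differs from most of the other posted solutions in that it tries to find the best solution by trying multiple different partitionings. The other solutions attempt to create a good partition in a single pass through the array, but I can't think of a single-pass algorithm that is guaranteed optimal.

The code here is an efficient implementation of this algorithm, but it can be hard to understand, so a more readable version is included as an appendix at the end.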

def partition_list(a, k):
    # Degenerate cases: a single block, or at least as many blocks as elements.
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    # Integer division keeps the dividers usable as slice indices in Python 3.
    partition_between = [(i+1)*len(a)//k for i in range(k-1)]
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0

    while True:
        starts = [0]+partition_between
        ends = partition_between+[len(a)]
        partitions = [a[starts[i]:ends[i]] for i in range(k)]
        # map() returns an iterator in Python 3, so build lists for indexing.
        heights = list(map(sum, partitions))

        abs_height_diffs = list(map(lambda x: abs(average_height - x), heights))
        worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
        worst_height_diff = average_height - heights[worst_partition_index]

        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1

        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1

        # A negative difference means the worst partition is too big.
        move = -1 if worst_height_diff < 0 else 1
        bound_to_move = 0 if worst_partition_index == 0\
                        else k-2 if worst_partition_index == k-1\
                        else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                        else worst_partition_index
        direction = -1 if bound_to_move < worst_partition_index else 1
        partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
    print('Partitioning {0} into {1} partitions'.format(a, k))
    p = partition_list(a, k)
    print('The best partitioning is {0}\n    With heights {1}\n'.format(p, list(map(sum, p))))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

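Depending on what you are doing with this, there may be some modifications to make. For example, to decide whether the best partitioning has been found, this algorithm stops when there is no height difference among partitions, when it has not found anything better than the best result seen so far for more than 5 iterations in a row, or after 100 total iterations as a catch-all stopping point. You may need to adjust those constants or use a different scheme. If your heights form a complex landscape of values, knowing when to stop can run into classic problems such as trying to escape local optima.

Output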

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]

Edit

Added the new test case [95, 15, 75, 25, 85, 5], which this method handles correctly.
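Because the search stops on heuristic criteria, it can be useful to cross-check its results on small inputs against an exhaustive search. The sketch below is not part of the original answer. It tries every placement of the k-1 dividers and scores each partitioning the same way the answer does, by the largest absolute deviation of any block from the average. The name `brute_force_partition` is hypothetical, and the approach is only practical for short lists because the number of divider placements grows combinatorially.

```python
from itertools import combinations


def brute_force_partition(a, k):
    """Return the contiguous k-way split whose worst block deviates least from the average."""
    average = sum(a) / k
    best, best_score = None, None
    # Each combination is a strictly increasing tuple of divider positions.
    for cuts in combinations(range(1, len(a)), k - 1):
        bounds = (0,) + cuts + (len(a),)
        parts = [a[bounds[i]:bounds[i + 1]] for i in range(k)]
        score = max(abs(average - sum(p)) for p in parts)
        if best_score is None or score < best_score:
            best, best_score = parts, score
    return best


print(brute_force_partition([95, 15, 75, 25, 85, 5], 3))
# [[95, 15], [75, 25], [85, 5]]
```

For this test case the exhaustive search agrees with the partitioning reported in the output above.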

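Appendix

This version of the algorithm is easier to read and understand, but it is a bit longer because it takes less advantage of built-in Python features. Its execution time seems comparable, or even slightly faster.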

#partition list a into k partitions
def partition_list(a, k):
    #check degenerate conditions
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    #create a list of indexes to partition between, using the index on the
    #left of the partition to indicate where to partition
    #to start, roughly partition the array into equal groups of len(a)//k (note
    #that the last group may be a different size)
    partition_between = []
    for i in range(k-1):
        #integer division keeps the dividers usable as slice indices in Python 3
        partition_between.append((i+1)*len(a)//k)
    #the ideal size for all partitions is the total height of the list divided
    #by the number of partitions
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0
    #loop over possible partitionings
    while True:
        #partition the list
        partitions = []
        index = 0
        for div in partition_between:
            #create partitions based on partition_between
            partitions.append(a[index:div])
            index = div
        #append the last partition, which runs from the last partition divider
        #to the end of the list
        partitions.append(a[index:])
        #evaluate the partitioning
        worst_height_diff = 0
        worst_partition_index = -1
        #enumerate gives the position directly; list.index(p) would return the
        #wrong position when two partitions have identical contents
        for i, p in enumerate(partitions):
            #compare the partition height to the ideal partition height
            height_diff = average_height - sum(p)
            #if it's the worst partition we've seen, update the variables that
            #track that
            if abs(height_diff) > abs(worst_height_diff):
                worst_height_diff = height_diff
                worst_partition_index = i
        #if the worst partition from this run is still better than anything
        #we saw in previous iterations, update our best-ever variables
        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1
        #decide if we're done: if all our partition heights are ideal, or if
        #we haven't seen improvement in >5 iterations, or we've tried 100
        #different partitionings
        #the criteria to exit are important for getting a good result with
        #complex data, and changing them is a good way to experiment with getting
        #improved results
        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1
        #adjust the partitioning of the worst partition to move it closer to the
        #ideal size. the overall goal is to take the worst partition and adjust
        #its size to try and make its height closer to the ideal. generally, if
        #the worst partition is too big, we want to shrink the worst partition
        #by moving one of its ends into the smaller of the two neighboring
        #partitions. if the worst partition is too small, we want to grow the
        #partition by expanding the partition towards the larger of the two
        #neighboring partitions
        if worst_partition_index == 0:   #the worst partition is the first one
            if worst_height_diff < 0: partition_between[0] -= 1   #partition too big, so make it smaller
            else: partition_between[0] += 1   #partition too small, so make it bigger
        elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
            if worst_height_diff < 0: partition_between[-1] += 1   #partition too big, so make it smaller
            else: partition_between[-1] -= 1   #partition too small, so make it bigger
        else:   #the worst partition is in the middle somewhere
            left_bound = worst_partition_index - 1   #the divider before the partition
            right_bound = worst_partition_index   #the divider after the partition
            if worst_height_diff < 0:   #partition too big, so make it smaller
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):   #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                    partition_between[right_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                    partition_between[left_bound] += 1
            else:   #partition too small, make it bigger
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                    partition_between[left_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                    partition_between[right_bound] += 1

def print_best_partition(a, k):
    #simple function to partition a list and print info
    print('    Partitioning {0} into {1} partitions'.format(a, k))
    p = partition_list(a, k)
    print('    The best partitioning is {0}\n    With heights {1}\n'.format(p, list(map(sum, p))))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

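For more on this topic (python - Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order), see the similar question on Stack Overflow: https://stackoverflow.com/questions/35517051/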
