python - Python中Pareto前沿的快速计算-6ren

python - Python中Pareto前沿的快速计算

转载作者：太空狗更新时间：2023-10-29 17:02:21

我在 3D 空间中有一组点，我需要从中找到 Pareto 边界。执行速度在这里非常重要，随着我加分测试，时间增加得非常快。

点集如下所示:

[[0.3296170319979843, 0.0, 0.44472108843537406], [0.3296170319979843,0.0, 0.44472108843537406], [0.32920760896951373, 0.0, 0.4440408163265306], [0.32920760896951373, 0.0, 0.4440408163265306], [0.33815192743764166, 0.0, 0.44356462585034007]]

现在，我正在使用这个算法:

def dominates(row, candidateRow):
    return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row) 

def simple_cull(inputPoints, dominates):
    paretoPoints = set()
    candidateRowNr = 0
    dominatedPoints = set()
    while True:
        candidateRow = inputPoints[candidateRowNr]
        inputPoints.remove(candidateRow)
        rowNr = 0
        nonDominated = True
        while len(inputPoints) != 0 and rowNr < len(inputPoints):
            row = inputPoints[rowNr]
            if dominates(candidateRow, row):
                # If it is worse on all features remove the row from the array
                inputPoints.remove(row)
                dominatedPoints.add(tuple(row))
            elif dominates(row, candidateRow):
                nonDominated = False
                dominatedPoints.add(tuple(candidateRow))
                rowNr += 1
            else:
                rowNr += 1

        if nonDominated:
            # add the non-dominated point to the Pareto frontier
            paretoPoints.add(tuple(candidateRow))

        if len(inputPoints) == 0:
            break
    return paretoPoints, dominatedPoints

在这里找到:http://code.activestate.com/recipes/578287-multidimensional-pareto-front/

在一组解决方案中找到一组非支配解决方案的最快方法是什么？或者，简而言之，Python 能比这个算法做得更好吗？

最佳答案

如果您担心实际速度，您肯定想使用 numpy(因为巧妙的算法调整可能比使用数组操作获得的 yield 要小得多)。以下是三个计算相同函数的解决方案。 is_pareto_efficient_dumb 解决方案在大多数情况下较慢，但随着成本数量的增加而变得更快，is_pareto_efficient_simple 解决方案在许多点上比 dumb 解决方案更有效，最终is_pareto_efficient 函数可读性较差但速度最快(因此所有都是 Pareto 有效的!)。

import numpy as np


# Very slow for many datapoints.  Fastest for many costs, most readable
def is_pareto_efficient_dumb(costs):
    """
    Find the pareto-efficient points
    :param costs: An (n_points, n_costs) array
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype = bool)
    for i, c in enumerate(costs):
        is_efficient[i] = np.all(np.any(costs[:i]>c, axis=1)) and np.all(np.any(costs[i+1:]>c, axis=1))
    return is_efficient


# Fairly fast for many datapoints, less fast for many costs, somewhat readable
def is_pareto_efficient_simple(costs):
    """
    Find the pareto-efficient points
    :param costs: An (n_points, n_costs) array
    :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype = bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            is_efficient[is_efficient] = np.any(costs[is_efficient]<c, axis=1)  # Keep any point with a lower cost
            is_efficient[i] = True  # And keep self
    return is_efficient


# Faster than is_pareto_efficient_simple, but less readable.
def is_pareto_efficient(costs, return_mask = True):
    """
    Find the pareto-efficient points
    :param costs: An (n_points, n_costs) array
    :param return_mask: True to return a mask
    :return: An array of indices of pareto-efficient points.
        If return_mask is True, this will be an (n_points, ) boolean array
        Otherwise it will be a (n_efficient_points, ) integer array of indices.
    """
    is_efficient = np.arange(costs.shape[0])
    n_points = costs.shape[0]
    next_point_index = 0  # Next index in the is_efficient array to search for
    while next_point_index<len(costs):
        nondominated_point_mask = np.any(costs<costs[next_point_index], axis=1)
        nondominated_point_mask[next_point_index] = True
        is_efficient = is_efficient[nondominated_point_mask]  # Remove dominated points
        costs = costs[nondominated_point_mask]
        next_point_index = np.sum(nondominated_point_mask[:next_point_index])+1
    if return_mask:
        is_efficient_mask = np.zeros(n_points, dtype = bool)
        is_efficient_mask[is_efficient] = True
        return is_efficient_mask
    else:
        return is_efficient

分析测试(使用从正态分布中抽取的点):

有 10,000 个样本，2 个成本:

is_pareto_efficient_dumb: Elapsed time is 1.586s
is_pareto_efficient_simple: Elapsed time is 0.009653s
is_pareto_efficient: Elapsed time is 0.005479s

有 1,000,000 个样本，2 个成本:

is_pareto_efficient_dumb: Really, really, slow
is_pareto_efficient_simple: Elapsed time is 1.174s
is_pareto_efficient: Elapsed time is 0.4033s

有 10,000 个样本，15 个成本:

is_pareto_efficient_dumb: Elapsed time is 4.019s
is_pareto_efficient_simple: Elapsed time is 6.466s
is_pareto_efficient: Elapsed time is 6.41s

请注意，如果效率是一个问题，您可以通过预先对数据重新排序来获得大约 2 倍左右的加速，请参阅 here .

关于python - Python中Pareto前沿的快速计算，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/32791911/

文章推荐： Angular2 - npm install 没有找到命名空间 webdriver

文章推荐： Angular 2 ngModelChange旧值

文章推荐： python - 五个骰子的优雅迭代

c++ - 从 Boost Pareto 分布中随机生成的数字
所以在问这个问题之前，我做了很多关于堆栈溢出和谷歌的调查。我有一个正在进行的模拟，需要能够生成 0 到 1.0 之间的随机输入数据，这些数据遵循特定的统计分布。到目前为止，我已经得到了正常工作的正态
python - DataFrame 多目标排序定义 Pareto 边界
Pandas 是否内置了多目标排序算法？我找到了 this这是一个 NSGA-II 算法(这是我想要的)，但它需要将目标函数作为单独的文件传递。在一个理想的世界中，我会为所有数据使用 DataFra
python - 继承问题 scipy.stats.pareto
我有以下类(我将用它来扩展现有类scipy.stats.pareto): class pareto(scipy.stats.pareto): def __init__(self, b):
r - 如何在 R 中从具有 Pareto 尾部的对数正态分布生成样本？
我想从 R 中具有 Pareto 尾部的对数正态分布生成样本。有人可以帮助我吗？谢谢。最佳答案我不确定这就是您要找的内容，但是有大量关于双帕累托对数正态分布或 so-ca.led dPlN 主题的
python - 无法正确识别 4 维解空间中的 Pareto 观测值
我正在使用具有 4 个适应度函数的遗传算法。我有一个文件，其中每一行都列出了单个解决方案的适应度值。遗传算法的目的是最大化这些值。我需要从文件中列出的解决方案中找出非支配解决方案。我有一个名为“id
r - 如何将 abline() 添加到 pareto.chart()/barplot()？
我正在使用帕累托图(库qcc): ## abcd is a data frame, and I am producing chart for column products Product str(
database - 没有迭代过程和有效值硬编码的 Oracle SQL 中的 Pareto 分析
假设，您有表 T(C1, C2, C3)，并且 C1 有一组有效值，V={v1,v2,v3...}。现在，无需诉诸游标 - 即，完全停留在基于集合的逻辑域中，您想要查看类型为 v1 的行所占的比例，类
python - 如何在 R/Python 中的 Pareto/NBD 模型中合并协变量？
如何在 R/Python 中创建 Pareto/NBD 模型来衡量 CLV(客户生命周期值(value))，其中包括时不变协变量的影响？到目前为止，我已经使用 R 中的 BYTD 包和 Python
r - 如何使用 ggplot2 从 qcc 包重现 pareto.chart 图？
我一直在使用 R 中 qcc 包中的 pareto.chart 函数，我真的很喜欢它。现在我想移植所有图形以使用 ggplot2 包。然而，尽管有出色的文档，但我对 ggplot2 的了解非常有限，因
sql-server-2005 - SQL Server 报告服务 : creating a Pareto chart
我正在尝试在 SQL Server Reporting Service 2005 中创建帕累托图。我创建了一个图表，但在尝试显示我的累积(线)时遇到困难。我在下面列出了我的值(value)观。 =SU
javascript - 如何从 JavaScript 的 Math.random 生成包含边界的 Pareto 随机整数
来自 Wikipedia : 我试图实现这个公式，即使它基于使用均匀分布在 (0,1) 上的随机数并生成分数，而 JavaScript 中的 Math.random 生成 [0,1) 上的数字，我是寻
r - 在 R 中构建 Pareto/NBD 模型时出现 cbind 错误(BTYD 包)
我正在使用 BTYD 包构建 CLV 模型，但我遇到了一个我似乎无法绕过的障碍。我一直在认真遵循 this tutorial 中第 2.1 节的指示。 .一切似乎都很顺利，直到我到达校准客户充分统计

太空狗

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

python - Python中Pareto前沿的快速计算