
python - MLE application with gekko in Python

Reposted. Author: 行者123. Updated: 2023-12-02 18:36:42

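I want to implement MLE (maximum likelihood estimation) with the gekko package in Python. Suppose we have a DataFrame with two columns, ['Loss', 'Target'], and 500 rows.
First we import the packages we need: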

from gekko import GEKKO
import numpy as np
import pandas as pd

Then we create the DataFrame like this:

My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
My_DataFrame

It looks like this:
[image: preview of My_DataFrame]

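Some entries of the ['Target'] column should be calculated with the formula shown in the picture below; the rest stay zero (I explain more further on, so please keep reading). The two main elements of the formula are 'Kasi' and 'Betaa'. I want to find the values of 'Kasi' and 'Betaa' that maximize the sum of My_DataFrame['Target'].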

[image: formula for the Target column]
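The formula image did not survive, but it can be sketched from the corrected script in the accepted answer, assuming the image matches that script. The symbols $\xi$ (Kasi), $\beta$ (Betaa) and $L_i$ (one value of the Loss column) are notation chosen here. The first script in the question uses log10 while the later ones use the natural log, so the base of the logarithm is left unspecified:

```latex
\mathrm{Target}_i =
\begin{cases}
\log\!\left[\dfrac{1}{\beta}\left(1 + \dfrac{\xi\,(L_i - 160)}{\beta}\right)^{-\frac{1}{\xi} - 1}\right], & L_i > 160,\\[2ex]
0, & \text{otherwise.}
\end{cases}
```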

Now let me show you how I wrote the code for this. First, I define my objective function:

def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!

    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
          Array[0] represents 'Kasi' and Array[1] represents 'Betaa'

    [returns]:
        + returns a pandas.series.
          actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))

    return My_DataFrame["Target"]

If you are confused about what happens in the for loop of obj_function, look at the picture below, which contains a short example. If not, skip this part:

[image: short example of the for loop]

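Then we just need to run the optimization. I use the gekko package for this. Note that I want to find the best values of 'Kasi' and 'Betaa', so I have two main variables and no constraints. So let's get started: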

# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()

# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd , value = [0.7 , 20])
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'

qw.Maximize(np.sum(obj_function(x)))

Then, when I try to solve the optimization with qw.solve():

qw.solve()

I get this error:

Exception: This steady-state IMODE only allows scalar values.

How can I fix this? (For convenience, the complete script is collected in the next section.)

from gekko import GEKKO
import numpy as np
import pandas as pd


My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)

def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!

    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
          Array[0] represents 'Kasi' and Array[1] represents 'Betaa'

    [returns]:
        + returns a pandas.series.
          actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))

    return My_DataFrame["Target"]



# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()

# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'

qw.Maximize(qw.sum(obj_function(x)))

The proposed candidate script is here:

from gekko import GEKKO
import numpy as np
import pandas as pd


My_DataFrame = pd.read_excel("[<FILE_PATH_IN_YOUR_MACHINE>]\\Losses.xlsx")
# I'll put the link to the "Losses.xlsx" file at the end of my explanation
# so you can download it from my Google Drive.


loss = My_DataFrame["Loss"]
def obj_function(x):
    k,b = x
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target


qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)

# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi

# bounds
k,b = x
k.lower=0.1; k.upper=0.8
b.lower=10; b.upper=500
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])

Python output:

objective function = -1155.4861315885942
b = 500.0
k = 0.1

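Note that in the Python output, b stands for 'Betaa' and k stands for 'Kasi'.
The output looked a bit strange, so I decided to test it with the Microsoft Excel Solver.
(I put the link to the Excel file at the end of my explanation so you can check it yourself.) As the picture below shows, the Excel optimization completed and found an optimal solution.
[image: Excel Solver optimization result]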

Excel output:

objective function = -108.21
Betaa = 32.53161
Kasi = 0.436246

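As you can see, there is a huge difference between the Python output and the Excel output, and Excel seems to perform quite well. So I guess the problem still stands and the proposed Python script does not perform well...
The Implementation_in_Excel.xls file with the Microsoft Excel optimization is available here (you can also see the optimization options in the Data tab --> Analysis --> Solver).
The data used for the optimization is the same in Excel and Python and is available here (it is very simple: 501 rows and 1 column).
*If you cannot download the files, let me know and I will update them.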

Best answer

The initialization applies the whole list [0.7, 20] to each parameter. A scalar should be used to initialize value instead, such as:

x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi

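Another issue is that gekko needs its own special functions to perform automatic differentiation for the solvers. For the objective function, switch to the gekko version of summation: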

qw.Maximize(qw.sum(obj_function(x)))

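If loss is computed by changing the values of x, then the objective function has logical expressions that need special treatment for solution with gradient-based solvers. Try the if3() function for a conditional statement, or else slack variables (preferred). The objective function is evaluated once to build symbolic expressions, which are then compiled to byte-code and solved with one of the solvers. The symbolic expressions are found in the gk0_model.apm file in the m.path folder (here, qw.path).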
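The `if iloss > 160` test in the scripts on this page compares plain numbers from the data, not solver variables, so it runs as ordinary Python while the expression is being built. A minimal sketch of that idea using only the standard library; `THRESHOLD` and `exceedances` are names chosen here, and the grid mimics the np.linspace call from the question:

```python
# Fixed data: same grid as np.linspace(-555.795, 477.841, 500), built without NumPy
n = 500
start, stop = -555.795, 477.841
loss = [start + i * (stop - start) / (n - 1) for i in range(n)]

THRESHOLD = 160.0  # threshold used throughout the question

# Plain Python filter on constant data; no solver variable is involved here
exceedances = [value - THRESHOLD for value in loss if value > THRESHOLD]

print(len(exceedances), "of", len(loss), "losses exceed the threshold")
```

Only the filtered values would then enter the symbolic objective, which is why no if3() call is needed for this particular condition.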

Response to edit

Thanks for posting an edit with the complete code. Here is a possible solution:

from gekko import GEKKO
import numpy as np
import pandas as pd

loss = np.linspace(-555.795 , 477.841 , 500)
def obj_function(x):
    k,b = x
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi
# bounds
k,b = x
k.lower=0.6; k.upper=0.8
b.lower=10; b.upper=30
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])

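The solver reaches the bounds at the solution. The bounds may need to be widened so that arbitrary limits do not determine the solution.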
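One way to spot this situation is to compare each reported value with its limits. The helper below is a hypothetical sketch, not part of gekko; `at_bound` and `rel_tol` are names chosen here, and the numbers are the ones reported in the question (k in [0.1, 0.8], b in [10, 500]):

```python
def at_bound(value, lower, upper, rel_tol=1e-6):
    """Return 'lower', 'upper', or None depending on where value sits."""
    span = upper - lower
    if abs(value - lower) <= rel_tol * span:
        return "lower"
    if abs(upper - value) <= rel_tol * span:
        return "upper"
    return None

# Values from the question's Python output
print("k:", at_bound(0.1, 0.1, 0.8))
print("b:", at_bound(500.0, 10, 500))

# Kasi value from the Excel output, checked against the same limits
print("Excel k:", at_bound(0.436246, 0.1, 0.8))
```

A result pinned to a limit suggests that the limit, not the data, chose the answer.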


Update

Here is the final solution. The objective function in the code had a problem and needed to be fixed. This is the correct script:

from gekko import GEKKO
import numpy as np
import pandas as pd

My_DataFrame = pd.read_excel("<FILE_PATH_IN_YOUR_MACHINE>\\Losses.xlsx")
loss = My_DataFrame["Loss"]

def obj_function(x):
    k,b = x
    q = ((-1/k)-1)
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log(1/b) + q* ( qw.log(b+k*(iloss-160)) - qw.log(b))
            target.append(t)
    return target

qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)

# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
    x[i].value = xi

qw.Maximize(qw.sum(obj_function(x)))
qw.solve()
print('Kasi = ',x[0].value)
print('Betaa = ',x[1].value)

Output:

 The final value of the objective function is  108.20609317143486

---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.031200000000000006 sec
Objective : 108.20609317143486
Successful solution
---------------------------------------------------


Kasi = [0.436245842]
Betaa = [32.531632983]

The results are close to the optimization result from Microsoft Excel.
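The difference between the earlier objective and the corrected one comes down to operator precedence: in the earlier scripts `**` binds tighter than `+`, so the exponent applies only to the ratio and not to the whole bracket (1 + ratio). The sketch below compares the two forms with plain `math` functions. The function names and the loss value of 200 are assumptions made for illustration; k and b are the values reported in the output above:

```python
import math

THRESHOLD = 160.0

def term_as_written(k, b, loss):
    # Mirrors the earlier scripts: only the ratio is raised to the power
    return math.log((1 / b) * (1 + (k * (loss - THRESHOLD) / b) ** ((-1 / k) - 1)))

def term_direct(k, b, loss):
    # Power applied to the whole bracket (1 + ratio)
    return math.log((1 / b) * (1 + k * (loss - THRESHOLD) / b) ** ((-1 / k) - 1))

def term_expanded(k, b, loss):
    # Same algebra as the corrected script, split into separate logs
    q = (-1 / k) - 1
    return math.log(1 / b) + q * (math.log(b + k * (loss - THRESHOLD)) - math.log(b))

k, b = 0.436245842, 32.531632983  # values reported in the output above
loss = 200.0                      # hypothetical loss above the threshold

print("as written:", term_as_written(k, b, loss))
print("direct:    ", term_direct(k, b, loss))
print("expanded:  ", term_expanded(k, b, loss))
```

The direct and expanded forms agree to rounding error, while the as-written form gives a different number. Splitting the logarithm also avoids raising a solver expression to a variable power.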

About python - MLE application with gekko in Python: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68678285/
