python - Fitting a distribution to data: how to penalize "bad" parameter estimates?

Reposted by: 行者123. Last updated: 2023-11-28 18:45:55

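I'm using scipy's least-squares optimization to fit an exponentially-modified Gaussian distribution to a set of reaction time measurements. In general it works well, but sometimes the optimization goes off the rails and picks a crazy value for one of the parameters, and the resulting plot clearly does not fit the data well. The problem generally seems to stem from floating-point precision errors: we veer off into 0, inf, or nan-land.

I'm thinking of doing two things:

Use the parameters to fit the CDF and the PDF of the data simultaneously; I have formulas for both. (I'm using a kernel density estimate to approximate the PDF from the data.)
Somehow take into account the distance from the initial parameter estimates (generated via the method of moments on the Wikipedia page). Those estimates are far from perfect, but they are fairly good and seem to steer clear of the "float blow-up" problems.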
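
For reference, the method-of-moments starting values mentioned above can be sketched as follows. This is a minimal sketch, not the asker's code: the function name `exgauss_moment_estimates` is hypothetical, and the clipping of the sample skewness is an illustrative assumption for samples whose skewness falls outside the range the formulas allow.

```python
import numpy as np
from scipy import stats


def exgauss_moment_estimates(x):
    """Method-of-moments starting values (mu, sigma, lam) for an ex-Gaussian."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std(ddof=1)
    # The formulas need 0 < skewness < 2. The clipping range below is an
    # arbitrary illustrative choice for samples that fall outside it.
    g = np.clip(stats.skew(x), 0.01, 1.99)
    tau = s * (g / 2.0) ** (1.0 / 3.0)        # mean of the exponential part
    mu = m - tau
    sigma = s * np.sqrt(1.0 - (g / 2.0) ** (2.0 / 3.0))
    return mu, sigma, 1.0 / tau               # lam = 1 / tau


# Synthetic check: normal(500, 50) plus exponential with mean 100.
rng = np.random.default_rng(0)
sample = rng.normal(500.0, 50.0, 50_000) + rng.exponential(100.0, 50_000)
mu0, sigma0, lam0 = exgauss_moment_estimates(sample)
print(mu0, sigma0, 1.0 / lam0)   # should land near 500, 50 and 100
```

On small or nearly symmetric samples the skewness estimate is noisy, so these values are best treated as a starting point rather than a fit.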

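Combining the PDF and CDF fits sounds pretty straightforward; the errors are even on roughly the same scale. But getting the initial parameters in there: I'm not so sure it's even a good idea, but if it is:

How would I handle the difference in scale? Should I normalize the parameter "errors" to percentage errors?
Is there a reasonable way to decide the relative weighting between the data estimation errors and the parameter "errors"?

Are these even the right questions? Is there a generally accepted "right" answer, or is "try some things until you find something that seems to work" a good approach?

An example dataset

As requested, here is a dataset on which this process does not work well. I know there are only a few samples and that the data does not fit the distribution well; I'm still hoping to get a "reasonable-looking" result out of the optimization.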

array([ 450.,  560.,  692.,  730.,  758.,  723.,  486.,  596.,  716.,
        695.,  757.,  522.,  535.,  419.,  478.,  666.,  637.,  569.,
        859.,  883.,  551.,  652.,  378.,  801.,  718.,  479.,  544.])

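MLE update

I ran into a bunch of problems getting my MLE estimates to converge to a "reasonable" value, until I discovered this: if X contains at least one nan, np.sum(X) returns nan when X is a numpy array, but not when X is a pandas Series. So when the parameters started going out of bounds, the sum of the log-likelihoods was doing silly things.

Added an np.asarray() call and everything is great!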
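
The difference described above is consistent with pandas reductions skipping NaN by default (`skipna=True`), while a plain numpy sum propagates it. A small numpy-only sketch of the two behaviours follows; `np.nansum` stands in for the NaN-skipping sum, and the helper `neg_log_likelihood_sum` is a hypothetical name. Returning `+inf` for non-finite terms is one possible convention, not something the original post prescribes.

```python
import numpy as np

log_terms = np.array([-6.2, np.nan, -5.9])   # one term went out of bounds

print(np.sum(log_terms))      # nan: the bad term poisons the whole sum
print(np.nansum(log_terms))   # the NaN is skipped silently


def neg_log_likelihood_sum(log_terms):
    """Negated sum of log-likelihood terms; +inf if any term is not finite."""
    log_terms = np.asarray(log_terms, dtype=float)
    if not np.all(np.isfinite(log_terms)):
        return np.inf         # assumption: treat bad parameters as "worst possible"
    return -np.sum(log_terms)
```

With a NaN-skipping sum the optimizer is effectively rewarded for parameters that break some data points, which is why the silent behaviour is the dangerous one here.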

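Best answer

This should be a comment, but I ran out of space.

I think maximum likelihood fitting is probably the most appropriate approach here. ML fitting is already implemented in scipy.stats for many distributions. For example, you can call scipy.stats.norm.fit to find the MLE for a normal distribution, and find the MLE for an exponential distribution in a similar way (scipy.stats.expon.fit). Combining the two resulting sets of MLE parameters should give you a pretty good starting parameter vector for the Ex-Gaussian ML fit. In fact, I would imagine most of your data is quite nicely normally distributed. If that is the case, the ML parameter estimates for the normal distribution alone should give you a pretty good starting point.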
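
As a side note that is not part of the original answer: SciPy also ships the distribution itself as `scipy.stats.exponnorm`, parameterized by a shape `K = 1 / (sigma * lambda)` plus `loc = mu` and `scale = sigma`. A minimal sketch of fitting it directly, seeded with the normal MLE as suggested above, could look like this. The starting shape `K = 1.0` is an arbitrary guess.

```python
import numpy as np
from scipy import stats

xo = np.array([450., 560., 692., 730., 758., 723., 486., 596., 716.,
               695., 757., 522., 535., 419., 478., 666., 637., 569.,
               859., 883., 551., 652., 378., 801., 718., 479., 544.])

# Starting values: the normal MLE for loc/scale, and an arbitrary K = 1.
mu0, sigma0 = stats.norm.fit(xo)
K, loc, scale = stats.exponnorm.fit(xo, 1.0, loc=mu0, scale=sigma0)

# Convert SciPy's (K, loc, scale) to the (mu, sigma, lambda) used in this thread.
mu, sigma, lam = loc, scale, 1.0 / (K * scale)
print(mu, sigma, lam)
```

On a sample this small and this close to symmetric, the likelihood is nearly flat in the exponential component, so different optimizers and starting points may settle on noticeably different values of lambda.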

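Since the Ex-Gaussian has only 3 parameters, I don't think the ML fit will be hard at all. If you could provide a dataset that your current method does not handle well, it would be easier to show a real example.

OK, here you go: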

>>> import scipy.special as sse
>>> import scipy.stats as sss
>>> import scipy.optimize as so
>>> from numpy import *

>>> def eg_pdf(p, x): #defines the PDF
    m=p[0]
    s=p[1]
    l=p[2]
    return 0.5*l*exp(0.5*l*(2*m+l*s*s-2*x))*sse.erfc((m+l*s*s-x)/(sqrt(2)*s))

>>> xo=array([ 450.,  560.,  692.,  730.,  758.,  723.,  486.,  596.,  716.,
        695.,  757.,  522.,  535.,  419.,  478.,  666.,  637.,  569.,
        859.,  883.,  551.,  652.,  378.,  801.,  718.,  479.,  544.])

>>> sss.norm.fit(xo) #get the starting parameter vector from the normal MLE
(624.22222222222217, 132.23977474531389)

>>> def llh(p, f, x): #defines the negative log-likelihood function
    return -sum(log(f(p,x)))

>>> so.fmin(llh, array([624.22222222222217, 132.23977474531389, 1e-6]), (eg_pdf, xo)) #yeah, the data is not good
Warning: Maximum number of function evaluations has been exceeded.
array([  6.14003407e+02,   1.31843250e+02,   9.79425845e-02])

>>> przt=so.fmin(llh, array([624.22222222222217, 132.23977474531389, 1e-6]), (eg_pdf, xo), maxfun=1000) #so, we raise the limit on the number of function evaluations
Optimization terminated successfully.
         Current function value: 170.195924
         Iterations: 376
         Function evaluations: 681

>>> llh(array([624.22222222222217, 132.23977474531389, 1e-6]), eg_pdf, xo)
400.02921290185645
>>> llh(przt, eg_pdf, xo) #quite an improvement over the initial guess
170.19592431051217
>>> przt
array([  6.14007039e+02,   1.31844654e+02,   9.78934519e-02])

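The optimizer used here (fmin, the Nelder-Mead simplex algorithm) does not use any gradient information and is usually much slower than optimizers that do. It appears that the derivative of the negative log-likelihood function of the Ex-Gaussian can easily be written in closed form. If so, an optimizer that makes use of the gradient/derivative would be a better and more efficient choice (e.g. fmin_bfgs).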
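
A sketch of what that closed-form gradient might look like is given below. It is not from the original answer. It rewrites the density using erfc(z) = 2 * Phi(-sqrt(2) * z), which gives f(x) = lam * exp(lam*mu + 0.5*lam^2*sigma^2 - lam*x) * Phi(u) with u = (x - mu)/sigma - lam*sigma, and works in log space via `scipy.special.log_ndtr` so that extreme parameter values do not produce inf * 0. The function name `nll_and_grad` and the test point `p0` are hypothetical.

```python
import numpy as np
from scipy import optimize, special

xo = np.array([450., 560., 692., 730., 758., 723., 486., 596., 716.,
               695., 757., 522., 535., 419., 478., 666., 637., 569.,
               859., 883., 551., 652., 378., 801., 718., 479., 544.])


def nll_and_grad(p, x):
    """Ex-Gaussian negative log-likelihood and its gradient, p = (mu, sigma, lam)."""
    mu, sigma, lam = p
    u = (x - mu) / sigma - lam * sigma
    log_cdf = special.log_ndtr(u)             # log Phi(u), stable in the tails
    logf = np.log(lam) + lam * mu + 0.5 * lam**2 * sigma**2 - lam * x + log_cdf
    # h = phi(u) / Phi(u), evaluated in log space
    h = np.exp(-0.5 * u**2 - 0.5 * np.log(2.0 * np.pi) - log_cdf)
    d_mu = lam - h / sigma
    d_sigma = lam**2 * sigma - h * ((x - mu) / sigma**2 + lam)
    d_lam = 1.0 / lam + mu + lam * sigma**2 - x - h * sigma
    grad = -np.array([d_mu.sum(), d_sigma.sum(), d_lam.sum()])
    return -logf.sum(), grad


# Compare the analytic gradient with finite differences at an arbitrary point.
p0 = np.array([600.0, 100.0, 0.01])
err = optimize.check_grad(lambda p: nll_and_grad(p, xo)[0],
                          lambda p: nll_and_grad(p, xo)[1], p0)
print(err)   # norm of the difference between the two gradients
```

A function of this shape can be handed to `scipy.optimize.minimize` with `jac=True`, which tells the optimizer that the objective returns the value and the gradient together.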

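Another thing to consider is parameter constraints. By definition, sigma and lambda must be positive for the Ex-Gaussian distribution. You can use a constrained optimizer (such as fmin_l_bfgs_b). Alternatively, you can optimize: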

>>> def eg_pdf2(p, x): #defines the PDF, with log(sigma) and log(lambda) as parameters
    m=p[0]
    s=exp(p[1])
    l=exp(p[2])
    return 0.5*l*exp(0.5*l*(2*m+l*s*s-2*x))*sse.erfc((m+l*s*s-x)/(sqrt(2)*s))

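Because of the functional invariance property of the MLE, the MLE from this function should be the same as from the original eg_pdf, once p[1] and p[2] are mapped back through exp(). There are other transformations besides exp() that you can use to map (-inf, +inf) onto (0, +inf).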
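
The two ways of keeping sigma and lambda positive can be sketched side by side. This is an illustration under assumptions rather than the answerer's code: it uses `scipy.optimize.minimize` (the newer interface to the same L-BFGS-B and BFGS algorithms), a log-space form of the likelihood, hypothetical helper names `nll` and `nll_log`, an arbitrary starting guess for lambda, and arbitrary small lower bounds.

```python
import numpy as np
from scipy import optimize, special, stats

xo = np.array([450., 560., 692., 730., 758., 723., 486., 596., 716.,
               695., 757., 522., 535., 419., 478., 666., 637., 569.,
               859., 883., 551., 652., 378., 801., 718., 479., 544.])


def nll(p, x):
    """Ex-Gaussian negative log-likelihood in log space, p = (mu, sigma, lam)."""
    mu, sigma, lam = p
    u = (x - mu) / sigma - lam * sigma
    logf = (np.log(lam) + lam * mu + 0.5 * lam**2 * sigma**2 - lam * x
            + special.log_ndtr(u))
    return -logf.sum()


def nll_log(q, x):
    """Same objective with q = (mu, log(sigma), log(lam)), as in eg_pdf2."""
    return nll((q[0], np.exp(q[1]), np.exp(q[2])), x)


mu0, sigma0 = stats.norm.fit(xo)
p0 = np.array([mu0, sigma0, 1.0 / sigma0])   # the lam guess is an arbitrary choice

# Option 1: box constraints keep sigma and lam strictly positive.
bounded = optimize.minimize(nll, p0, args=(xo,), method="L-BFGS-B",
                            bounds=[(None, None), (1e-3, None), (1e-6, None)])

# Option 2: unconstrained search over the log-transformed parameters.
q0 = np.array([p0[0], np.log(p0[1]), np.log(p0[2])])
free = optimize.minimize(nll_log, q0, args=(xo,), method="BFGS")
p_free = np.array([free.x[0], np.exp(free.x[1]), np.exp(free.x[2])])

print(bounded.x, bounded.fun)
print(p_free, free.fun)
```

The log transform has the side benefit of putting mu, log(sigma) and log(lambda) on more comparable scales, which can help optimizers that rely on finite-difference gradients.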

You could also consider http://en.wikipedia.org/wiki/Lagrange_multiplier .

Regarding python - Fitting a distribution to data: how to penalize "bad" parameter estimates?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20249115/
