python - 二维数组的最小二乘回归-6ren

python - 二维数组的最小二乘回归

转载作者：太空宇宙更新时间：2023-11-04 01:04:36

numpy.linalg.lstsq(a,b) 函数接受大小为 nx2 的数组 a和一个作为因变量的一维数组 b。

如果数据点显示为从图像文件生成的二维数组，我将如何进行最小二乘回归？该数组看起来像这样:

[[0, 0, 0, 0, e]
 [0, 0, c, d, 0]
 [b, a, f, 0, 0]]

其中 a、b、c、d、e、f 是正整数值。

我想在这些点上画一条线。我可以使用 np.linalg.lstsq(如果可以，如何使用)或者是否有更有意义的东西(如果可以，如何使用)？

非常感谢。

最佳答案

有一次我看到一个类似的python程序来自

# Prac 2 for Monte Carlo methods in a nutshell
# Richard Chopping, ANU RSES and Geoscience Australia, October 2012
# Useage
# python prac_q2.py [number of bootstrap runs]
# e.g. python prac_q2.py 10000
# would execute this and perform 10 000 bootstrap runs.
# Default is 100 runs.

# sys cause I need to access the arguments the script was called with
import sys
# math cause it's handy for scalar maths
import math
# time cause I want to benchmark how long things take
import time
# numpy cause it gives us awesome array / matrix manipulation stuff
import numpy
# scipy just in case
import scipy
# scipy.stats to make life simpler statistcally speaking
import scipy.stats as stats

def main():
    print "Prac 2 solution: no graphs"
    true_model = numpy.array([17.0, 10.0, 1.96])

    # Here's a nifty way to write out numpy arrays.
    # Unlike the data table in the prac handouts, I've got time first
    # and height second.
    # You can mix up the order but you need to change a lot of calculations
    # to deal with this change.
    data = numpy.array([[1.0, 26.94],
                        [2.0, 33.45],
                        [3.0, 40.72],
                        [4.0, 42.32],
                        [5.0, 44.30],
                        [6.0, 47.19],
                        [7.0, 43.33],
                        [8.0, 40.13]])
    # Perform the least squares regression to find the best fit solution
    best_fit = regression(data)
    # Nifty way to get out elements from an array
    m1,m2,m3 = best_fit
    print "Best fit solution:"
    print "m1 is", m1, "and m2 is", m2, "and m3 is", m3

    # Calculate residuals from the best fit solution
    best_fit_resid = residuals(data, best_fit)

    print "The residuals from the best fit solution are:"
    print best_fit_resid
    print ""

    # Bootstrap part
    # --------------
    # Number of bootstraps to run. 100 is a minimum and our default number.
    num_booties = 100
    # If we have an argument to the python script, use this as the
    # number of bootstrap runs
    if len(sys.argv) > 1:
        num_booties = int(sys.argv[1])

    # preallocate an array to store the results.
    ensemble = numpy.zeros((num_booties, 3))

    print "Starting up the bootstrap routine"

    # How to do timing within a Python script - here I start a stopwatch running
    start_time = time.clock()
    for index in range(num_booties):
        # Print every 10 % so we know where we're up to in long runs
        if print_progress(index, num_booties):
            percent = (float(index) / float(num_booties)) * 100.0
            print "Have completed", percent, "percent"

        # For each iteration of the bootstrap algorithm,
        # first calculate mixed up residuals...
        resamp_resid = resamp_with_replace(best_fit_resid)
        # ... then generate new data...
        new_data = calc_new_data(data, best_fit, resamp_resid)
        # ... then perform another regression to generate a new set of m1, m2, m3 
        bootstrap_model = regression(new_data)
        ensemble[index] = (bootstrap_model[0], bootstrap_model[1], bootstrap_model[2])
        # Done with the loop
    # Calculate the time the run took - what's the current time, minus when we started.
    loop_time = time.clock() - start_time

    print ""

    print "Ensemble calculated based on", num_booties, "bootstrap runs."
    print "Bootstrap runs took", loop_time, "seconds."
    print ""

    # Stats on the ensemble time
    # --------------------------
    B = num_booties

    # Mean is pretty simple, 1.0/B to force it to use floating points
    # This gives us an array of the means of the 3 model parameters
    mean = 1.0/B * numpy.sum(ensemble, axis=0)
    print "Mean is ([m1 m2 m3]):", mean

    # Variance
    var2 = 1.0/B * numpy.sum(((ensemble - mean)**2), axis=0)
    print "Variance squared is ([m1 m2 m3]):", var2
    # Bias
    bias = mean - best_fit
    print "Bias is ([m1 m2 m3]):", bias
    bias_corr = best_fit - bias
    print "Bias corrected solution is ([m1 m2 m3]):", bias_corr
    print "The original solution was ([m1 m2 m3]):", best_fit
    print "And the true solution is ([m1 m2 m3]):", true_model

    print ""

    # Confidence intervals
    # ---------------------
    # Sort column 1 to calculate confidence intervals
    # Sorting in numpy sucks.
    # Need to declare what the fields are (so it knows how to sort it)
    #   f8 => numpy's floating point number
    # Then need to delcare what we sort it on
    # Here we sort on the first column, then the second, then the third.
    #   f0,f1,f2 field 0, then field 1, then field 2.
    # Then we make sure we sort it by column (axis = 0)
    # Then we take a view of that data as a float64 so it works properly
    sorted_m1 = numpy.sort(ensemble.view('f8,f8,f8'), order=['f0','f1','f2'], axis=0).view(numpy.float64)

    # stats is my name for scipy.stats
    # This has a wonderful function that calculates percentiles, including performing interpolation
    # (important for low numbers of bootstrap runs)
    m1_perc0p5 = stats.scoreatpercentile(sorted_m1,0.5)[0]
    m1_perc2p5 = stats.scoreatpercentile(sorted_m1,2.5)[0]
    m1_perc16 = stats.scoreatpercentile(sorted_m1,16)[0]
    m1_perc84 = stats.scoreatpercentile(sorted_m1,84)[0]
    m1_perc97p5 = stats.scoreatpercentile(sorted_m1,97.5)[0]
    m1_perc99p5 = stats.scoreatpercentile(sorted_m1,99.5)[0]
    print "m1 68% confidence interval is from", m1_perc16, "to", m1_perc84
    print "m1 95% confidence interval is from", m1_perc2p5, "to", m1_perc97p5
    print "m1 99% confidence interval is from", m1_perc0p5, "to", m1_perc99p5
    print ""

    # Now column 2, sort it...
    sorted_m2 = numpy.sort(ensemble.view('f8,f8,f8'), order=['f1','f0','f2'], axis=0).view(numpy.float64)
    # ... and do stats.
    m2_perc0p5 = stats.scoreatpercentile(sorted_m2,0.5)[1]
    m2_perc2p5 = stats.scoreatpercentile(sorted_m2,2.5)[1]
    m2_perc16 = stats.scoreatpercentile(sorted_m2,16)[1]
    m2_perc84 = stats.scoreatpercentile(sorted_m2,84)[1]
    m2_perc97p5 = stats.scoreatpercentile(sorted_m2,97.5)[1]
    m2_perc99p5 = stats.scoreatpercentile(sorted_m2,99.5)[1]
    print "m2 68% confidence interval is from", m2_perc16, "to", m2_perc84
    print "m2 95% confidence interval is from", m2_perc2p5, "to", m2_perc97p5
    print "m2 99% confidence interval is from", m2_perc0p5, "to", m2_perc99p5
    print ""

    # and finally column 3, again, sort it..
    sorted_m3 = numpy.sort(ensemble.view('f8,f8,f8'), order=['f2','f1','f0'], axis=0).view(numpy.float64)
    # ... and do stats.
    m3_perc0p5 = stats.scoreatpercentile(sorted_m3,0.5)[1]
    m3_perc2p5 = stats.scoreatpercentile(sorted_m3,2.5)[1]
    m3_perc16 = stats.scoreatpercentile(sorted_m3,16)[1]
    m3_perc84 = stats.scoreatpercentile(sorted_m3,84)[1]
    m3_perc97p5 = stats.scoreatpercentile(sorted_m3,97.5)[1]
    m3_perc99p5 = stats.scoreatpercentile(sorted_m3,99.5)[1]
    print "m3 68% confidence interval is from", m3_perc16, "to", m3_perc84
    print "m3 95% confidence interval is from", m3_perc2p5, "to", m3_perc97p5
    print "m3 99% confidence interval is from", m3_perc0p5, "to", m3_perc99p5
    print ""
    # End of the main function


#
#   
# Helper functions go down here
#   
#   


# regression
# This takes a 2D numpy array and performs a least-squares regression
# using the formula on the practical sheet, page 3
# Stored in the top are the real values
# Returns an array of m1, m2 and m3.
def regression(data):
    # While testing, just return the real values
    # real_values = numpy.array([17.0, 10.0, 1.96])

    # Creating the G matrix
    # ---------------------
    # Because I'm using numpy arrays here, we need
    # to learn some notation.
    # data[:,0] is the FIRST column
    # Length of this = number of time samples in data
    N = len(data[:,0])

    # numpy.sum adds up all data in a row or column.
    # Axis = 0 implies add up each column. [0] at end
    # returns the sum of the first column
    # This is the sum of Ti for i = 1..N
    sum_Ti = numpy.sum(data, axis=0)[0]

    # numpy.power takes each element of an array and raises them to a given power
    # In this one call we also take the sum of the columns (as above) after they have
    # been squared, and then just take the t column
    sum_Ti2 = numpy.sum(numpy.power(data, 2), axis=0)[0]

    # Now we need to get the cube of Ti, then sum that result
    sum_Ti3 = numpy.sum(numpy.power(data, 3), axis=0)[0]

    # Finally we need the quartic of Ti, then sum that result
    sum_Ti4 = numpy.sum(numpy.power(data, 4), axis=0)[0]

    # Now we can construct the G matrix
    G = numpy.array([[N, sum_Ti, -0.5 * sum_Ti2],
                        [sum_Ti, sum_Ti2, -0.5 * sum_Ti3],
                        [-0.5 * sum_Ti2, -0.5 * sum_Ti3, 0.25 * sum_Ti4]])
    # We also need to take the inverse of the G matrix
    G_inv = numpy.linalg.inv(G)


    # Creating the d matrix
    # ---------------------
    # Hello numpy.sum, my old friend...
    sum_Yi = numpy.sum(data, axis=0)[1]

    # numpy.prod multiplies the values in an array.
    # We need to do the products along axis 1 (i.e. row by row)
    # Then sum all the elements
    sum_TiYi = numpy.sum(numpy.prod(data, axis=1))

    # The final element we need is a bit tricky.
    # We need the product as above
    TiYi = numpy.prod(data, axis=1)
    # Then we get tricky. * works how we need it here,
    # remember that the Ti column is referenced by data[:,0] as above
    Ti2Yi = TiYi * data[:,0]
    # Then we sum
    sum_Ti2Yi = numpy.sum(Ti2Yi)

    #With all the elements, we make the d matrix
    d = numpy.array([sum_Yi,
                    sum_TiYi,
                    -0.5 * sum_Ti2Yi])

    # Do the linear algebra stuff
    # To multiple numpy arrays in a matrix style,
    # we need to use numpy.dot()
    # Not the most useful notation, but there you go.
    # To help out the Matlab users: http://www.scipy.org/NumPy_for_Matlab_Users
    result = G_inv.dot(d)

    #Return this result
    return result

# residuals:
# Takes in a data array, and an array of best fit paramers
# calculates the difference between the observed and predicted data
# and returns an array
def residuals(data, best_fit):
    # Extract ti from the data array
    ti = data[:,0]
    # We also need an array of the square of ti
    ti2 = numpy.power(ti, 2)

    # Extract yi
    yi = data[:,1]

    # Calculate residual (data minus predicted)
    result = yi - best_fit[0] - (best_fit[1] * ti) + (0.5 * best_fit[2] * ti2)

    return result

# resamp_with_replace:
# Perform a dataset resampling with replacement on parameter set.
# Uses numpy.random to generate the random numbers to pick the indices to look up.
# So for item 0, ... N, we look up a random index from the set and put that in
# our resampled data.
def resamp_with_replace(set):
    # How many things do we need to do this for?
    N = len(set)

    # Preallocate our result array
    result = numpy.zeros(N)

    # Generate N random integers between 0 and N-1
    indices = numpy.random.randint(0, N - 1, N)

    # For i from the set 0...N-1 (that's what the range() command gives us),
    # our result for that i is given by the index we randomly generated above
    for i in range(N):
        result[i] = set[indices[i]]

    return result

# calc_new_data:
# Given a set of resampled residuals, use the model parameters to derive
# new data. This is used for bootstrapping the residuals.
# true_data is a numpy array of rows of ti, yi. We only need the ti column though.
# model is an array of three parameters, corresponding to m1, m2, m3.
# residuals are an array of our resudials
def calc_new_data(true_data, model, residuals):
    # Extract the time information from the new data array
    ti = true_data[:,0]

    # Calculate new data using array maths
    # This goes through and does the sums etc for each element of the array
    # Nice and compact way to represent it.
    y_new = residuals + model[0] + (model[1] * ti) - (0.5 * model[2] * ti**2)

    # Our result needs to be an array of ti, y_new, so we need to combine them using
    # the numpy.column_stack routine
    result = numpy.column_stack((ti, y_new))

    # Return this combined array
    return result

# print_progress:
# Just a quick thing that returns true if we want to print for this index
# and false otherwise
def print_progress(index, total):
    index = float(index)
    total = float(total)

    result = False

    # Floating point maths is irritating
    # We want to print at the start, every 10%, and at the end.
    # This works up to index = 100,000
    # Would also be lovely if Python had a switch statement
    if (((index / total) * 100) <= 0.00001):
        result = True
    elif (((index / total) * 100) >= 9.99999) and (((index / total) * 100) <= 10.00001):
        result = True
    elif (((index / total) * 100) >= 19.99999) and (((index / total) * 100) <= 20.00001):
        result = True
    elif (((index / total) * 100) >= 29.99999) and (((index / total) * 100) <= 30.00001):
        result = True
    elif (((index / total) * 100) >= 39.99999) and (((index / total) * 100) <= 40.00001):
        result = True
    elif (((index / total) * 100) >= 49.99999) and (((index / total) * 100) <= 50.00001):
        result = True
    elif (((index / total) * 100) >= 59.99999) and (((index / total) * 100) <= 60.00001):
        result = True
    elif (((index / total) * 100) >= 69.99999) and (((index / total) * 100) <= 70.00001):
        result = True
    elif (((index / total) * 100) >= 79.99999) and (((index / total) * 100) <= 80.00001):
        result = True
    elif (((index / total) * 100) >= 89.99999) and (((index / total) * 100) <= 90.00001):
        result = True
    elif ((((index+1) / total) * 100) > 99.99999):
        result = True
    else:
        result = False

    return result

#
#   
# End of helper functions
#
#

# So we can easily execute our script
if __name__ == "__main__":
    main()

我想你可以看看，这里是link完成信息

关于python - 二维数组的最小二乘回归，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/31123308/

文章推荐： python - 自定义排序 Python 列表总是在开头的某些字符串？

文章推荐： c - 为什么在 C 中使用函数声明？

文章推荐： c - 结构数组初始化

文章推荐： Python 3 如何将以字节表示的大数转换为整数？

c++ int 数组，值为 2 维 int 数组(3d 数组)
我正在尝试创建一个包含 int[][] 项的数组即 int version0Indexes[][4] = { {1,2,3,4}, {5,6,7,8} }; int version1Indexes[
Java 数组[i]++ 与++数组[i]
我有一个整数数组: private int array[]; 如果我还有一个名为 add 的方法，那么以下有什么区别: public void add(int value) { array[va
JavaScript 数组 + 数组 = 字符串？
当您尝试在 JavaScript 中将一个数组添加到另一个数组时，它会将其转换为一个字符串。通常，当以另一种语言执行此操作时，列表会合并。 JavaScript [1, 2] + [3, 4] = "
数组
根据我正在阅读的教程，如果您想创建一个包含 5 列和 3 行的表格来表示这样的数据... 45 4 34 99 56 3 23 99 43 2 1 1 0 43 67 ...它说你可以使用下
数组
我通常使用 python 编写脚本/程序，但最近开始使用 JavaScript 进行编程，并且在使用数组时遇到了一些问题。在 python 中，当我创建一个数组并使用 for x in y 时，我得
数组 toString() 中的 javascript 数组
我有一个这样的数组: temp = [ 'data1', ['data1_a','data1_b'], ['data2_a','data2_b','data2_c'] ]; // 我想使用 toStr
php - 如何将秒表结果(数组)推送到第一个表结果(数组)
rent_property (table name) id fullName propertyName 1 A House Name1 2 B
C++ 数组 [索引] 与索引 [数组]
这个问题在这里已经有了答案: 关闭13年前。 Possible Duplicate: In C arrays why is this true? a[5] == 5[a] array[index] 和
excel - 将用户名(数组)与电子邮件(数组)匹配
使用 Excel 2013。经过多年的寻找和适应，我的第一篇文章。我正在尝试将当前 App 用户(即“John Smith”)与他的电子邮件地址“jsmith@work.com”进行匹配。使用两个
r - 3D 数组 -> 应用 -> 3D 数组
当仅在一个边距上操作时，apply 似乎不会重新组装 3D 数组。考虑: arr 1)，但对我来说仍然很奇怪，如果一个函数返回一个具有尺寸的对象，那么它们基本上会被忽略。最佳答案这是一个不太理
javascript - php 数组(数组)到 javascript
我有一个包含 GPS 坐标的 MySQL 数据库。这是我检索坐标的部分 PHP 代码； $sql = "SELECT lat, lon FROM gps_data"; $stmt=$db->query
python - 查找最后一个非零元素 3D 数组 - numpy 数组
我需要找到一种方法来执行这个操作，我有一个形状数组 [批量大小, 150, 1] 代表 batch_size 整数序列，每个序列有 150 个元素长，但在每个序列中都有很多添加的零，以使所有序列具有相
android - 如何在json中访问对象>数组>对象>数组>对象？
我必须通过 url 中的 json 获取文本。层次结构如下: 对象>数组>对象>数组>对象。我想用这段代码获取文本。但是我收到错误 :org.json.JSONException: No valu
cocoa - NSMutable NSArray 数组 - 如何避免所有这些行并使用维度或 3D 数组？
enter code here- (void)viewDidLoad { NSMutableArray *imageViewArray= [[NSMutableArray alloc] init];
java - 流式传输 2d 数组、修剪值并收集回 2d 数组
知道如何对二维字符串数组执行修剪操作，例如使用 Java 流 API 进行 3x3 并将其收集回相同维度的 3x3 数组？重点是避免使用显式的 for 循环。当前的解决方案只是简单地执行一个 fo
使用嵌套循环的 Java Union 数组 2 int 数组
已关闭。此问题需要 debugging details 。目前不接受答案。编辑问题以包含 desired behavior, a specific problem or error, and the
Jquery 与 JSON 数组 - 转换为 Javascript 数组
我有来自 ASP.NET Web 服务的以下 XML 输出: 1710 1711 1712 1713
javascript - 更新嵌套数组和对象中的对象。对象-->数组-->对象-->数组--> "object"
如果我有一个对象todo作为您状态的一部分，并且该对象包含数组列表，则列表内部有对象，在这些对象内部还有另一个数组listItems。如何更新数组 listItems 中 id 为“poi098”的对
c# - 如何在一个字节中转换 bool 数组，然后再转换回 bool 数组
我想将最大长度为 8 的 bool 数组打包成一个字节，通过网络发送它，然后将其解压回 bool 数组。已经在这里尝试了一些解决方案，但没有用。我正在使用单声道。我制作了 BitArray，然后尝试
c# - 将 char 数组/字符串转换为 bool 数组
我们的数据库中有这个字段指示一周中的每一天的真/假标志，如下所示:'1111110' 我需要将此值转换为 boolean 数组。为此，我编写了以下代码: char[] freqs = weekday

太空宇宙

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

python - 二维数组的最小二乘回归