
r - Fast group-by simple linear regression


This Q & A comes from How to make group_by and lm fast?, where the OP tried to run a simple linear regression per group of a large data frame.

In theory, a series of group-by regressions y ~ x | g is equivalent to a single pooled regression y ~ x * g. The latter is very appealing, since statistical tests between different groups are straightforward. But in practice, fitting this larger regression is not computationally easy. My answer to the linked Q & A reviews the packages speedlm and glm4, but points out that they do not solve this problem well.

Big regression problems are hard, especially when factor variables are present. This perhaps explains why many people abandon the idea and prefer to split the data by group and fit a model per group. There is no need for me to enumerate methods for group-by regression (see Linear Regression and group by in R); what I care about is speed.

For a simple linear regression y ~ x | g, splitting the data by group and then relying on a standard model-fitting routine like lm is a performance killer. First, subsetting a big data frame is inefficient. Second, a standard model-fitting routine goes through the following steps, which are pure overhead for the useful regression computation (a small sketch after the list illustrates this overhead):

  • parse the model formula into a "terms" object (using terms.formula);
  • build a model frame (using model.frame.default);
  • build a model matrix (using model.matrix.default).
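A minimal sketch (my own illustration, not from the linked Q & A) of how large this overhead can be: compare the full lm pipeline with its numerical core lm.fit on a pre-built design matrix.

    set.seed(1)
    d <- data.frame(y = rnorm(1e5), x = rnorm(1e5))
    X <- cbind(1, d$x)                                 ## design matrix built once, by hand
    system.time(for (i in 1:20) lm(y ~ x, data = d))   ## formula + model frame + model matrix + fit
    system.time(for (i in 1:20) lm.fit(X, d$y))        ## the least-squares fit alone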

Simple linear regression has clever computational tricks. As I demonstrated in Fast pairwise simple linear regression between variables in a data frame, the covariance method is very fast. Can we adapt it to group-by simple linear regression, via a group_by_simpleLM function?
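For reference, the covariance method for a single simple linear regression boils down to two lines. A minimal sketch (the function name simpleLM_cov is mine, not part of the linked answer):

    simpleLM_cov <- function (x, y) {
      beta  <- cov(x, y) / var(x)          ## slope
      alpha <- mean(y) - beta * mean(x)    ## intercept
      c(alpha = alpha, beta = beta)
    }
    ## e.g. simpleLM_cov(mtcars$wt, mtcars$mpg) agrees with coef(lm(mpg ~ wt, data = mtcars))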

Best Answer

We have to do this by writing compiled code. I will present it with Rcpp. Note that I am a C programmer and have been using R's conventional C interface; Rcpp is used here only to ease the handling of lists, strings and attributes, and to facilitate immediate testing in R. The code is largely written in C style, and macros from R's conventional C interface, like REAL and INTEGER, are still used. See the bottom of this answer for "group_by_simpleLM.cpp".

The R wrapper function group_by_simpleLM has four arguments:

    group_by_simpleLM <- function (dat, LHS, RHS, group = NULL) {
    ##.... [TRUNCATED: see the appendix at the bottom for the full definition]

  • dat is a data frame. If you feed it a matrix or a list, it stops with an error message.
  • LHS is a character vector giving the names of the variables on the left-hand side of ~. Multiple LHS variables are supported.
  • RHS is a character vector giving the names of the variables on the right-hand side of ~. Only a single, non-factor RHS variable is allowed in simple linear regression. You can pass a vector of variables to RHS, but the function keeps only the first element (with a warning). If the variable cannot be found in dat (perhaps because you typed the name wrongly) or it is not numeric, an informative error message is given.
  • group is a character vector giving the name of the grouping variable. It is best to be a factor in dat; otherwise the function makes a fast coercion using match(group, unique(group)) and issues a warning. A factor with unused levels does no harm: group_by_simpleLM_cpp detects this and returns all-NaN for such levels. group can be NULL, in which case a single regression is done on all the data.

  • The workhorse function group_by_simpleLM_cpp returns a named list of matrices holding the regression results for each response. Each matrix is "wide", with nlevels(group) columns and 5 rows:
  • "alpha" (for the intercept);
  • "beta" (for the slope);
  • "beta.se" (for the standard error of the slope);
  • "r2" (for R-squared);
  • "df.resid" (for the residual degrees of freedom).

  • For a simple linear regression, these five statistics are sufficient to obtain other statistics (a small sketch follows).
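For example, the t statistic and the two-sided p-value of the slope follow from "beta", "beta.se" and "df.resid" alone. A hedged sketch, using the toy data from the Examples section below (m is just a placeholder name):

    m <- group_by_simpleLM(dat, "y1", "x", "f")$y1    ## one matrix of the returned list
    tval <- m["beta", ] / m["beta.se", ]              ## t statistic of the slope, per group
    pval <- 2 * pt(-abs(tval), df = m["df.resid", ])  ## two-sided p-value, as in summary.lm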

The function takes care of the rank-deficient case where a group has only one datum: the slope cannot be estimated and NaN is returned. Another special case is when a group has only two data points: the fit is then perfect and the standard error of the slope is 0.

The function is a fast method for nlme::lmList(LHS ~ RHS | group, dat, pool = FALSE) when group != NULL, and a fast method for lm(LHS ~ RHS, dat) when group = NULL (probably even faster than general_paired_simpleLM, as it is written in C).

    Notes:
  • Weighted regression is not handled, because the covariance method is invalid in that case.
  • No check for NA / NaN / Inf / -Inf in dat is made; the function breaks in their presence. A pre-cleaning sketch follows these notes.
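One possible pre-cleaning step (my own sketch, not part of the function; it assumes the toy dat from the Examples below, so adjust the column names to your data):

    num_used <- c("y1", "y2", "x")                        ## numeric variables passed to the fit
    keep <- Reduce(`&`, lapply(dat[num_used], is.finite)) & !is.na(dat[["f"]])
    dat_clean <- dat[keep, , drop = FALSE]                ## drops rows with NA / NaN / Inf / -Inf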


    Examples
    library(Rcpp)
    sourceCpp("group_by_simpleLM.cpp")

    ## a toy dataset
    set.seed(0)
    dat <- data.frame(y1 = rnorm(10), y2 = rnorm(10), x = 1:5,
                      f = gl(2, 5, labels = letters[1:2]),
                      g = sample(gl(2, 5, labels = LETTERS[1:2])))

    Group-by regression: a fast method for nlme::lmList
    group_by_simpleLM(dat, c("y1", "y2"), "x", "f")
    #$y1
    # a b
    #alpha 0.820107094 -2.7164723
    #beta -0.009796302 0.8812007
    #beta.se 0.266690568 0.2090644
    #r2 0.000449565 0.8555330
    #df.resid 3.000000000 3.0000000
    #
    #$y2
    # a b
    #alpha 0.1304709 0.06996587
    #beta -0.1616069 -0.14685953
    #beta.se 0.2465047 0.24815024
    #r2 0.1253142 0.10454374
    #df.resid 3.0000000 3.00000000
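If a long/tidy layout is preferred, each returned matrix can simply be transposed. A small sketch (res is a placeholder name for the result above):

    res <- group_by_simpleLM(dat, c("y1", "y2"), "x", "f")
    as.data.frame(t(res$y1))    ## one row per group, one column per statistic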

    fit <- nlme::lmList(cbind(y1, y2) ~ x | f, data = dat, pool = FALSE)

    ## results for level "a"; use `fit[[2]]` to see results for level "b"
    lapply(summary(fit[[1]]), "[", c("coefficients", "r.squared"))
    #$`Response y1`
    #$`Response y1`$coefficients
    # Estimate Std. Error t value Pr(>|t|)
    #(Intercept) 0.820107094 0.8845125 0.92718537 0.4222195
    #x -0.009796302 0.2666906 -0.03673284 0.9730056
    #
    #$`Response y1`$r.squared
    #[1] 0.000449565
    #
    #
    #$`Response y2`
    #$`Response y2`$coefficients
    # Estimate Std. Error t value Pr(>|t|)
    #(Intercept) 0.1304709 0.8175638 0.1595850 0.8833471
    #x -0.1616069 0.2465047 -0.6555936 0.5588755
    #
    #$`Response y2`$r.squared
    #[1] 0.1253142

    Handling rank deficiency without failure
    ## with unused level "b"
    group_by_simpleLM(dat[1:5, ], "y1", "x", "f")
    #$y1
    # a b
    #alpha 0.820107094 NaN
    #beta -0.009796302 NaN
    #beta.se 0.266690568 NaN
    #r2 0.000449565 NaN
    #df.resid 3.000000000 NaN

    ## rank-deficient case for level "b"
    group_by_simpleLM(dat[1:6, ], "y1", "x", "f")
    #$y1
    # a b
    #alpha 0.820107094 -1.53995
    #beta -0.009796302 NaN
    #beta.se 0.266690568 NaN
    #r2 0.000449565 NaN
    #df.resid 3.000000000 0.00000

    Multiple grouping variables

    group_by_simpleLM cannot handle multiple grouping variables directly, but you can use interaction to create a single-factor variable first.
    dat$fg <- with(dat, interaction(f, g, drop = TRUE, sep = ":"))
    group_by_simpleLM(dat, c("y1", "y2"), "x", "fg")
    #$y1
    # a:A b:A a:B b:B
    #alpha 1.4750325 -2.7684583 -1.6393289 -1.8513669
    #beta -0.2120782 0.9861509 0.7993313 0.4613999
    #beta.se 0.0000000 0.2098876 0.4946167 0.0000000
    #r2 1.0000000 0.9566642 0.7231188 1.0000000
    #df.resid 0.0000000 1.0000000 1.0000000 0.0000000
    #
    #$y2
    # a:A b:A a:B b:B
    #alpha 1.0292956 -0.22746944 -1.5096975 0.06876360
    #beta -0.2657021 -0.20650690 0.2547738 0.09172993
    #beta.se 0.0000000 0.01945569 0.3483856 0.00000000
    #r2 1.0000000 0.99120195 0.3484482 1.00000000
    #df.resid 0.0000000 1.00000000 1.0000000 0.00000000

    fit <- nlme::lmList(cbind(y1, y2) ~ x | fg, data = dat, pool = FALSE)

    ## note that the first group a:A only has two values, so df.resid = 0
    ## my method returns 0 standard error for the slope
    ## but lm or lmList would return NaN
    lapply(summary(fit[[1]]), "[", c("coefficients", "r.squared"))
    #$`Response y1`
    #$`Response y1`$coefficients
    # Estimate Std. Error t value Pr(>|t|)
    #(Intercept) 1.4750325 NaN NaN NaN
    #x -0.2120782 NaN NaN NaN
    #
    #$`Response y1`$r.squared
    #[1] 1
    #
    #
    #$`Response y2`
    #$`Response y2`$coefficients
    # Estimate Std. Error t value Pr(>|t|)
    #(Intercept) 1.0292956 NaN NaN NaN
    #x -0.2657021 NaN NaN NaN
    #
    #$`Response y2`$r.squared
    #[1] 1

    No grouping: a fast method for lm
    group_by_simpleLM(dat, c("y1", "y2"), "x", NULL)
    #$y1
    # ALL
    #alpha -0.9481826
    #beta 0.4357022
    #beta.se 0.2408162
    #r2 0.2903691
    #df.resid 8.0000000
    #
    #$y2
    # ALL
    #alpha 0.1002184
    #beta -0.1542332
    #beta.se 0.1514935
    #r2 0.1147012
    #df.resid 8.0000000
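A quick cross-check sketch (mine, not part of the original answer): the "Estimate" and "Std. Error" columns reported by summary.lm should reproduce the alpha, beta and beta.se values above.

    summary(lm(y1 ~ x, data = dat))$coefficients
    summary(lm(y2 ~ x, data = dat))$coefficients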

    Fast large simple linear regression
    set.seed(0L)
    nSubj <- 200e3
    nr <- 1e6
    DF <- data.frame(subject = gl(nSubj, 5),
                     day = 3:7,
                     y1 = runif(nr),
                     y2 = rpois(nr, 3),
                     y3 = rnorm(nr),
                     y4 = rnorm(nr, 1, 5))

    system.time(group_by_simpleLM(DF, paste0("y", 1:4), "day", "subject"))
    # user system elapsed
    # 0.200 0.016 0.219

    library(MatrixModels)
    system.time(glm4(y1 ~ 0 + subject + day:subject, data = DF, sparse = TRUE))
    # user system elapsed
    # 9.012 0.172 9.266
    group_by_simpleLM finishes all 4 responses almost instantly, while glm4 takes 9 seconds for a single response!

    Note that glm4 can break down in rank-deficient cases, while group_by_simpleLM does not.
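For completeness, one could also time the split-and-fit route on the same data. A sketch only; with 200,000 groups, expect it to be far slower than either method above, so run it with care:

    system.time(nlme::lmList(y1 ~ day | subject, data = DF, pool = FALSE))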

    Appendix: "group_by_simpleLM.cpp"
    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List group_by_simpleLM_cpp (List Y, NumericVector x, IntegerVector group, CharacterVector group_levels, bool group_unsorted) {

    /* number of data and number of responses */
    int n = x.size(), k = Y.size(), n_groups = group_levels.size();

    /* set up result list */
    List result(k);
    List dimnames = List::create(CharacterVector::create("alpha", "beta", "beta.se", "r2", "df.resid"), group_levels);
    int j; for (j = 0; j < k; j++) {
    NumericMatrix mat(5, n_groups);
    mat.attr("dimnames") = dimnames;
    result[j] = mat;
    }
    result.attr("names") = Y.attr("names");

    /* set up a vector to hold sample size for each group */
    size_t *group_offset = (size_t *)calloc(n_groups + 1, sizeof(size_t));

    /*
    compute group offset: cumsum(group_offset)
    The offset is used in a different way when group is sorted or unsorted
    In the former case, it is the offset to real x, y values;
    In the latter case, it is the offset to ordering index indx
    */
    int *u = INTEGER(group), *u_end = u + n, i;
    if (n_groups > 1) {
    while (u < u_end) group_offset[*u++]++;
    for (i = 0; i < n_groups; i++) group_offset[i + 1] += group_offset[i];
    } else {
    group_offset[1] = n;
    group_unsorted = 0;
    }

    /* local variables & pointers */
    double *xi, *xi_end; /* pointer to the 1st and the last x value */
    double *yi; /* pointer to the first y value */
    int gi; double inv_gi; /* sample size of the i-th group */
    double xi_mean, xi_var; /* mean & variance of x values in the i-th group */
    double yi_mean, yi_var; /* mean & variance of y values in the i-th group */
    double xiyi_cov; /* covariance between x and y values in the i-th group */
    double beta, r2; int df_resi;
    double *matij;

    /* additional storage and variables when group is unsorted */
    int *indx; double *xb, *xbi, dtmp;
    if (group_unsorted) {
    indx = (int *)malloc(n * sizeof(int));
    xb = (double *)malloc(n * sizeof(double)); // buffer x for caching
    R_orderVector1(indx, n, group, TRUE, FALSE); // Er, how is TRUE & FALSE recognized as Rboolean?
    }

    /* loop through groups */
    for (i = 0; i < n_groups; i++) {
    /* set group size gi */
    gi = group_offset[i + 1] - group_offset[i];
    /* special case for a factor level with no data */
    if (gi == 0) {
    for (j = 0; j < k; j++) {
    /* matrix column for write-back */
    matij = REAL(result[j]) + i * 5;
    matij[0] = R_NaN; matij[1] = R_NaN; matij[2] = R_NaN;
    matij[3] = R_NaN; matij[4] = R_NaN;
    }
    continue;
    }
    /* rank-deficient case */
    if (gi == 1) {
    gi = group_offset[i];
    if (group_unsorted) gi = indx[gi];
    for (j = 0; j < k; j++) {
    /* matrix column for write-back */
    matij = REAL(result[j]) + i * 5;
    matij[0] = REAL(Y[j])[gi];
    matij[1] = R_NaN; matij[2] = R_NaN;
    matij[3] = R_NaN; matij[4] = 0.0;
    }
    continue;
    }
    /* general case where a regression line can be estimated */
    inv_gi = 1 / (double)gi;
    /* compute mean & variance of x values in this group */
    xi_mean = 0.0; xi_var = 0.0;
    if (group_unsorted) {
    /* use u, u_end and xbi */
    xi = REAL(x);
    u = indx + group_offset[i]; /* offset acts on index */
    u_end = u + gi;
    xbi = xb + group_offset[i];
    for (; u < u_end; xbi++, u++) {
    dtmp = xi[*u];
    xi_mean += dtmp;
    xi_var += dtmp * dtmp;
    *xbi = dtmp;
    }
    } else {
    /* use xi and xi_end */
    xi = REAL(x) + group_offset[i]; /* offset acts on values */
    xi_end = xi + gi;
    for (; xi < xi_end; xi++) {
    xi_mean += *xi;
    xi_var += (*xi) * (*xi);
    }
    }
    xi_mean = xi_mean * inv_gi;
    xi_var = xi_var * inv_gi - xi_mean * xi_mean;
    /* loop through responses doing simple linear regression */
    for (j = 0; j < k; j++) {
    /* compute mean & variance of y values, as well its covariance with x values */
    yi_mean = 0.0; yi_var = 0.0; xiyi_cov = 0.0;
    if (group_unsorted) {
    xbi = xb + group_offset[i]; /* use buffered x values */
    yi = REAL(Y[j]);
    u = indx + group_offset[i]; /* offset acts on index */
    for (; u < u_end; u++, xbi++) {
    dtmp = yi[*u];
    yi_mean += dtmp;
    yi_var += dtmp * dtmp;
    xiyi_cov += dtmp * (*xbi);
    }
    } else {
    /* set xi and yi */
    xi = REAL(x) + group_offset[i]; /* offset acts on values */
    yi = REAL(Y[j]) + group_offset[i]; /* offset acts on values */
    for (; xi < xi_end; xi++, yi++) {
    yi_mean += *yi;
    yi_var += (*yi) * (*yi);
    xiyi_cov += (*yi) * (*xi);
    }
    }
    yi_mean = yi_mean * inv_gi;
    yi_var = yi_var * inv_gi - yi_mean * yi_mean;
    xiyi_cov = xiyi_cov * inv_gi - xi_mean * yi_mean;
    /* locate the right place to write back regression result */
    matij = REAL(result[j]) + i * 5 + 4;
    /* residual degree of freedom */
    df_resi = gi - 2; *matij-- = (double)df_resi;
    /* R-squared = squared correlation */
    r2 = (xiyi_cov * xiyi_cov) / (xi_var * yi_var); *matij-- = r2;
    /* standard error of regression slope */
    if (df_resi == 0) *matij-- = 0.0;
    else *matij-- = sqrt((1 - r2) * yi_var / (df_resi * xi_var));
    /* regression slope */
    beta = xiyi_cov / xi_var; *matij-- = beta;
    /* regression intercept */
    *matij = yi_mean - beta * xi_mean;
    }
    }

    if (group_unsorted) {
    free(indx);
    free(xb);
    }
    free(group_offset);
    return result;
    }

    /*** R
    group_by_simpleLM <- function (dat, LHS, RHS, group = NULL) {

    ## basic input validation
    if (missing(dat)) stop("no data provided to 'dat'!")
    if (!is.data.frame(dat)) stop("'dat' must be a data frame!")

    if (missing(LHS)) stop("no 'LHS' provided!")
    if (!is.character(LHS)) stop("'LHS' must be provided as a character vector of variable names!")

    if (missing(RHS)) stop("no 'RHS' provided!")
    if (!is.character(RHS)) stop("'RHS' must be provided as a character vector of variable names!")

    if (!is.null(group)) {

    ## grouping variable provided: a fast method of `nlme::lmList`

    if (!is.character(group)) stop("'group' must be provided as a character vector of variable names!")

    ## ensure that group has length 1, is available in the data frame and is a factor
    if (length(group) > 1L) {
    warning("only one grouping variable allowed for group-by simple linear regression; ignoring all but the 1st variable provided!")
    group <- group[1L]
    }
    grp <- dat[[group]]
    if (is.null(grp)) stop(sprintf("grouping variable '%s' not found in 'dat'!", group))

    if (is.factor(grp)) {
    grp_levels <- levels(grp)
    } else {
    warning("grouping variable is not provided as a factor; fast coercion is made!")
    grp_levels <- unique(grp)
    grp <- match(grp, grp_levels)
    grp_levels <- as.character(grp_levels)
    }

    grp_unsorted <- .Internal(is.unsorted(grp, FALSE))

    } else {

    ## no grouping; a fast method of `lm`
    grp <- 1L; grp_levels <- "ALL"; grp_unsorted <- FALSE

    }

    ## the RHS must have length 1, be available in the data frame and be numeric
    if (length(RHS) > 1L) {
    warning("only one RHS variable allowed for simple linear regression; ignoring all but the 1st variable provided!")
    RHS <- RHS[1L]
    }
    x <- dat[[RHS]]
    if (is.null(x)) stop(sprintf("RHS variable '%s' not found in 'dat'!", RHS))
    if (!is.numeric(x) || is.factor(x)) {
    stop("RHS variable must be 'numeric' for simple linear regression!")
    }
    x <- as.numeric(x) ## just in case `x` is an integer

    ## check LHS variables
    nested <- match(RHS, LHS, nomatch = 0L)
    if (nested > 0L) {
    warning(sprintf("RHS variable '%s' found in LHS variables; removing it from LHS", RHS))
    LHS <- LHS[-nested]
    }
    if (length(LHS) == 0L) stop("no usable LHS variables found!")
    missed <- !(LHS %in% names(dat))
    if (any(missed)) {
    warning(sprintf("LHS variables '%s' not found in 'dat'; removing them from LHS", toString(LHS[missed])))
    LHS <- LHS[!missed]
    }
    if (length(LHS) == 0L) stop("no usable LHS variables found!")
    Y <- dat[LHS]
    invalid_LHS <- vapply(Y, is.factor, FALSE) | (!vapply(Y, is.numeric, FALSE))
    if (any(invalid_LHS)) {
    warning(sprintf("LHS variables '%s' are non-numeric or factors; removing them from LHS", toString(LHS[invalid_LHS])))
    Y <- Y[!invalid_LHS]
    }
    if (length(Y) == 0L) stop("no usable LHS variables found!")
    Y <- lapply(Y, as.numeric) ## just in case that we have integer variables in Y

    ## check for existence of NA, NaN, Inf and -Inf and drop them?

    ## use Rcpp
    group_by_simpleLM_cpp(Y, x, grp, grp_levels, grp_unsorted)
    }
    */

The original question, "r - Fast group-by simple linear regression", is on Stack Overflow: https://stackoverflow.com/questions/51996021/
