python - Is there a way to combine multiple logistic regression equations into one?

Reposted by: 行者123 Updated: 2023-12-01 01:16:06

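I am working on a binary classification problem in which the response rate (bad) is below 1%. The predictors are a mix of nominal categorical variables and continuous variables.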

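Initially I tried an oversampling technique (SMOTE) to balance the two classes. Logistic regression on the oversampled dataset gave good overall accuracy, but the false positive rate was very high.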

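I now plan to undersample instead and run multiple logistic regression models. The basic Python code I am working with is below. I need guidance on combining the results of these multiple logistic regression models into one.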

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Set the range to the number of models required
for i in range(10):
    # Create a sample of goods; good is a pandas DataFrame containing the goods
    sample_good = good.sample(n=300, replace=True)

    # Create a sample of bads; bad is a pandas DataFrame containing the bads.
    # There are only 100 bads in the dataset
    sample_bad = bad.sample(n=100, replace=True)

    # Combine the good and bad samples (DataFrame.append was removed in pandas 2.0)
    sample = pd.concat([sample_good, sample_bad])

    X = sample.drop(columns='y')
    y = sample['y']  # 1-D Series, which is what fit() expects

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    y_pred = logreg.predict(X_test)
    print('Accuracy of logistic regression classifier on test set: {:.2f}'
          .format(logreg.score(X_test, y_test)))

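The for loop above runs 10 times and builds 10 different models. I need guidance on integrating these 10 models into a single model. I have read about available techniques such as bagging. In this case, because the response rate is so low, every sample I create needs to contain all of the bads.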

Best answer

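I think you should use scikit-learn's BaggingClassifier. In short, it fits several classifiers on random subsamples of the data and then has them vote to perform the classification. This meta-estimator spares you from writing the for loop yourself. As for the sampling (which I believe was your original motivation for writing the loop), you can adjust the sample weights passed to the model.fit() method.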

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

breast_cancer = datasets.load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
X_train, X_test, y_train, y_test = train_test_split(X,y)

As you can see, the dataset is imbalanced (it is medical data, after all):

len(y_train[y_train == 0]),len(y_train[y_train == 1]) # 163, 263

So, let's add sample weights:

y0 = len(y_train[y_train == 0])
y1 = len(y_train[y_train == 1])

w0 = y1/y0
w1 = 1

sample_weights = np.zeros(len(y_train))
sample_weights[y_train == 0] = w0
sample_weights[y_train == 1] = w1
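The weighting above gives the two classes equal total weight. As a minimal sketch, the same effect can be obtained with scikit-learn's `compute_sample_weight` helper; the toy label array `y_toy` is a hypothetical stand-in, not data from the answer. The helper's "balanced" weights differ from the manual ones only by a constant factor.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Toy labels (hypothetical): 3 samples of class 0, 6 samples of class 1
y_toy = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

# Manual weights as in the answer: class 0 gets n1/n0, class 1 gets 1
n0 = np.sum(y_toy == 0)
n1 = np.sum(y_toy == 1)
manual = np.where(y_toy == 0, n1 / n0, 1.0)

# scikit-learn helper: weight = n_samples / (n_classes * class_count)
balanced = compute_sample_weight(class_weight="balanced", y=y_toy)

# The two schemes agree up to a constant scale factor
ratio = balanced / manual
print(np.allclose(ratio, ratio[0]))

# Under either scheme the two classes carry the same total weight
print(manual[y_toy == 0].sum(), manual[y_toy == 1].sum())
```

Whether a constant rescaling of the weights matters depends on the estimator; for a regularized model such as LogisticRegression it changes the balance between the data term and the penalty, so the two schemes are not guaranteed to produce identical fits.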

Now the BaggingClassifier:

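model = BaggingClassifier(LogisticRegression(solver='liblinear'),
                          n_estimators=10,
                          bootstrap=True, random_state=2019)
# Fit on the training split only; sample_weights was built from y_train
model.fit(X_train, y_train, sample_weight=sample_weights)
balanced_accuracy_score(y_test, model.predict(X_test))  # 94.2% reported in the original answer; varies with the split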

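Note that if I fit without the sample weights, I only get a balanced accuracy of 92.1% (balanced accuracy = average recall, which is very handy for imbalanced problems).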
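The question also required every subsample to contain all of the bads, which a bootstrap does not guarantee. One possible alternative, sketched here under stated assumptions, is to keep the loop but store each fitted model and average the predicted probabilities. The synthetic data from `make_classification`, the 3:1 good-to-bad ratio, and the 0.5 threshold are all illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (hypothetical stand-in for the asker's data);
# class 1 plays the role of the rare "bad" class
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.97, 0.03], random_state=0)

# Hold out the test set once, before any undersampling
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rng = np.random.default_rng(0)
bad_idx = np.flatnonzero(y_train == 1)
good_idx = np.flatnonzero(y_train == 0)

models = []
for _ in range(10):
    # Keep every bad; draw 3 goods per bad without replacement
    sampled_good = rng.choice(good_idx, size=3 * len(bad_idx), replace=False)
    idx = np.concatenate([bad_idx, sampled_good])
    models.append(LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx]))

# Combine the models by averaging the predicted probability of "bad"
proba_bad = np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)
y_pred = (proba_bad >= 0.5).astype(int)
```

Because each model is trained on data in which bads are far more common than in the population, the averaged probabilities overstate the true bad rate. The threshold is therefore best tuned on held-out data rather than left at 0.5.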

Regarding "python - Is there a way to combine multiple logistic regression equations into one?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54335684/
