
python - Scikit-learn logistic regression performs worse than a self-written logistic regression in Python


I have written logistic regression code in Python and compared its results with Scikit-learn's logistic regression. The latter performs worse on the simple one-dimensional sample data shown below:

My logistic regression

import pandas as pd
import numpy as np

def findProb(xBias, beta):
    # Predicted probability sigmoid(beta0 + beta1*x) for every row
    z = []
    for i in range(len(xBias)):
        z.append(xBias.iloc[i, 0]*beta[0] + xBias.iloc[i, 1]*beta[1])
    prob = [(1/(1 + np.exp(-i))) for i in z]
    return prob

def calDerv(xBias, y, beta, prob):
    # Gradient of the negative log-likelihood, averaged over the sample
    derv = []
    for i in range(len(beta)):
        helpVar1 = 0
        for j in range(len(xBias)):
            helpVar2 = prob[j]*xBias.iloc[j, i] - y[j]*xBias.iloc[j, i]
            helpVar1 = helpVar1 + helpVar2
        derv.append(helpVar1/len(xBias))
    return derv

def updateBeta(beta, alpha, derv):
    # One gradient-descent step with learning rate alpha
    for i in range(len(beta)):
        beta[i] = beta[i] - derv[i]*alpha
    return beta

def calCost(y, prob):
    # Cross-entropy loss; the two branches cover y == 1 and y == 0
    cost = 0
    for i in range(len(y)):
        if y[i] == 1: eachCost = -y[i]*np.log(prob[i])
        else: eachCost = -(1 - y[i])*np.log(1 - prob[i])
        cost = cost + eachCost
    return cost

def myLogistic(x, y, alpha, iters):
    # Batch gradient descent on the design matrix [bias, x], beta initialized to zero
    beta = [0 for i in range(2)]
    bias = [1 for i in range(len(x))]
    xBias = pd.DataFrame({'bias': bias, 'x': x})
    for i in range(iters):
        prob = findProb(xBias, beta)
        derv = calDerv(xBias, y, beta, prob)
        beta = updateBeta(beta, alpha, derv)
    return beta
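
For reference, the same batch gradient descent can be written with vectorized NumPy operations instead of per-element iloc loops. This is a minimal sketch, not part of the original question; the function name my_logistic_vectorized is illustrative, and it assumes the same setup as above (a bias column prepended to x, beta initialized to zero):

import numpy as np

def my_logistic_vectorized(x, y, alpha, iters):
    # Design matrix with a bias column: shape (n, 2)
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0/(1.0 + np.exp(-X @ beta))   # sigmoid of X @ beta
        grad = X.T @ (prob - y) / len(y)       # gradient averaged over the sample
        beta -= alpha * grad
    return beta

The update rule is identical to the loop version, so both should converge to the same coefficients; the vectorized form is just much faster on larger samples.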

Comparing the results on small sample data

input = list(range(1, 11))
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print("\nmy logistic")
learningRate = 0.01
iterations = 10000
beta = myLogistic(input, labels, learningRate, iterations)
print("coefficients: ", beta)
print("decision boundary is at x = ", -beta[0]/beta[1])
# sigmoid(beta0 + beta1*x) = 0.5 exactly where beta0 + beta1*x = 0, i.e. x = -beta0/beta1
decision = -beta[0]/beta[1]
predicted = [0 if i < decision else 1 for i in input]
print("predicted values: ", predicted)

Output: 0, 0, 0, 0, 0, 1, 1, 1, 1, 1

print("\npython logistic")

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

input = np.reshape(input, (-1,1))

lr.fit(input, labels)

print("coefficient = ", lr.coef_)

print("intercept = ", lr.intercept_)

print("decision = ", -lr.intercept_/lr.coef_)

predicted = lr.predict(input)

print(predicted)

Output: 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
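
To see the shifted boundary directly, one can inspect the fitted model's class probabilities near the boundary. This check is added here for illustration and is not part of the original post; lr is the default-C model fitted above, and predict_proba returns one column per class:

# Probability of class 1 at x = 3, 4, 5
probs = lr.predict_proba(np.array([[3], [4], [5]]))[:, 1]
print(probs)  # crosses 0.5 between x = 3 and x = 4, consistent with the ~3.69 boundary reported in the answer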

Best Answer

Your implementation has no regularization term. The LogisticRegression estimator includes L2 regularization by default, with inverse strength C = 1.0. As you set C to higher values, i.e. weaken the regularization, the decision boundary moves closer to 5.5:

for C in [1.0, 1000.0, 1e+8]:
    lr = LogisticRegression(C=C)
    lr.fit(input, labels)
    print(f'C = {C}, decision boundary @ {(-lr.intercept_/lr.coef_[0])[0]}')

Output:

C = 1.0, decision boundary @ 3.6888430562595116
C = 1000.0, decision boundary @ 5.474229032805065
C = 100000000.0, decision boundary @ 5.499634348989383
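
Taking this to its limit, regularization can also be switched off entirely. A minimal sketch, assuming scikit-learn >= 1.2 where penalty=None is accepted (older versions spell it penalty='none'); note that this data is perfectly separable, so the unpenalized coefficients grow until the solver hits its iteration limit, but the boundary stays near 5.5:

# Unregularized fit: the boundary should land at ~5.5, matching the hand-written version
lr = LogisticRegression(penalty=None)
lr.fit(input, labels)
print("decision boundary @", (-lr.intercept_/lr.coef_[0])[0])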

Regarding python - Scikit-learn logistic regression performing worse than a self-written logistic regression in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50753400/
