
python - Logistic regression from scratch: error keeps increasing


I implemented logistic regression from scratch, but when I run the script the algorithm always predicts the wrong label. I tried flipping the training outputs and test_output (changing every 1 to 0 and vice versa), but it still predicts the wrong label.
I also noticed that changing the "-" sign to "+" in the weight and bias updates makes the script predict the labels correctly.
What am I doing wrong?
Here is the code I wrote:

# IMPORTS
import numpy as np

# HYPERPARAMETERS
EPOCHS = 1000
LEARNING_RATE = 0.1

# FUNCTIONS
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost(y_pred, training_outputs, m):
    j = - np.sum(training_outputs * np.log(y_pred) + (1 - training_outputs) * np.log(1 - y_pred)) / m
    return j


# ENTRY
if __name__ == "__main__":

    # Training input and output
    x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
    training_outputs = np.array([1, 0, 1])

    # Test input and output
    test_input = np.array([[0, 1, 1]])
    test_output = np.array([0])

    # Weights
    w = np.array([0.3, 0.3, 0.3])

    # Bias
    b = 0

    m = 3

    # Training
    for iteration in range(EPOCHS):
        print("Iteration n.", iteration, end="\r")

        # Compute log odds
        z = np.dot(x, w) + b

        # Compute predicted probability
        y_pred = sigmoid(z)

        # Back propagation
        dz = y_pred - training_outputs
        dw = np.dot(x, dz) / m
        db = np.sum(dz) / m

        # Update weights and bias according to the gradient descent algorithm
        w = w - LEARNING_RATE * dw
        b = b - LEARNING_RATE * db

    print("Model trained. Proceeding with model evaluation...")

    # Test
    # Compute log odds
    z = np.dot(test_input, w) + b

    # Compute predicted probability
    y_pred = sigmoid(z)
    print(y_pred)

    # Compute cost
    cost = cost(y_pred, test_output, m)

    print(cost)

Best answer

@J_H pointed out an incorrect assumption:

>>> from sklearn.linear_model import LogisticRegression
>>> import numpy as np
>>> x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
>>> y = np.array([1, 0, 1])
>>> clf = LogisticRegression().fit(x, y)
>>> clf.predict([[0, 1, 1]])
array([1])

scikit-learn seems to think that test_output should be 1 rather than 0.
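To make the disagreement easier to inspect, the fitted probability can be printed instead of the hard label (a small addition, not in the original answer; predict_proba is the standard scikit-learn call):

# Continuing from the session above: inspect class probabilities
# for the test point rather than the hard 0/1 prediction.
proba = clf.predict_proba([[0, 1, 1]])
print(proba)  # second column is P(y=1); since predict() returned 1, it is above 0.5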

A few more suggestions:

  • m can be removed (it is a constant, so it can be folded into LEARNING_RATE)
  • w should be initialized to match the number of columns of x (i.e. x.shape[1])
  • dw = np.dot(x, dz) should be np.dot(dz, x) (see the sketch after this list)
  • prediction in logistic regression depends on a threshold, commonly 0.5
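To see why the operand order in dw matters: the gradient of the cross-entropy cost with respect to w is xᵀ·dz, i.e. a sum over samples for each feature, whereas np.dot(x, dz) sums over features for each sample. A minimal check, reusing the x from the question with an arbitrary, made-up dz:

import numpy as np

x = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 1]])
dz = np.array([0.2, -0.1, 0.3])  # hypothetical y_pred - training_outputs

print(np.dot(x, dz))    # [0.4 0.  0.5] -- one value per sample: not a usable gradient
print(np.dot(dz, x))    # [0.5 0.2 0.5] -- one value per weight: equals x.T @ dz
print(np.dot(x.T, dz))  # [0.5 0.2 0.5] -- same as the previous line

Because this toy x happens to be square, both products run without error; with n_samples ≠ n_features the wrong order would raise a shape mismatch, which is why the bug can go unnoticed here.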

Taking those points into account, the training and test code would look as follows:

# Initialize weights and bias
w, b = np.zeros(x.shape[1]), 0

for _ in range(EPOCHS):
    # Compute log odds
    z = np.dot(x, w) + b

    # Compute predicted probability
    y_pred = sigmoid(z)

    # Back propagation
    dz = y_pred - training_outputs
    dw = np.dot(dz, x)
    db = np.sum(dz)

    # Update
    w = w - LEARNING_RATE * dw
    b = b - LEARNING_RATE * db

# Test
z = np.dot(test_input, w) + b
test_pred = sigmoid(z) >= 0.5
print(test_pred)
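With dw computed as np.dot(dz, x), the test prediction on [0, 1, 1] should now come out as [ True], i.e. class 1, in line with the scikit-learn result above.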

A complete example, using a random train/test split created with sklearn.datasets.make_classification (as is customary when checking against the scikit-learn implementation), might look like this:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

EPOCHS = 100
LEARNING_RATE = 0.01

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

if __name__ == "__main__":

    X, y = make_classification(n_samples=1000, n_features=5)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Initialize `w` and `b`
    w, b = np.zeros(X.shape[1]), 0

    for _ in range(EPOCHS):
        z = np.dot(X_train, w) + b
        y_pred = sigmoid(z)
        dz = y_pred - y_train
        dw = np.dot(dz, X_train)
        db = np.sum(dz)
        w = w - LEARNING_RATE * dw
        b = b - LEARNING_RATE * db

    # Test
    z = np.dot(X_test, w) + b
    test_pred = sigmoid(z) >= 0.5
    print(accuracy_score(y_test, test_pred))
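As a sanity check (not part of the original answer), the same split can also be fed to scikit-learn's LogisticRegression and the two accuracies compared; they should land in the same ballpark:

from sklearn.linear_model import LogisticRegression

# Appended at the end of the __main__ block above, reusing its split.
clf = LogisticRegression().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))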

Regarding "python - Logistic regression from scratch: error keeps increasing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74804108/
