
python - Why is this linear classifier algorithm wrong?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 04:29:27

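I specify 'n' points and label each of them +1 or -1. I store all of this in a dictionary that looks like: {'point1' : [(0.565,-0.676), +1], ... }. I am trying to find a line that separates them, i.e. points labelled +1 lie above the line and points labelled -1 lie below it. Can anyone help?

I am trying to apply w = w + y(r) as the "learning algorithm", where w is the weight vector, y is +1 or -1, and r is the point.

The code runs, but the separating line is not accurate: it does not separate the points correctly. Also, as I increase the number of points to separate, the line gets worse.

If you run the code, the green line is supposed to be the separating line. The closer its slope is to that of the blue line (the perfect line by definition), the better.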

from matplotlib import pyplot as plt
import numpy as np
import random

n = 4
x_values = [round(random.uniform(-1,1),3) for _ in range(n)]
y_values = [round(random.uniform(-1,1),3) for _ in range(n)]
# A list (not a bare zip iterator), because the points are iterated several times.
pts10 = list(zip(x_values, y_values))
label_dict = {}

# Two random points define the blue "perfect" line.
x1, y1, x2, y2 = (round(random.uniform(-1,1),3) for _ in range(4))
b = [x1, y1]
d = [x2, y2]
slope, intercept = np.polyfit(b, d, 1)

fig, ax = plt.subplots(figsize=(8,8))
ax.scatter(*zip(*pts10), color = 'black')
ax.plot(b,d,'b-')

# Label each point by its side of the blue line.
label_plus = '+'
label_minus = '--'
i = 1
for x,y in pts10:
    if(y > (slope*x + intercept)):
        ax.annotate(label_plus, xy=(x,y), xytext=(0, -10), textcoords='offset points', color = 'blue', ha='center', va='center')
        label_dict['point{}'.format(i)] = [(x,y), "+1"]
    else:
        ax.annotate(label_minus, xy=(x,y), xytext=(0, -10), textcoords='offset points', color = 'red', ha='center', va='center')
        label_dict['point{}'.format(i)] = [(x,y), "-1"]
    i += 1

# this is the algorithm
def check(ww,rr):
    while(np.dot(ww,rr) >= 0):
        print("being refined 1")
        ww = np.subtract(ww,rr)
    return ww

def check_two(ww,rr):
    while(np.dot(ww,rr) < 0):
        print("being refined 2")
        ww = np.add(ww,rr)
    return ww

w = np.array([0,0])
ii = 1
for x,y in pts10:
    r = np.array([x,y])
    print(w)
    if (np.dot(w,r) >= 0) != int(label_dict['point{}'.format(ii)][1]) < 0:
        print("Point " + str(ii) + " should have been below the line")
        w = np.subtract(w,r)
        w = check(w,r)
    elif (np.dot(w,r) < 0) != int(label_dict['point{}'.format(ii)][1]) >= 0:
        print("Point " + str(ii) + " should have been above the line")
        w = np.add(w,r)
        w = check_two(w,r)
    else:
        print("Point " + str(ii) + " is in the correct position")
    ii += 1

ax.plot(w,'g--')

ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Labelling 10 points')
ax.set_xticks(np.arange(-1, 1.1, 0.2))
ax.set_yticks(np.arange(-1, 1.1, 0.2))
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
ax.legend()

Best answer

You can, for example, use the SGDClassifier from scikit-learn (sklearn). The linear classifier computes its predictions as follows (see the source code):

def predict(self, X):
    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]

where decision_function is given by:

def decision_function(self, X):
    [...]

    scores = safe_sparse_dot(X, self.coef_.T,
                             dense_output=True) + self.intercept_
    return scores.ravel() if scores.shape[1] == 1 else scores

So for the two-dimensional case of your example, this means that a data point is classified as +1 if

x*w1 + y*w2 + i > 0

where

x, y = X
w1, w2 = self.coef_
i = self.intercept_

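and as -1 otherwise. The decision therefore depends on whether x*w1 + y*w2 + i is greater than or less than (or equal to) zero, so the "border" is found by setting x*w1 + y*w2 + i == 0. We are free to choose one of the two components; the other is then determined by this equation.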
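To make that last step concrete, here is a minimal sketch that solves the border equation for y. The helper name boundary_y and the coefficient values are made up for illustration and are not a fitted result; the sketch also assumes w2 is non-zero.

```python
import numpy as np

def boundary_y(x, w1, w2, i):
    """Solve x*w1 + y*w2 + i == 0 for y (assumes w2 != 0)."""
    return -(w1 * np.asarray(x, dtype=float) + i) / w2

# Hypothetical coefficients, chosen only for illustration.
w1, w2, i = 1.0, 2.0, -0.5

xs = np.array([-1.0, 0.0, 1.0])   # freely chosen x components
ys = boundary_y(xs, w1, w2, i)    # y components determined by the equation

# Every (x, y) pair on the border has a decision value of zero.
scores = xs * w1 + ys * w2 + i
```

Plotting xs against ys then draws the border itself, whatever the value of the intercept.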

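The following snippet fits an SGDClassifier and plots the resulting "border". It assumes that the data points are scattered around the origin (x, y = 0, 0), i.e. that their mean is (approximately) zero. In practice, to obtain good results, you should first subtract the mean from the data points, then perform the fit, and then add the mean back to the result. The following snippet simply scatters the points around the origin.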

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import SGDClassifier

n = 100
x = np.random.uniform(-1, 1, size=(n, 2))

# We assume the points are scattered around zero.
b = np.zeros(2)
d = np.random.uniform(-1, 1, size=2)
slope, intercept = (d[1] / d[0]), 0.

fig, ax = plt.subplots(figsize=(8,8))
ax.scatter(x[:, 0], x[:, 1], color = 'black')
ax.plot([b[0], d[0]], [b[1], d[1]], 'b-', label='Ideal')

labels = []
for point in x:
    if(point[1] > (slope * point[0] + intercept)):
        ax.annotate('+', xy=point, xytext=(0, -10), textcoords='offset points', color = 'blue', ha='center', va='center')
        labels.append(1)
    else:
        ax.annotate('--', xy=point, xytext=(0, -10), textcoords='offset points', color = 'red', ha='center', va='center')
        labels.append(-1)

labels = np.array(labels)
classifier = SGDClassifier()
classifier.fit(x, labels)

x1 = np.random.uniform(-1, 1)
x2 = (-classifier.intercept_ - x1 * classifier.coef_[0, 0]) / classifier.coef_[0, 1]

ax.plot([0, x1], [0, x2], 'g--', label='Fit')

plt.legend()
plt.show()
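
The snippet above skips the mean subtraction because its points are already centred. For data that is not centred, the "subtract the mean, fit, add the mean back" step could look roughly like the sketch below. The helper name uncenter_intercept is hypothetical, and coef_c and intercept_c are placeholder values standing in for what a fit on the centred data would return, not real fit results.

```python
import numpy as np

def uncenter_intercept(coef, intercept, mean):
    """Intercept in original coordinates for a border fitted on (X - mean).

    coef . (p - mean) + intercept == 0 describes the same line as
    coef . p + (intercept - coef . mean) == 0.
    """
    return intercept - np.dot(coef, mean)

rng = np.random.default_rng(0)
# Points scattered around (3, -2) instead of the origin.
X = rng.uniform(-1, 1, size=(100, 2)) + np.array([3.0, -2.0])

mean = X.mean(axis=0)
X_centered = X - mean  # this is what would be passed to the classifier's fit

# Placeholder values standing in for a fit on X_centered (assumption).
coef_c = np.array([1.0, -1.0])
intercept_c = 0.1

# "Adding the mean back" only changes the intercept.
intercept_orig = uncenter_intercept(coef_c, intercept_c, mean)

# The decision values agree in both coordinate systems.
scores_centered = X_centered @ coef_c + intercept_c
scores_original = X @ coef_c + intercept_orig
```

The coefficients are unchanged by the shift; only the intercept has to be corrected before drawing the border over the original points.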

This figure shows the result of the SGDClassifier fit for n = 100 data points:

Result for n=100

The following figure shows the results for different n, where the points were randomly selected from a pool of 1000 data points:

Results for different n
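
For comparison with the update rule w = w + y(r) from the question, here is a minimal sketch of the classic perceptron, which is one standard way to apply that rule. It is not the code from the question. It assumes the points are linearly separable, and the names perceptron and max_epochs as well as the "ideal" line y = 0.5*x + 0.1 are made up for illustration. Two details matter: the rule is applied only to points that are currently misclassified, and the passes over the data are repeated until a full pass produces no mistakes.

```python
import numpy as np

def perceptron(points, labels, max_epochs=10_000):
    """Repeat w = w + y*r for misclassified points until none remain.

    A constant 1 is appended to each point so that the learned line
    does not have to pass through the origin.
    """
    r_aug = np.hstack([points, np.ones((len(points), 1))])
    w = np.zeros(r_aug.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for r, y in zip(r_aug, labels):
            if y * np.dot(w, r) <= 0:  # wrong side of the line, or on it
                w = w + y * r
                mistakes += 1
        if mistakes == 0:              # a clean pass: every point is separated
            break
    return w

rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(50, 2))

# Label the points by their side of a hypothetical "ideal" line.
margin = points[:, 1] - (0.5 * points[:, 0] + 0.1)
keep = np.abs(margin) > 0.05           # drop points that almost touch the line
points = points[keep]
labels = np.where(margin[keep] > 0, 1, -1)

w = perceptron(points, labels)
predictions = np.where(points @ w[:2] + w[2] > 0, 1, -1)
```

To draw the learned line, solve w[0]*x + w[1]*y + w[2] == 0 for y. Passing w directly to ax.plot, as the question does, plots the components of w against their indices rather than the separating line.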

Regarding "python - Why is this linear classifier algorithm wrong?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45256237/
