python - Sklearn NN 回归出勤预测-6ren

python - Sklearn NN 回归出勤预测

转载作者：行者123 更新时间：2023-12-01 09:20:14

我之前就同一个问题提出了一个问题，但由于我的方法发生了变化，我现在有不同的问题。

我当前的代码:

from sklearn import preprocessing
from openpyxl import load_workbook
import numpy as np
from numpy import exp, array, random, dot
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
#Set sizes
rowSize = 200
numColumns = 4

# read  from excel file
wb = load_workbook('python_excel_read.xlsx')
sheet_1 = wb["Sheet1"]

date = np.zeros(rowSize)
day = np.zeros(rowSize)
rain = np.zeros(rowSize)
temp = np.zeros(rowSize)
out = np.zeros(rowSize)

for i in range(0, rowSize):
    date[i] = sheet_1.cell(row=i + 1, column=1).value
    day[i] = sheet_1.cell(row=i + 1, column=2).value
    rain[i] = sheet_1.cell(row=i + 1, column=3).value
    temp[i] = sheet_1.cell(row=i + 1, column=4).value
    out[i] = sheet_1.cell(row=i + 1, column=5).value

train = np.zeros(shape=(rowSize,numColumns))
t_o = np.zeros(shape=(rowSize,1))

for i in range(0, rowSize):
    train[i] = [date[i], day[i], rain[i], temp[i]]
    t_o[i] = [out[i]]


X = train
# Output
y = t_o

X_train, X_test, y_train, y_test = train_test_split(X, y)

####Neural Net
nn = MLPRegressor(
    hidden_layer_sizes=(3,),  activation='relu', solver='adam', alpha=0.001, batch_size='auto',
    learning_rate='constant', learning_rate_init=0.01, power_t=0.5, max_iter=10000, shuffle=True,
    random_state=9, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True,
    early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
nn.fit(X_train, y_train.ravel())


y_pred = nn.predict(X_test)

###Linear Regression
# lm = LinearRegression()
# lm.fit(X_train,y_train)
# y_pred = lm.predict(X_test)

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(X_test[:,0], y_pred, s=1, c='b', marker="s", label='real')
ax1.scatter(X_test[:,0], y_test, s=10, c='r', marker="o", label='NN Prediction')
plt.show()

#Calc MSE
mse = np.square(y_test-y_pred).mean()

print(mse)

结果显示测试数据的预测非常糟糕。因为我对此很陌生，所以我不确定这是我的数据、模型还是我的编码。根据该图，我认为该模型对于数据来说是错误的(该模型似乎预测了接近线性或平方的数据，而实际数据似乎更加分散)

以下是一些数据点:格式为一年中的某一天(2 是 1 月 2 日)、工作日(1)/周末(0)、下雨(1)/无雨(0)、F 温度、出勤(这是输出)

2   0   0   51  1366
4   0   0   62  538
5   1   0   71  317
6   1   0   76  174
7   1   0   78  176
8   1   0   68  220
12  1   1   64  256
13  1   1   60  379
14  1   0   64  316
18  0   0   72  758
19  1   0   72  1038
20  1   0   72  405
21  1   0   71  326
24  0   0   74  867
26  1   1   68  521
27  1   0   71  381
28  1   0   72  343
29  1   1   68  266
30  0   1   57  479
31  0   1   57  717
33  1   0   70  542
34  1   0   73  220
35  1   0   74  360
36  1   0   79  444
42  1   0   78  534
45  0   0   80  1572
52  0   0   76  1236
55  1   1   64  689
56  1   0   69  726
59  0   0   67  1188
60  0   0   74  1140
61  1   1   63  979
62  1   1   62  657
63  1   0   67  687
64  1   0   72  615
67  0   0   80  1074
68  1   0   81  1261
71  1   0   83  1332
73  0   0   85  1259
74  0   0   86  1142
76  1   0   88  1207
77  1   1   78  1438
82  1   0   85  1251
83  1   0   83  1019
85  1   0   86  1178
86  0   0   92  1306
87  0   0   92  1273
89  1   0   93  1101
90  1   0   92  1274
93  0   0   83  1548
94  0   0   86  1318
96  1   0   83  1395
97  1   0   81  1338
98  1   0   75  1240
100 0   0   84  1335
102 0   0   83  931
103 1   0   87  746
104 1   0   91  746
105 1   0   81  600
106 1   0   72  852
108 0   1   87  1204
109 0   0   89  1191
110 1   0   90  769
111 1   0   88  642
112 1   0   86  743
114 0   1   75  1085
115 0   1   78  1109
117 1   0   84  871
120 1   0   96  599
123 0   0   93  651
129 0   0   74  1325
133 1   0   88  637
134 1   0   84  470
135 0   1   73  980
136 0   0   72  1096
137 0   0   83  792
138 1   0   87  565
139 1   0   84  501
141 1   0   88  615
142 0   0   79  722
143 0   0   80  1363
144 0   0   82  1506
146 1   0   93  626
147 1   0   94  415
148 1   0   95  596
149 0   0   100 532
150 0   0   102 784
154 1   0   99  514
155 1   0   94  495
156 0   1   87  689
157 0   1   94  931
158 0   0   97  618
161 1   0   92  451
162 1   0   97  574
164 0   0   102 898
165 0   0   104 746
166 1   0   109 587
167 1   0   109 465
174 1   0   108 514
175 1   0   109 572
179 0   0   107 811
181 1   0   104 423
182 1   0   103 526
184 0   1   97  849
185 0   0   103 852
189 1   0   106 728
191 0   0   101 577
194 1   0   105 511
198 0   1   101 616
199 0   1   97  1056
200 0   0   94  740
202 1   0   103 498
205 0   0   101 610
206 0   0   106 944
207 0   0   105 769
208 1   0   103 551
209 1   0   103 624
210 1   0   97  513
212 0   1   107 561
213 0   0   100 905
214 0   0   105 767
215 1   0   107 510
216 1   0   108 406
217 1   0   109 439
218 1   0   103 427
219 0   1   104 460
224 1   0   105 213
227 0   0   112 834
228 0   0   109 615
229 1   0   105 216
230 1   0   104 213
231 1   0   104 256
232 1   0   104 282
235 0   0   104 569
238 1   0   103 165
239 1   1   105 176
241 0   1   108 727
242 0   1   105 652
243 1   1   103 231
244 1   0   96  117
245 1   1   98  168
246 1   1   97  113
247 0   0   95  227
248 0   0   92  1050
249 0   0   101 1274
250 1   1   95  1148
254 0   0   99  180
255 0   0   104 557
258 1   0   94  228
260 1   0   95  133
263 0   0   100 511
264 1   1   89  249
265 1   1   90  245
267 1   0   101 390
272 1   0   100 223
273 1   0   103 194
274 1   0   103 150
275 0   0   95  224
276 0   0   92  705
277 0   1   92  504
279 1   1   77  331
281 1   0   89  268
284 0   0   95  566
285 1   0   94  579
286 1   0   95  420
288 1   0   93  392
289 0   1   94  525
290 0   1   86  670
291 0   1   89  488
294 1   1   74  295
296 0   0   81  314
299 1   0   88  211
301 1   0   84  246
303 0   1   76  433
304 0   0   80  216
307 1   1   80  275
308 1   1   66  319
312 0   0   80  413
313 1   0   78  278
316 1   0   74  305
320 1   1   57  323
324 0   0   76  220
326 0   0   77  461
327 1   0   78  510
331 0   0   60  1701
334 1   0   58  237
335 1   0   62  355
336 1   0   68  266
338 0   0   70  246
342 1   0   72  109
343 1   0   70  103
347 0   0   58  486
349 1   0   52  144
350 1   0   53  209
351 1   0   55  289
354 0   0   62  707
355 1   0   59  903
359 0   0   58  481
360 0   0   53  1342
364 1   0   57  1624

我总共有超过一千个数据点，但我没有将它们全部用于训练/测试。一种想法是我需要更多，另一种想法是我需要更多因素，因为温度/降雨/星期几对出勤率的影响不够。

剧情如下:

我该怎么做才能使我的模型更准确并提供更好的预测？

谢谢

编辑:我添加了更多数据点和另一个因素。我似乎无法上传 Excel 文件，因此我将数据放在这里，并更好地解释其格式

编辑:这是最新的代码:

from sklearn import preprocessing
from openpyxl import load_workbook
import numpy as np
from numpy import exp, array, random, dot
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_predict
from sklearn import svm
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import LeaveOneOut
#Set sizes
rowSize = 500
numColumns = 254

# read  from excel file
wb = load_workbook('python_excel_read.xlsx')
sheet_1 = wb["Sheet1"]

input = np.zeros(shape=(rowSize,numColumns))
out = np.zeros(rowSize)
for i in range(0, rowSize):
    for j in range(0,numColumns):
        input[i,j] = sheet_1.cell(row=i + 1, column=j+1).value
    out[i] = sheet_1.cell(row=i + 1, column=numColumns+1).value

output = np.zeros(shape=(rowSize,1))

for i in range(0, rowSize):
    output[i] = [out[i]]


X = input
# Output
y = output

print(X)
print(y)
y[y < 500] = 0
y[np.logical_and(y >= 500, y <= 1000)] = 1
y[np.logical_and(y > 1000, y <= 1200)] = 2
y[y > 1200] = 3

# Use cross-validation
#kf = KFold(n_splits = 10, random_state=0)
loo = LeaveOneOut()
# Try different models
clf = svm.SVC()
scaler = StandardScaler()
pipe = Pipeline([('scaler', scaler), ('svc', clf)])

accuracy = cross_val_score(pipe, X, y.ravel(), cv = loo, scoring = "accuracy")
print(accuracy.mean())

#y_pred = cross_val_predict(clf, X, y.ravel(), cv = kf)
#cm = confusion_matrix(y, y_pred)

这是最新的数据，其中包含我可以添加的尽可能多的功能。请注意，这是完整数据中的随机样本:

Link to sample data

当前输出:0.6230954290296712

我的最终目标是达到 90% 或更高的准确率...我不相信我能找到更多的功能，但如果有帮助，我会继续收集尽可能多的功能

最佳答案

你的问题很笼统，但我有一些建议。您可以使用交叉验证并尝试不同的模型。就我个人而言，我会尝试 SVR、RandomForests，最后选择我会使用 MLPR。

我修改了你的代码以显示一个简单的示例:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_predict
from sklearn import svm
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import LeaveOneOut
import pandas as pd
from sklearn.decomposition import PCA

# read the data
df = pd.read_excel('python_excel_read.xlsx', header = None)
rows, cols = df.shape

X = df.iloc[: , 0:(cols - 1)]
y = df.iloc[: , cols - 1 ]
print(X.shape)
print(y.shape)

y[y < 500] = 0
y[np.logical_and(y >= 500, y <= 1000)] = 1
y[np.logical_and(y > 1000, y <= 1200)] = 2
y[y > 1200] = 3
print(np.unique(y))

# We can apply PCA to reduce the dimensions of the data
# pca = PCA(n_components=2)
# pca.fit(X)
# X = pca.fit_transform(X)

# Use cross-validation
#kf = KFold(n_splits = 10, random_state=0)
loo = LeaveOneOut()
# Try different models
clf = svm.SVC(kernel = 'linear')
scaler = StandardScaler()
pipe = Pipeline([('scaler', scaler), ('svc', clf)])

accuracy = cross_val_score(pipe, X, y.ravel(), cv = loo, scoring = "accuracy")
print(accuracy.mean())

#y_pred = cross_val_predict(clf, X, y.ravel(), cv = kf)
#cm = confusion_matrix(y, y_pred)

关于python - Sklearn NN 回归出勤预测，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/50847386/

文章推荐： java - 我如何在JAVA中使用SignalR

文章推荐： java - 为什么使用ArrayBlockingQueue时程序会挂起

文章推荐： java - 如何从现有的 jar 文件创建 jar 文件？

文章推荐： java - GPG 从 Java 解密平面文件

python - Python 中的集群或合并集群以减少组数 (Python)
我正在处理一组标记为 160 个组的 173k 点。我想通过合并最接近的(到 9 或 10 个组)来减少组/集群的数量。我搜索过 sklearn 或类似的库，但没有成功。我猜它只是通过 knn 聚类
python - python 列表的子集基于同一列表的元素组，pythonically
我有一个扁平数字列表，这些数字逻辑上以 3 为一组，其中每个三元组是 (number, __ignored, flag[0 or 1])，例如: [7,56,1, 8,0,0, 2,0,0, 6,1,
python - 激活 Python 虚拟环境并在另一个 Python 脚本中调用 Python 脚本
我正在使用 pipenv 来管理我的包。我想编写一个 python 脚本来调用另一个使用不同虚拟环境(VE)的 python 脚本。如何运行使用 VE1 的 python 脚本 1 并调用另一个 p
python - 在焕然一新的 Python 环境中以编程方式从 Python 内部执行 Python 文件
假设我有一个文件 script.py 位于 path = "foo/bar/script.py"。我正在寻找一种在 Python 中通过函数 execute_script() 从我的主要 Python
python - 从 python 脚本但在 python 脚本之外运行 python 脚本
这听起来像是谜语或笑话，但实际上我还没有找到这个问题的答案。问题到底是什么？我想运行 2 个脚本。在第一个脚本中，我调用另一个脚本，但我希望它们继续并行，而不是在两个单独的线程中。主要是我不希望第
python - 使用不同的 python 从 python 运行 python 脚本
我有一个带有 python 2.5.5 的软件。我想发送一个命令，该命令将在 python 2.7.5 中启动一个脚本，然后继续执行该脚本。我试过用 #!python2.7.5 和http://re
python - 为什么从 Python 命令行调用 Python 时 Python 无法找到并运行我的脚本？
我在 python 命令行(使用 python 2.7)中，并尝试运行 Python 脚本。我的操作系统是 Windows 7。我已将我的目录设置为包含我所有脚本的文件夹，使用: os.chdir("
python - 使用动态版本的 Python 执行嵌入的 Python 代码时出现致命的 Python 错误
剧透:部分解决(见最后)。以下是使用 Python 嵌入的代码示例: #include int main(int argc, char** argv) { Py_SetPythonHome
python - python 中识别 python 数组或列表中最大累积差异的最快方法是什么？
假设我有以下列表，对应于及时的股票价格: prices = [1, 3, 7, 10, 9, 8, 5, 3, 6, 8, 12, 9, 6, 10, 13, 8, 4, 11] 我想确定以下总体上最
python - (Python) 通过单选按钮 python 更新背景
所以我试图在选择某个单选按钮时更改此框架的背景。我的框架位于一个类中，并且单选按钮的功能位于该类之外。 (这样我就可以在所有其他框架上调用它们。) 问题是每当我选择单选按钮时都会出现以下错误: co
python - python 中的字符串与正则表达式比较在 python 中失败
我正在尝试将字符串与 python 中的正则表达式进行比较，如下所示， #!/usr/bin/env python3 import re str1 = "Expecting property name
python - python 如何加载Boost.Python 库？
考虑以下原型(prototype) Boost.Python 模块，该模块从单独的 C++ 头文件中引入类“D”。 /* file: a/b.cpp */ BOOST_PYTHON_MODULE(c)
python - python 检查模块 python 的问题
如何编写一个程序来“识别函数调用的行号？” python 检查模块提供了定位行号的选项，但是， def di(): return inspect.currentframe().f_back.f_l
python - 系统 python 与用户 python
我已经使用 macports 安装了 Python 2.7，并且由于我的 $PATH 变量，这就是我输入 $ python 时得到的变量。然而，virtualenv 默认使用 Python 2.6，除
python - [Python] : Python re. 长字符串行的搜索速度优化
我只想问如何加快 python 上的 re.search 速度。我有一个很长的字符串行，长度为 176861(即带有一些符号的字母数字字符)，我使用此函数测试了该行以进行研究: def getExe
python - 编辑字符串 python 正则表达式 python
list1= [u'%app%%General%%Council%', u'%people%', u'%people%%Regional%%Council%%Mandate%', u'%ppp%%Ge
python - Python 映射中的副作用(Python "do" block )
这个问题在这里已经有了答案: Is it Pythonic to use list comprehensions for just side effects? (7 个答案) 关闭 4 个月前。告
python - 使用其值逻辑组合两个 python 列表 - Python
我想用 Python 将两个列表组合成一个列表，方法如下: a = [1,1,1,2,2,2,3,3,3,3] b= ["Sun", "is", "bright", "June","and" ,"Ju
python - Boost.Python python 链接错误
我正在运行带有最新 Boost 发行版 (1.55.0) 的 Mac OS X 10.8.4 (Darwin 12.4.0)。我正在按照说明 here构建包含在我的发行版中的教程 Boost-Pyth
python - 在 Python 中仅使用内置库制作一个基本的网络抓取工具 - Python
学习 Python，我正在尝试制作一个没有任何第 3 方库的网络抓取工具，这样过程对我来说并没有简化，而且我知道我在做什么。我浏览了一些在线资源，但所有这些都让我对某些事情感到困惑。 html 看起来

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城