python - Trying Kaggle Titanic with Keras .. getting loss and valid_loss -0.0000

Reposted by: 行者123 | Updated: 2023-11-28 18:36:52

Hello, for the problem posted here (https://www.kaggle.com/c/titanic), I am getting strange results from the following code:

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.advanced_activations import PReLU, LeakyReLU
from keras.layers.recurrent import SimpleRNN, SimpleDeepRNN
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM, GRU

import pandas as pd
import numpy as np 
from sklearn import preprocessing

np.random.seed(1919)

### Constants ###
data_folder = "/home/saj1919/Public/Data_Science_Mining_Study/submissions/titanic/data/"
out_folder = "/home/saj1919/Public/Data_Science_Mining_Study/submissions/titanic/output/"
batch_size = 4
nb_epoch = 10

### load train and test ###
train  = pd.read_csv(data_folder+'train.csv', index_col=0)
test  = pd.read_csv(data_folder+'test.csv', index_col=0)
print "Data Read complete"

Y = train.Survived
train.drop('Survived', axis=1, inplace=True)

columns = train.columns
test_ind = test.index

train['Age'] = train['Age'].fillna(train['Age'].mean())
test['Age'] = test['Age'].fillna(test['Age'].mean())
train['Fare'] = train['Fare'].fillna(train['Fare'].mean())
test['Fare'] = test['Fare'].fillna(test['Fare'].mean())

category_index = [0,1,2,4,5,6,8,9]
for i in category_index:
    print str(i)+" : "+columns[i]
    train[columns[i]] = train[columns[i]].fillna('missing')
    test[columns[i]] = test[columns[i]].fillna('missing')

train = np.array(train)
test = np.array(test)

### label encode the categorical variables ###
for i in category_index:
    print str(i)+" : "+str(columns[i])
    lbl = preprocessing.LabelEncoder()
    lbl.fit(list(train[:,i]) + list(test[:,i]))
    train[:,i] = lbl.transform(train[:,i])
    test[:,i] = lbl.transform(test[:,i])

### making data as numpy float ###
train = train.astype(np.float32)
test = test.astype(np.float32)
#Y = np.array(Y).astype(np.int32)

model = Sequential()
model.add(Dense(len(columns), 512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(512, 1))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer="adam")
model.fit(train, Y, nb_epoch=nb_epoch, batch_size=batch_size, validation_split=0.20)
preds = model.predict(test,batch_size=batch_size)

pred_arr = []
for pred in preds:
    pred_arr.append(pred[0])

### Output Results ###
preds = pd.DataFrame({"PassengerId": test_ind, "Survived": pred_arr})
preds = preds.set_index('PassengerId')
preds.to_csv(out_folder+'test.csv')

I get the following results:

Train on 712 samples, validate on 179 samples
Epoch 0
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 1
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 2
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 3
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 4
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 5
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 6
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 7
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 8
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000
Epoch 9
712/712 [==============================] - 0s - loss: -0.0000 - val_loss: -0.0000

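I am trying to build a simple 3-layer network. The code is completely basic. I have used Keras for this kind of classification problem on Kaggle before, but this time I get this error.

Is it overfitting because there is so little data? What am I missing? Can anyone help?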

Best answer

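Old post, but answering anyway in case someone else tries Titanic with Keras.

Your network probably has too many parameters and too little regularization (e.g. dropout).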

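Call model.summary() before model.compile and it will show you how many parameters your network has. Between two 512-unit dense layers alone you would have 512 x 512 = 262,144 parameters. That is already a lot for the 712 training examples shown in your log.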
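The parameter arithmetic above can be checked by hand. The following is a minimal sketch, not Keras code: `dense_param_count` is a hypothetical helper, and the figure of 10 input columns is an assumption (Titanic train.csv with PassengerId used as the index and Survived dropped). It counts the weights between two 512-unit layers and, for comparison, the two Dense layers as they appear in the posted code.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Trainable parameters in one fully connected layer (hypothetical helper)."""
    weights = n_inputs * n_units          # one weight per input/unit pair
    biases = n_units if use_bias else 0   # one bias per unit
    return weights + biases

# Weights only between two 512-unit layers: the figure quoted in the answer.
weights_512_512 = dense_param_count(512, 512, use_bias=False)

# The layers as posted: Dense(len(columns), 512) followed by Dense(512, 1).
n_features = 10  # assumption: columns left after dropping Survived
posted_total = dense_param_count(n_features, 512) + dense_param_count(512, 1)

print(weights_512_512)  # 262144
print(posted_total)     # 6145
```

Under that assumption the posted network is much smaller than the 512 x 512 estimate, so model.summary() is worth running to see the real number.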

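You may also want to use a sigmoid activation on the last layer together with the binary_crossentropy loss, since you only have two output classes. A softmax over a single output unit always returns exactly 1, so categorical_crossentropy evaluates to zero no matter what the network learns, which matches the constant -0.0000 in your log.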
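To see why a one-unit softmax gives a constant zero loss, here is a small pure-Python sketch. The functions are written out by hand for illustration and are not the Keras implementations; the logit values are arbitrary.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def categorical_crossentropy(y_true, y_pred):
    """-sum(t * log(p)) over the output units."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def binary_crossentropy(y_true, p):
    """Loss for a single sigmoid output p and a 0/1 label."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

for logit in (-3.0, 0.0, 4.2):          # arbitrary example logits
    p_softmax = softmax([logit])[0]     # always 1.0 with a single unit
    p_sigmoid = sigmoid(logit)          # varies with the logit
    for label in (0, 1):
        cce = categorical_crossentropy([label], [p_softmax])
        bce = binary_crossentropy(label, p_sigmoid)
        print(logit, label, p_softmax, cce, round(bce, 4))
```

With softmax the prediction is 1.0 and the loss is zero for every logit and label, so there is no gradient to learn from. With sigmoid and binary cross-entropy the loss depends on both the logit and the label.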

Regarding "python - Trying Kaggle Titanic with Keras .. getting loss and valid_loss -0.0000", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31627380/
