I'm preprocessing the Kaggle Titanic dataset using a FeatureSpace and I'm running into issues with the adapt() function. When I call adapt() on my dataset with the FeatureSpace I've made,
train_ds_with_no_labels = train_ds.map(lambda x, _: x)
feature_space.adapt(train_ds_with_no_labels)
I get these errors
File /opt/conda/lib/python3.10/site-packages/keras/utils/feature_space.py:531, in FeatureSpace.adapt.<locals>.<lambda>(x)
518 raise ValueError(
519 "`adapt()` can only be called on a tf.data.Dataset. "
520 f"Received instead: {dataset} (of type {type(dataset)})"
521 )
523 for name in self._list_adaptable_preprocessors():
524 # Call adapt() on each individual adaptable layer.
525
(...)
528 # and call the layer's `_adapt_function` on each batch
529 # to simulate the behavior of adapt() in a more performant fashion.
--> 531 feature_dataset = dataset.map(lambda x: x[name])
532 preprocessor = self.preprocessors[name]
KeyError: 'sibsp'
and this takes us to
518 raise ValueError(
519 "`adapt()` can only be called on a tf.data.Dataset. "
520 f"Received instead: {dataset} (of type {type(dataset)})"
521 )
But when I run type(dataset), I get <class 'tensorflow.python.data.ops.map_op._MapDataset'>, which is exactly what the code I'm following has.
The train_ds_with_no_labels that I pass into adapt() looks like this (just one sample):
{'PassengerId': <tf.Tensor: shape=(), dtype=int64, numpy=80>, 'Pclass': <tf.Tensor: shape=(), dtype=int64, numpy=3>, 'Name': <tf.Tensor: shape=(), dtype=string, numpy=b'Dowdell, Miss. Elizabeth'>, 'Sex': <tf.Tensor: shape=(), dtype=string, numpy=b'female'>, 'Age': <tf.Tensor: shape=(), dtype=float64, numpy=30.0>, 'SibSp': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'Parch': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'Ticket': <tf.Tensor: shape=(), dtype=string, numpy=b'364516'>, 'Fare': <tf.Tensor: shape=(), dtype=float64, numpy=12.475>} <class 'dict'>
and my FeatureSpace looks like this:
from keras.utils import FeatureSpace

feature_space = FeatureSpace(
    features={
        # Categorical features encoded as integers
        "sibsp": FeatureSpace.integer_categorical(num_oov_indices=0),
        "parch": FeatureSpace.integer_categorical(num_oov_indices=0),
        "pclass": FeatureSpace.integer_categorical(num_oov_indices=0),
        # Categorical features encoded as strings
        "sex": FeatureSpace.string_categorical(num_oov_indices=0),
        "embarked": FeatureSpace.string_categorical(num_oov_indices=0),
        # Numerical features to discretize
        "Age": FeatureSpace.float_discretized(num_bins=30),
        # Numerical features to normalize
        "fare": FeatureSpace.float_normalized(),
    },
    output_mode="concat",
)
Here's the notebook I've been working on and here's the dataset and further info about it. I'm very new to this but have been stuck for several days now. I've tried everything I can think of and searched all over online. It's such a weird error; I'm sure I'm missing something obvious, but I can't see it. Is there a better way of going about this?
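For reference, here is a minimal sketch of a check that might help narrow this down. The traceback ends in KeyError: 'sibsp', raised by the x[name] lookup, so it seems worth comparing the feature names given to FeatureSpace against the keys in the sample above. The helper name find_missing is hypothetical (it is not part of Keras), and both lists are copied by hand from this post rather than read from the real dataset.

```python
# Keys as they appear in the single sample printed above (copied by hand).
dataset_keys = [
    "PassengerId", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare",
]

# Feature names as passed to FeatureSpace(features={...}) above.
feature_names = ["sibsp", "parch", "pclass", "sex", "embarked", "Age", "fare"]


def find_missing(feature_names, dataset_keys):
    """Hypothetical helper: map each feature name that has no exact
    (case-sensitive) match in dataset_keys to the key that matches when
    case is ignored, or to None if there is no such key."""
    by_lower = {key.lower(): key for key in dataset_keys}
    missing = {}
    for name in feature_names:
        if name not in dataset_keys:
            missing[name] = by_lower.get(name.lower())
    return missing


missing = find_missing(feature_names, dataset_keys)
for name, suggestion in missing.items():
    print(f"{name!r} is not a dataset key; case-insensitive match: {suggestion!r}")
```

With the lists above, only "Age" matches exactly. "sibsp", "parch", "pclass", "sex" and "fare" differ from the dataset keys only in capitalization, and "embarked" has no counterpart at all in the printed sample.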