I'm preprocessing the Kaggle Titanic dataset with a `FeatureSpace`, and I'm running into problems with its `adapt()` method. When I try to adapt the FeatureSpace I've made to my dataset:

```python
train_ds_with_no_labels = train_ds.map(lambda x, _: x)
feature_space.adapt(train_ds_with_no_labels)
```

I get this error:
```
File /opt/conda/lib/python3.10/site-packages/keras/utils/feature_space.py:531, in FeatureSpace.adapt.<locals>.<lambda>(x)
    518         raise ValueError(
    519             "`adapt()` can only be called on a tf.data.Dataset. "
    520             f"Received instead: {dataset} (of type {type(dataset)})"
    521         )
    523     for name in self._list_adaptable_preprocessors():
    524         # Call adapt() on each individual adaptable layer.
    528         # and call the layer's `_adapt_function` on each batch
    529         # to simulate the behavior of adapt() in a more performant fashion.
--> 531         feature_dataset = dataset.map(lambda x: x[name])
    532         preprocessor = self.preprocessors[name]

KeyError: 'sibsp'
```
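As far as I can tell, line 531 maps `lambda x: x[name]` over the dataset for each adaptable feature name, so the `KeyError` comes from an ordinary dict lookup on each dataset element. Here is a minimal pure-Python sketch of that loop, a simplified stand-in rather than the actual Keras code (`simulate_adapt` is a hypothetical helper), using the keys of the sample element shown below and the feature names from my FeatureSpace:

```python
# Simplified stand-in for the per-feature loop in FeatureSpace.adapt();
# NOT the real Keras implementation. `simulate_adapt` is a hypothetical helper.

# One dataset element, with values copied from the sample printed in this question.
sample = {
    "PassengerId": 80, "Pclass": 3, "Name": "Dowdell, Miss. Elizabeth",
    "Sex": "female", "Age": 30.0, "SibSp": 0, "Parch": 0,
    "Ticket": "364516", "Fare": 12.475,
}

# Feature names in the order they are declared in the FeatureSpace.
feature_names = ["sibsp", "parch", "pclass", "sex", "embarked", "Age", "fare"]


def simulate_adapt(element, names):
    """Look up each feature name the way `lambda x: x[name]` does."""
    extracted = {}
    for name in names:
        # Plain dict lookup: keys are case-sensitive, so "sibsp" != "SibSp".
        extracted[name] = element[name]
    return extracted


try:
    simulate_adapt(sample, feature_names)
except KeyError as err:
    # The first name missing from the element is the one reported.
    print("KeyError:", err)
```

In this sketch the lookup fails on the first name that is not a key of the element, which here is `'sibsp'`, the same key named in the traceback.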
The traceback also shows the check at lines 518-521, which raises a `ValueError` if `adapt()` receives something other than a `tf.data.Dataset`. But when I check `type(dataset)`, I get `<class 'tensorflow.python.data.ops.map_op._MapDataset'>`, which is exactly what the code I'm following has.
The `train_ds_with_no_labels` dataset that I pass into `adapt()` looks like this (one sample element, followed by its type):

```
{'PassengerId': <tf.Tensor: shape=(), dtype=int64, numpy=80>,
 'Pclass': <tf.Tensor: shape=(), dtype=int64, numpy=3>,
 'Name': <tf.Tensor: shape=(), dtype=string, numpy=b'Dowdell, Miss. Elizabeth'>,
 'Sex': <tf.Tensor: shape=(), dtype=string, numpy=b'female'>,
 'Age': <tf.Tensor: shape=(), dtype=float64, numpy=30.0>,
 'SibSp': <tf.Tensor: shape=(), dtype=int64, numpy=0>,
 'Parch': <tf.Tensor: shape=(), dtype=int64, numpy=0>,
 'Ticket': <tf.Tensor: shape=(), dtype=string, numpy=b'364516'>,
 'Fare': <tf.Tensor: shape=(), dtype=float64, numpy=12.475>}
<class 'dict'>
```
And my FeatureSpace looks like this:

```python
from keras.utils import FeatureSpace

feature_space = FeatureSpace(
    features={
        # Categorical features encoded as integers
        "sibsp": FeatureSpace.integer_categorical(num_oov_indices=0),
        "parch": FeatureSpace.integer_categorical(num_oov_indices=0),
        "pclass": FeatureSpace.integer_categorical(num_oov_indices=0),
        # Categorical features encoded as strings
        "sex": FeatureSpace.string_categorical(num_oov_indices=0),
        "embarked": FeatureSpace.string_categorical(num_oov_indices=0),
        # Numerical feature to discretize
        "Age": FeatureSpace.float_discretized(num_bins=30),
        # Numerical feature to normalize
        "fare": FeatureSpace.float_normalized(),
    },
)
```
Here's the notebook I've been working on, and here's the dataset with further info about it. I'm very new to this and have been stuck for several days now. I've tried everything I can think of and searched all over online. It's such a weird error; I'm sure I'm missing something obvious, but I can't see it. Are there better ways to go about this?