python - Using Pandas Timestamps with scikit-learn

Reposted. Author: 太空狗. Updated: 2023-10-30 01:39:00

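sklearn estimators accept a pandas TimeStamp (=datetime64[ns]) column in X, as long as all of the columns in X are of that type. But when X contains both TimeStamp and numeric (e.g. int or float) columns, sklearn refuses to handle the TimeStamp.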

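Is there any workaround other than converting the TimeStamp to int with astype(int)? (I still need the original column to access dt.year etc., so ideally I would rather not create a duplicate column just to feed features to sklearn.)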

import pandas as pd
from sklearn.linear_model import LinearRegression
test = pd.date_range('20000101', periods = 100)
test_df = pd.DataFrame({'date': test})
test_df['a'] = 1
test_df['y'] = 1
lr = LinearRegression()
lr.fit(test_df[['date']], test_df['y']) # works fine
lr.fit(test_df[['date', 'date']], test_df['y']) # works fine
lr.fit(test_df[['date', 'a']], test_df['y']) # raises TypeError

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-90-0605fa5bcdfa> in <module>()
----> 1 lr.fit(test_df[['date', 'a']], test_df['y'])

/home/shoya/.pyenv/versions/3.5.0/envs/study-env/lib/python3.5/site-packages/sklearn/linear_model/base.py in fit(self, X, y, sample_weight)
434 n_jobs_ = self.n_jobs
435 X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 436 y_numeric=True, multi_output=True)
437
438 if ((sample_weight is not None) and np.atleast_1d(

/home/shoya/.pyenv/versions/3.5.0/envs/study-env/lib/python3.5/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
521 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
522 ensure_2d, allow_nd, ensure_min_samples,
--> 523 ensure_min_features, warn_on_dtype, estimator)
524 if multi_output:
525 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/home/shoya/.pyenv/versions/3.5.0/envs/study-env/lib/python3.5/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
402 # make sure we acually converted to numeric:
403 if dtype_numeric and array.dtype.kind == "O":
--> 404 array = array.astype(np.float64)
405 if not allow_nd and array.ndim >= 3:
406 raise ValueError("Found array with dim %d. %s expected <= 2."

TypeError: float() argument must be a string or a number, not 'Timestamp'

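Apparently, when the dtypes are mixed, the resulting ndarray has dtype object, so sklearn tries to convert it to float, which fails on the TimeStamp values. But when all the columns are datetime64[ns], sklearn simply leaves them unchanged.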
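One way to check this explanation is to look at the NumPy array pandas produces for each column selection, since sklearn's input validation turns X into an array like this before the float conversion shown in the traceback. A minimal sketch, assuming pandas 0.24 or later (for `DataFrame.to_numpy()`):

```python
import pandas as pd

test_df = pd.DataFrame({'date': pd.date_range('20000101', periods=100)})
test_df['a'] = 1

# Only datetime columns: the array keeps a datetime64 dtype (e.g. datetime64[ns])
only_dates = test_df[['date']].to_numpy()
print(only_dates.dtype)

# Datetime + int columns: there is no common numeric dtype, so pandas
# falls back to object and boxes each datetime as a pandas Timestamp
mixed = test_df[['date', 'a']].to_numpy()
print(mixed.dtype)        # object
print(type(mixed[0, 0]))  # the Timestamp class
```

An object array of Timestamp objects is exactly what `array.astype(np.float64)` in the traceback is asked to convert.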

Best answer

You can convert it to an appropriate integer or float:

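# For datetime64[ns] this gives nanoseconds since 1970-01-01; note that it overwrites the original column.
# 'int64' rather than int: on some platforms int maps to int32, which newer pandas refuses for datetimes.
test_df['date'] = test_df['date'].astype('int64')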
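If, as in the question, you want to keep the original datetime column, one option is to do the same int64 conversion inside the model instead of in the DataFrame. The sketch below assumes a scikit-learn version with `make_pipeline` and `FunctionTransformer`; `datetimes_to_int64` is a hypothetical helper name, and the cast is the same idea as the answer's `astype(int)`, just applied to a copy.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

test_df = pd.DataFrame({'date': pd.date_range('20000101', periods=100)})
test_df['a'] = 1
test_df['y'] = 1


def datetimes_to_int64(X):
    """Hypothetical helper: return a copy of X with datetime columns cast to int64.

    The caller's DataFrame is not modified, so its datetime column
    (and the .dt accessor) stays available.
    """
    X = X.copy()
    for col in X.select_dtypes(include='datetime').columns:
        X[col] = X[col].astype('int64')
    return X


# validate=False passes the DataFrame to the helper unchanged
model = make_pipeline(
    FunctionTransformer(datetimes_to_int64, validate=False),
    LinearRegression(),
)
model.fit(test_df[['date', 'a']], test_df['y'])
preds = model.predict(test_df[['date', 'a']])

print(test_df['date'].dtype)            # still a datetime64 dtype
print(test_df['date'].dt.year.iloc[0])  # 2000
```

Because the cast lives in the pipeline, `predict` applies the same conversion to new data. Raw epoch integers are very large numbers, so for scale-sensitive estimators you may prefer to rescale them (for example, to days) inside the same helper.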

For more on python - Using Pandas Timestamps with scikit-learn, see the similar question on Stack Overflow: https://stackoverflow.com/questions/35439723/
