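This is part of my DataFrame. As you can see, the rent_time column contains some integers where there should be timestamps, so I want to remove those rows. How can I drop the records whose time value is an integer?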
rent_time rent_price_per_square_meter
0 2016-11-28 09:01:58 0.400000
1 2016-11-28 09:02:35 0.400000
2 2016-11-28 09:02:43 0.400000
3 2016-11-28 09:03:21 0.400000
4 2016-11-28 09:03:21 0.400000
5 2016-11-28 09:03:34 0.400000
6 2016-11-28 09:03:34 0.400000
7 2017-06-17 02:49:33 0.933333
8 2017-03-19 01:30:03 0.490196
9 2017-03-10 06:39:03 11.111111
10 2017-03-09 14:40:03 16.666667
11 908797 11.000000
12 2017-06-08 03:27:52 22.000000
13 2017-06-30 03:03:11 22.000000
14 2017-02-20 11:04:48 12.000000
15 2017-03-05 13:53:39 6.842105
16 2017-03-06 14:00:01 6.842105
17 2017-03-15 02:38:54 20.000000
18 2017-03-15 02:19:07 13.043478
19 2017-02-24 15:10:00 25.000000
20 2017-06-26 02:17:31 13.043478
21 82368 11.111111
22 2017-06-30 07:53:55 4.109589
23 2017-07-17 02:42:43 20.000000
24 2017-06-30 07:38:00 5.254237
25 2017-06-30 07:49:00 4.920635
26 2017-06-30 05:26:26 4.189189
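You can use boolean indexing with to_datetime and the parameter errors='coerce', which returns NaT for values that cannot be parsed as datetimes, then add notnull to keep only the rows that hold valid datetimes: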
# Unparseable values become NaT; notnull() keeps only the rows that parsed
df1 = df[pd.to_datetime(df['rent_time'], errors='coerce').notnull()]
print(df1)
rent_time rent_price_per_square_meter
0 2016-11-28 09:01:58 0.400000
1 2016-11-28 09:02:35 0.400000
2 2016-11-28 09:02:43 0.400000
3 2016-11-28 09:03:21 0.400000
4 2016-11-28 09:03:21 0.400000
5 2016-11-28 09:03:34 0.400000
6 2016-11-28 09:03:34 0.400000
7 2017-06-17 02:49:33 0.933333
8 2017-03-19 01:30:03 0.490196
9 2017-03-10 06:39:03 11.111111
10 2017-03-09 14:40:03 16.666667
12 2017-06-08 03:27:52 22.000000
13 2017-06-30 03:03:11 22.000000
14 2017-02-20 11:04:48 12.000000
15 2017-03-05 13:53:39 6.842105
16 2017-03-06 14:00:01 6.842105
17 2017-03-15 02:38:54 20.000000
18 2017-03-15 02:19:07 13.043478
19 2017-02-24 15:10:00 25.000000
20 2017-06-26 02:17:31 13.043478
22 2017-06-30 07:53:55 4.109589
23 2017-07-17 02:42:43 20.000000
24 2017-06-30 07:38:00 5.254237
25 2017-06-30 07:49:00 4.920635
26 2017-06-30 05:26:26 4.189189
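For anyone who wants to try the filter in isolation, here is a minimal self-contained sketch. The four-row sample, the variable names parsed, mask, valid and rejected, and the explicit format string are all assumptions made for illustration; the original answer relies on format inference instead.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; all values are strings.
df = pd.DataFrame({
    "rent_time": ["2016-11-28 09:01:58", "908797", "2017-06-08 03:27:52", "82368"],
    "rent_price_per_square_meter": [0.4, 11.0, 22.0, 11.111111],
})

# Assumed format: strings that do not match it become NaT under errors="coerce".
parsed = pd.to_datetime(df["rent_time"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

mask = parsed.notna()   # True where the value parsed as a datetime
valid = df[mask]        # rows to keep
rejected = df[~mask]    # rows dropped, kept here so they can be inspected

print(valid)
print(rejected)
```

Keeping the rejected rows around makes it easy to check what was dropped. If the column holds real integers rather than strings, to_datetime may interpret them as epoch offsets instead of returning NaT, so it is worth inspecting the result before trusting the filter.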
EDIT:
If you need a DatetimeIndex for further processing:
# Convert the column in place, then drop the NaT rows and use it as the index
df['rent_time'] = pd.to_datetime(df['rent_time'], errors='coerce')
df = df.dropna(subset=['rent_time']).set_index('rent_time')
print(df)
rent_price_per_square_meter
rent_time
2016-11-28 09:01:58 0.400000
2016-11-28 09:02:35 0.400000
2016-11-28 09:02:43 0.400000
2016-11-28 09:03:21 0.400000
2016-11-28 09:03:21 0.400000
2016-11-28 09:03:34 0.400000
2016-11-28 09:03:34 0.400000
2017-06-17 02:49:33 0.933333
2017-03-19 01:30:03 0.490196
2017-03-10 06:39:03 11.111111
2017-03-09 14:40:03 16.666667
2017-06-08 03:27:52 22.000000
2017-06-30 03:03:11 22.000000
2017-02-20 11:04:48 12.000000
2017-03-05 13:53:39 6.842105
2017-03-06 14:00:01 6.842105
2017-03-15 02:38:54 20.000000
2017-03-15 02:19:07 13.043478
2017-02-24 15:10:00 25.000000
2017-06-26 02:17:31 13.043478
2017-06-30 07:53:55 4.109589
2017-07-17 02:42:43 20.000000
2017-06-30 07:38:00 5.254237
2017-06-30 07:49:00 4.920635
2017-06-30 05:26:26 4.189189
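As a rough illustration of what the DatetimeIndex makes possible afterwards, the sketch below selects one month by partial string and computes a monthly mean. The sample rows, the explicit format string, the sort_index call and the names march and monthly_mean are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample with one invalid entry, stored as strings.
df = pd.DataFrame({
    "rent_time": ["2017-03-10 06:39:03", "908797",
                  "2017-03-09 14:40:03", "2017-06-08 03:27:52"],
    "rent_price_per_square_meter": [11.111111, 11.0, 16.666667, 22.0],
})

# Same cleaning steps as above, with an assumed explicit format.
df["rent_time"] = pd.to_datetime(
    df["rent_time"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)
df = df.dropna(subset=["rent_time"]).set_index("rent_time").sort_index()

# Partial string indexing: every row from March 2017.
march = df.loc["2017-03"]

# Mean price per calendar month; months without data come out as NaN.
monthly_mean = df["rent_price_per_square_meter"].resample("MS").mean()

print(march)
print(monthly_mean)
```

Sorting the index is optional for this selection, but it keeps time-based slicing predictable.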