I have a pandas dataframe:
lat lng alt days date time
0 40.003834 116.321462 211 39745.175405 2008-10-24 04:12:35
1 40.003783 116.321431 201 39745.175463 2008-10-24 04:12:40
2 40.003690 116.321429 203 39745.175521 2008-10-24 04:12:45
3 40.003589 116.321427 194 39745.175579 2008-10-24 04:12:50
4 40.003522 116.321412 190 39745.175637 2008-10-24 04:12:55
5 40.003509 116.321484 188 39745.175694 2008-10-24 04:13:00
I am trying to combine the df['date'] and df['time'] columns into a single datetime. I can do:
df['Datetime'] = pd.to_datetime(df['date']+df['time'])
df = df.set_index(['Datetime'])
del df['date']
del df['time']
And I get:
lat lng alt days
Datetime
2008-10-2404:12:35 40.003834 116.321462 211 39745.175405
2008-10-2404:12:40 40.003783 116.321431 201 39745.175463
2008-10-2404:12:45 40.003690 116.321429 203 39745.175521
2008-10-2404:12:50 40.003589 116.321427 194 39745.175579
2008-10-2404:12:55 40.003522 116.321412 190 39745.175637
But then if I try:
df.between_time(time(1),time(22,59,59))['lng'].std()
I get an error - 'TypeError: Index must be DatetimeIndex'
So, I've also tried setting the DatetimeIndex:
df['Datetime'] = pd.to_datetime(df['date']+df['time'])
#df = df.set_index(['Datetime'])
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
del df['date']
del df['time']
And this throws an error also - 'DateParseError: unknown string format'
How do I create the datetime column and DatetimeIndex correctly so that df.between_time() works right?
Comments
The 'DateParseError: unknown string format' occurs because pandas cannot parse the "2008-10-2404:12:35" format: the 'DD' and 'HH' fields are adjacent, with no separator between them.
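To make the cause concrete, here is a minimal sketch (the two-row sample and the variable names are illustrative, not from the original post) showing what plain concatenation produces and how a space separator gives a parseable string:

```python
import pandas as pd

# Two sample values in the same shape as the question's columns.
date_str = pd.Series(["2008-10-24", "2008-10-24"])
time_str = pd.Series(["04:12:35", "04:12:40"])

# Plain concatenation glues the day directly onto the hour.
glued = date_str + time_str
print(glued.iloc[0])   # 2008-10-2404:12:35

# Joining with a space yields a string that matches a standard format.
spaced = date_str + " " + time_str
parsed = pd.to_datetime(spaced, format="%Y-%m-%d %H:%M:%S")
print(parsed.iloc[0])  # 2008-10-24 04:12:35
```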
Recommended answers
To simplify Kirubaharan's answer a bit:
df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df = df.set_index('Datetime')
And to get rid of unwanted columns (as OP did but did not specify per se in the question):
df = df.drop(['date','time'], axis=1)
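Put together, the steps above might look like the following sketch. The frame is rebuilt from a few rows of the question's data (with `date` and `time` assumed to be string columns), and the final call is the `between_time` expression from the question:

```python
from datetime import time

import pandas as pd

# Sample rows copied from the question; date and time are assumed to be strings.
df = pd.DataFrame(
    {
        "lat": [40.003834, 40.003783, 40.003690],
        "lng": [116.321462, 116.321431, 116.321429],
        "date": ["2008-10-24"] * 3,
        "time": ["04:12:35", "04:12:40", "04:12:45"],
    }
)

# Join with a space so pandas can parse the combined string.
df["Datetime"] = pd.to_datetime(df["date"] + " " + df["time"])
df = df.set_index("Datetime").drop(["date", "time"], axis=1)

print(type(df.index).__name__)  # DatetimeIndex

# between_time requires a DatetimeIndex, which the frame now has.
lng_std = df.between_time(time(1), time(22, 59, 59))["lng"].std()
```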
You are not creating the datetime index properly. Join the date and time with a space and pass an explicit format:
fmt = '%Y-%m-%d %H:%M:%S'  # named fmt to avoid shadowing the built-in format()
df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format=fmt)
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
You may also want to set inplace=True. This way the same df is modified in place instead of a new one being returned.
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format = "%Y-%m-%d %H:%M:%S")
df.set_index(["datetime"], inplace=True)
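As a small illustration of what `inplace=True` does here (a sketch on a one-row frame made up for the purpose), the call mutates `df` and returns `None`, so its result should not be assigned back to `df`:

```python
import pandas as pd

# One-row illustrative frame in the shape of the question's data.
df = pd.DataFrame({"date": ["2008-10-24"], "time": ["04:12:35"], "alt": [211]})
df["datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S"
)

# With inplace=True the call modifies df itself and returns None.
result = df.set_index("datetime", inplace=True)
print(result)         # None
print(df.index.name)  # datetime
```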
This worked best for me:
format = '%Y-%m-%d%H:%M:%S'
df['Datetime'] = pd.to_datetime(df['date'] + df['time'].astype("string"), format=format)
In some cases pandas reads df['date'] as a column of integers, so it has to be cast to a string type before concatenating.
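A sketch of that situation, assuming (hypothetically) that the dates were stored as integers such as 20081024; the frame and the format string below are illustrative and would need adjusting to the real data:

```python
import pandas as pd

# Hypothetical frame: dates stored as integers rather than strings.
df = pd.DataFrame(
    {"date": [20081024, 20081024], "time": ["04:12:35", "04:12:40"]}
)

# Cast both columns to str before concatenating; adding an integer
# column to a string column directly would fail.
combined = df["date"].astype(str) + " " + df["time"].astype(str)
df["Datetime"] = pd.to_datetime(combined, format="%Y%m%d %H:%M:%S")
```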
I had trouble setting a column formatted as YYYY-MM-DD as the datetime index of a data frame I needed for time series forecasting. This is how I solved it for a dataframe where I wanted "dateCol" to be the datetime index:
idx = pd.DatetimeIndex(self.df[dateCol])
self.df = self.df.set_index(idx)
Then drop the column so it's not duplicated in the dataframe:
self.df = self.df.drop(dateCol, axis=1)
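Outside of a class, the same idea can be sketched as below. The frame and its `value` column are made up for illustration; this variant converts the column first and relies on `set_index` dropping the column by default (`drop=True`), which is an alternative to the explicit `drop` call above rather than the answer's exact method:

```python
import pandas as pd

# Hypothetical frame; "dateCol" holds YYYY-MM-DD strings.
df = pd.DataFrame({"dateCol": ["2008-10-24", "2008-10-25"], "value": [1.0, 2.0]})

# Convert the strings to datetimes, then promote the column to the index.
# set_index removes the column from the frame by default (drop=True).
df["dateCol"] = pd.to_datetime(df["dateCol"])
df = df.set_index("dateCol")
```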
Comments
So the trick here is adding a space between the date and time, and then pd.to_datetime() Does The Right Thing with the resultant strings?
If inplace=True, does it really return anything? Can we simply remove the assignment operator and just use the right-hand side?
@MJK inplace=True avoids creating a new df object by performing the operation on the same df. To use the right-hand side on its own, you'd have to pass the expression as the first argument to df.set_index; it is cleaner to do the assignment first.