Let's say I have a pandas DataFrame with a time-related column called "Time". This column holds strings that represent minutes and seconds. For example, the first row's value, 125:19, represents 125 minutes and 19 seconds. Its datatype is string.
I want to convert this value to total minutes in a new column "Time_minutes". So 125:19 should become 125.316666666667 which should be a float datatype.
In a similar vein, if the value is 0:00, then the corresponding "Time_minutes" value should be 0 (float datatype).
I've done this in SQL using lambdas and index functions. But is there an easier or more straightforward way to do this in Python?
One possible solution is to use .str.split:
df["Converted"] = (s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)
print(df)
Prints:
     Time   Converted
0  125:19  125.316667
1    0:00    0.000000
2    0:30    0.500000
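For readers who find the one-liner dense, or who are on a Python version older than 3.8 (where the `:=` operator is not available), the same computation can be spelled out step by step. This is only a sketch of the same idea; the names `parts`, `minutes` and `seconds` are illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data taken from the printed output above.
df = pd.DataFrame({"Time": ["125:19", "0:00", "0:30"]})

# Split each "mm:ss" string into a list of two strings, e.g. ["125", "19"].
parts = df["Time"].str.split(":")

# Pull out each piece by position and convert it to float.
minutes = parts.str[0].astype(float)
seconds = parts.str[1].astype(float)

# Total minutes = whole minutes + seconds expressed as a fraction of a minute.
df["Converted"] = minutes + seconds / 60
print(df)
```

For the first row this works out to 125 + 19/60, which is the 125.316667 shown above.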
Option 1
If performance is a concern and you are certain that each string ends with ":ss", you can slice via Series.str with [:-3] and [-2:] respectively, apply Series.astype to convert each part to float, and chain Series.div on the second part to divide by 60.
import pandas as pd
data = {'Time': ['123:19','0:00','0:30']}
df = pd.DataFrame(data)
df['Time_minutes'] = (df['Time'].str[:-3].astype(float) +
df['Time'].str[-2:].astype(float).div(60))
df
     Time  Time_minutes
0  123:19    123.316667
1    0:00      0.000000
2    0:30      0.500000
This will be faster than any option based on Series.str.split.
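Since the slicing approach depends on every string ending in ":ss", it may be worth checking that assumption before relying on it. The sketch below is one possible way to do so with Series.str.fullmatch; the regular expression, the `is_valid` name and the deliberately malformed "7:5" row are assumptions added for illustration, not part of the answer.

```python
import pandas as pd

# "7:5" is deliberately malformed (only one digit after the colon).
df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30", "7:5"]})

# True where the whole string is digits, a colon, then exactly two digits.
is_valid = df["Time"].str.fullmatch(r"\d+:\d{2}")

# Slice only the valid rows; rows that fail the check end up as NaN
# because the assignment aligns on the index.
valid = df.loc[is_valid, "Time"]
df["Time_minutes"] = (valid.str[:-3].astype(float) +
                      valid.str[-2:].astype(float).div(60))
print(df)
```

Without the filter, the malformed row would be sliced into an empty minutes part, and the conversion to float would raise an error.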
Option 2
Alternatively, relying on Series.str.split, you can set the expand parameter to True, which returns the result as a pd.DataFrame. You can then divide by [1, 60], which leaves the first column (the integer, or "minutes", part) unchanged through division by 1 and converts the second column from seconds to minutes, and finally apply df.sum on axis=1.
df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
.astype(float).div([1, 60]).sum(axis=1))
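To make the chained expression easier to follow, here is the same pipeline broken into its intermediate steps. It is a sketch only; the `parts` and `scaled` names are illustrative and the sample data is the one used under Option 1.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30"]})

# expand=True gives one column per piece: column 0 = minutes, column 1 = seconds.
parts = df["Time"].str.split(":", expand=True).astype(float)
print(parts)

# Dividing by a list lines it up with the columns: column 0 / 1, column 1 / 60.
scaled = parts.div([1, 60])
print(scaled)

# Summing across each row (axis=1) adds the two pieces into total minutes.
df["Time_minutes"] = scaled.sum(axis=1)
print(df)
```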
Option 3
A slightly faster variation on "Option 2" would be to apply df.pipe to the result of Series.str.split with expand=True and work with its columns 0 and 1 inside a lambda function.
df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
.pipe(lambda x: x[0].astype(float) +
x[1].astype(float).div(60)))
Both "Option 2" and "Option 3" avoid the need to create an intermediate variable, such as s in the answer by @AndrejKesely. Both are also marginally faster than that answer.
Performance comparison
import timeit
mysetup = """
import pandas as pd
import numpy as np
np.random.seed(1)
data = {'Time': (np.random.rand(1_000)*100).round(2)}
df = pd.DataFrame(data)
df['Time'] = (df['Time'].apply(lambda x: "{:.2f}".format(x))
.str.replace('.',':', regex=False))
"""
func_dict = {'Option 1 (slice)': "df['Time'].str[:-3].astype(float) + df['Time'].str[-2:].astype(float).div(60)",
'Option 2 (expand)': "df['Time'].str.split(':', expand=True).astype(float).div([1, 60]).sum(axis=1)",
'Option 3 (expand-pipe)': "df['Time'].str.split(':', expand=True).pipe(lambda x: x[0].astype(float) + x[1].astype(float).div(60))",
'Option 4 (intermediate var)': '(s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)'}
for k, v in func_dict.items():
    print(f"{k}: {timeit.timeit(setup=mysetup, stmt=v, number=1_000)}")
# in seconds
Option 1 (slice): 1.1033934000879526
Option 2 (expand): 1.5235498000402004
Option 3 (expand-pipe): 1.456193899968639
Option 4 (intermediate var): 1.8184985001571476
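Timings are only meaningful if the expressions being compared return the same values, so a quick equivalence check may be useful alongside the benchmark. The sketch below runs the four expressions on a small hand-written sample (the "59:59" row is an added assumption, not taken from the answers) and compares each result against Option 1. It needs Python 3.8+ because Option 4 uses `:=`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30", "59:59"]})

# The same four expressions that are timed above, evaluated once each.
results = {
    "Option 1 (slice)": (df["Time"].str[:-3].astype(float)
                         + df["Time"].str[-2:].astype(float).div(60)),
    "Option 2 (expand)": (df["Time"].str.split(":", expand=True)
                          .astype(float).div([1, 60]).sum(axis=1)),
    "Option 3 (expand-pipe)": (df["Time"].str.split(":", expand=True)
                               .pipe(lambda x: x[0].astype(float)
                                     + x[1].astype(float).div(60))),
    "Option 4 (intermediate var)": ((s := df["Time"].str.split(":")).str[0].astype(float)
                                    + (s.str[1].astype(float) / 60)),
}

reference = results["Option 1 (slice)"]
for name, series in results.items():
    # allclose tolerates tiny floating-point differences between the approaches.
    assert np.allclose(series, reference), name
    print(f"{name}: matches")
```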