Python: Pandas DataFrame - Convert String Time Column in mm:ss Format to Total Minutes in Float Format

Reposted. Author: bug小助手. Updated: 2023-10-25 21:27:58

Let's say I have a pandas DataFrame with a time-related column called "Time". The column holds strings that represent minutes and seconds. For example, the first row value 125:19 represents 125 minutes and 19 seconds. Its datatype is string.

I want to convert this value to total minutes in a new column "Time_minutes". So 125:19 should become 125.316666666667, stored as a float.

In a similar vein, if the value is 0:00, the corresponding "Time_minutes" value should be 0 (float datatype).

I've done this in SQL using lambdas and index functions. But is there an easier, more straightforward way to do this in Python?

Recommended answers

One possible solution is to use .str.split:

df["Converted"] = (s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)
print(df)

Prints:

     Time   Converted
0  125:19  125.316667
1    0:00    0.000000
2    0:30    0.500000
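
The one-liner above uses an assignment expression (:=), which requires Python 3.8 or later. As a minimal sketch of the same steps written out one at a time, the version below splits, converts, and adds in separate statements. The helper name mmss_to_minutes is hypothetical and only used for illustration; it assumes every value contains exactly one colon.

```python
import pandas as pd


def mmss_to_minutes(times: pd.Series) -> pd.Series:
    """Convert 'mm:ss' strings to total minutes as floats (illustrative helper)."""
    # Each element becomes a list such as ['125', '19']
    parts = times.str.split(":")
    # .str[0] / .str[1] pick the first / second list item per row
    minutes = parts.str[0].astype(float)
    seconds = parts.str[1].astype(float)
    return minutes + seconds / 60


df = pd.DataFrame({"Time": ["125:19", "0:00", "0:30"]})
df["Converted"] = mmss_to_minutes(df["Time"])
print(df)
```

This should behave like the one-liner, at the cost of two named intermediate Series.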


Option 1

If performance is a concern and you are certain that each string ends with ":ss", you can slice with Series.str[:-3] (the minutes) and Series.str[-2:] (the seconds), convert each slice to float with Series.astype, and chain Series.div on the seconds to divide by 60.

import pandas as pd

data = {'Time': ['123:19','0:00','0:30']}
df = pd.DataFrame(data)

df['Time_minutes'] = (df['Time'].str[:-3].astype(float) +
                      df['Time'].str[-2:].astype(float).div(60))

df
     Time  Time_minutes
0  123:19    123.316667
1    0:00      0.000000
2    0:30      0.500000

This will be faster than any option with Series.str.split.
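
Since the slicing approach depends on every string ending with ":ss", it may be worth checking that assumption before converting. The sketch below is one possible check using Series.str.fullmatch (available in pandas 1.1 and later); the regular expression and the extra sample value "7:5" are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30", "7:5"]})

# Assumed pattern: one or more digits, a colon, then exactly two digits
is_valid = df["Time"].str.fullmatch(r"\d+:\d{2}")

# Rows that do not end with ':ss' and would break the slicing approach
print(df[~is_valid])

# Convert only the rows that passed the check
valid = df[is_valid].copy()
valid["Time_minutes"] = (valid["Time"].str[:-3].astype(float) +
                         valid["Time"].str[-2:].astype(float).div(60))
print(valid)
```

How to treat the rejected rows (drop, fix, or raise an error) depends on the data and is left open here.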

Option 2

Alternatively, relying on Series.str.split, you can set the expand parameter to True, which returns the result as a pd.DataFrame with one column per part. You can then divide by [1, 60], which leaves the first column (the "minutes" part) unchanged through division by 1 and converts the second column (seconds) to minutes, and finally apply DataFrame.sum with axis=1.

df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
                      .astype(float).div([1, 60]).sum(axis=1))
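
To make the chained call easier to follow, here is a small sketch that breaks it into its intermediate results. The variable names parts, scaled, and total are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30"]})

# Step 1: split into a two-column DataFrame (column 0 = minutes, column 1 = seconds)
parts = df["Time"].str.split(":", expand=True).astype(float)
print(parts)

# Step 2: divide column-wise; column 0 by 1 (unchanged), column 1 by 60
scaled = parts.div([1, 60])
print(scaled)

# Step 3: add the two columns row by row to get total minutes
total = scaled.sum(axis=1)
print(total)
```

The list [1, 60] is matched against the columns in order, so it has to contain one divisor per column produced by the split.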

Option 3

A slightly faster variation on "Option 2" is to apply DataFrame.pipe to the result of Series.str.split with expand=True and work with its columns 0 and 1 inside a lambda function.

df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
                      .pipe(lambda x: x[0].astype(float) +
                            x[1].astype(float).div(60)))

With Options 2 and 3 you avoid creating an intermediate variable, such as s in the answer by @AndrejKesely. Both options are also marginally faster than that approach.

Performance comparison

import timeit

mysetup = """
import pandas as pd
import numpy as np

np.random.seed(1)

data = {'Time': (np.random.rand(1_000)*100).round(2)}
df = pd.DataFrame(data)
df['Time'] = (df['Time'].apply(lambda x: "{:.2f}".format(x))
              .str.replace('.',':', regex=False))
"""

func_dict = {
    'Option 1 (slice)': "df['Time'].str[:-3].astype(float) + df['Time'].str[-2:].astype(float).div(60)",
    'Option 2 (expand)': "df['Time'].str.split(':', expand=True).astype(float).div([1, 60]).sum(axis=1)",
    'Option 3 (expand-pipe)': "df['Time'].str.split(':', expand=True).pipe(lambda x: x[0].astype(float) + x[1].astype(float).div(60))",
    'Option 4 (intermediate var)': '(s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)',
}

# Run each statement 1,000 times and print the total elapsed time
for k, v in func_dict.items():
    print(f"{k}: {timeit.timeit(setup=mysetup, stmt=v, number=1_000)}")

# Output: total time in seconds for 1,000 runs of each statement
Option 1 (slice): 1.1033934000879526
Option 2 (expand): 1.5235498000402004
Option 3 (expand-pipe): 1.456193899968639
Option 4 (intermediate var): 1.8184985001571476
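
A timing comparison is only meaningful if the options compute the same thing. The sketch below is one way to check that on a small sample; the sample values and the dictionary keys are chosen for illustration, and the last entry calls str.split twice instead of using the assignment expression so that it also runs on older Python versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Time": ["125:19", "0:00", "0:30", "59:59"]})
col = df["Time"]

results = {
    "slice": col.str[:-3].astype(float) + col.str[-2:].astype(float).div(60),
    "expand": col.str.split(":", expand=True).astype(float).div([1, 60]).sum(axis=1),
    "expand-pipe": col.str.split(":", expand=True).pipe(
        lambda x: x[0].astype(float) + x[1].astype(float).div(60)),
    "split": (col.str.split(":").str[0].astype(float) +
              col.str.split(":").str[1].astype(float) / 60),
}

# Compare every option against the slicing result, allowing for float rounding
baseline = results["slice"]
for name, values in results.items():
    print(f"{name}: matches baseline = {np.allclose(values, baseline)}")
```

np.allclose is used rather than exact equality because the options may add and divide in a different order, which can produce tiny floating-point differences.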
