
python - Pandas: vectorization instead of a loop

Reposted. Author: 行者123. Updated: 2023-12-05 06:37:56

I have a DataFrame of paths. The task is to put each folder's last modification time into a separate column, using something like datetime.fromtimestamp(os.path.getmtime('PATH_HERE')).

import os
from datetime import datetime

import numpy as np
import pandas as pd


df1 = pd.DataFrame({'Path': ['C:\\Path1', 'C:\\Path2', 'C:\\Path3']})

# For an MCVE, use the commented-out code below. WARNING: this WILL create directories on your machine.
# for path in df1['Path']:
#     os.mkdir(os.path.join(r'PUT_YOUR_PATH_HERE', os.path.basename(path)))

I can accomplish this with the approach below, but it is a slow loop when there are many folders:

for each_path in df1['Path']:
    df1.loc[df1['Path'] == each_path, 'Last Modification Time'] = datetime.fromtimestamp(os.path.getmtime(each_path))

How can I vectorize this process to improve speed? os.path.getmtime cannot accept a Series. I am looking for something like:

df1['Last Modification Time'] = datetime.fromtimestamp(os.path.getmtime(df1['Path']))
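
To illustrate why the one-liner above cannot work, here is a minimal sketch. It assumes only that os.path.getmtime expects a single path-like value; the variable names `paths`, `raised`, and `message` are made up for this illustration. Passing a whole Series raises a TypeError before any file is touched, so the paths do not need to exist.

```python
import os

import pandas as pd

# Hypothetical example paths; they are never accessed on disk.
paths = pd.Series(['C:\\Path1', 'C:\\Path2', 'C:\\Path3'])

try:
    # os.path.getmtime takes one path (str, bytes or os.PathLike), not a Series.
    os.path.getmtime(paths)
    raised = False
    message = ''
except TypeError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```

The function therefore has to be called once per element, whichever looping construct is used.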

Best answer

Assuming 100 paths, I will present three methods. I think method 3 is preferable.

import os
from datetime import datetime

import pandas as pd

# Data initialisation:
paths100 = ['a_whatever_path_here'] * 100
df = pd.DataFrame(columns=['paths', 'time'])
df['paths'] = paths100


def fun1():
    # Naive for loop. High readability, slow.
    # Builds a boolean mask over the whole column on every iteration.
    for path in df['paths']:
        mask = df['paths'] == path
        df.loc[mask, 'time'] = datetime.fromtimestamp(os.path.getmtime(path))


def fun2():
    # Naive for loop optimised. Medium readability, medium speed.
    # Assigns one cell per iteration by index label instead of by mask.
    for i, path in enumerate(df['paths']):
        df.loc[i, 'time'] = datetime.fromtimestamp(os.path.getmtime(path))


def fun3():
    # List comprehension. High readability, high speed.
    # Computes all values first, then assigns the column once.
    df['time'] = [datetime.fromtimestamp(os.path.getmtime(path)) for path in df['paths']]


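%timeit fun1()
>>> 164 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit fun2()
>>> 11.6 ms ± 67.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit fun3()
>>> 13.1 ns ± 0.0327 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

Note: the 13.1 ns figure for fun3 is far too small for 100 calls to os.path.getmtime, so treat it with caution; it may reflect timing something other than the full function call. Method 3 still avoids the per-row df.loc assignment that methods 1 and 2 pay for.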
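
As a self-contained sketch of method 3, the example below creates three temporary directories so that the paths actually exist, then fills the column with a list comprehension. The helper name `add_mtime_column`, its parameters, and the temporary directory layout are assumptions made for this illustration and are not part of the original answer. The `Series.map` variant is an alternative spelling of the same per-element call; it is not true vectorization, because os.path.getmtime is still invoked once per path.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd


def add_mtime_column(df, path_col='Path', out_col='Last Modification Time'):
    """Return a copy of df with each path's last modification time in out_col."""
    out = df.copy()
    # One os.path.getmtime call per path, one column assignment in total.
    out[out_col] = [datetime.fromtimestamp(os.path.getmtime(p)) for p in out[path_col]]
    return out


with tempfile.TemporaryDirectory() as root:
    # Hypothetical folders, created only so the demo has real paths to inspect.
    paths = []
    for name in ('Path1', 'Path2', 'Path3'):
        folder = os.path.join(root, name)
        os.mkdir(folder)
        paths.append(folder)

    demo = pd.DataFrame({'Path': paths})
    result = add_mtime_column(demo)

    # Equivalent per-element call expressed with Series.map.
    via_map = demo['Path'].map(lambda p: datetime.fromtimestamp(os.path.getmtime(p)))

print(result)
```

Returning a copy keeps the input DataFrame unchanged, which makes the helper easier to time repeatedly without the earlier runs affecting later ones.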

Regarding python - Pandas: vectorization instead of a loop, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46833670/
