
python - Passing a pandas read_csv call and its named arguments to another function

Reposted. Author: 行者123. Last updated: 2023-12-01 00:48:47

Since functions are first-class citizens in Python, I should be able to refactor this:

def get_events():
    csv_path = os.path.join(INPUT_CSV_PATH, DATA_NAME + '.csv')
    print(f'Starting reading events at {datetime.now()}')
    start_time = datetime.now()
    events = pd.read_csv(csv_path, dtype=DTYPES)
    end_time = datetime.now()
    print(f'Finished reading events at {end_time} ({end_time - start_time})')
    return events

into something like this:

def get_events():
    csv_path = os.path.join(INPUT_CSV_PATH, DATA_NAME + '.csv')
    events = _time_function_call('reading events', pd.read_csv, {'filepath_or_buffer': csv_path, 'dtype': DTYPES})
    return events

def _time_function_call(message, func, *kwargs):
    print(f'Starting {message} at {datetime.now()}')
    start_time = datetime.now()
    result = func(*kwargs)
    end_time = datetime.now()
    print(f'Finished {message} at {end_time} ({end_time - start_time})')
    return result

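That is, I want to pass the pandas read_csv function and its named arguments to a helper function. (Note: I wasn't sure how to pass named arguments along with a function; an existing answer helped with that.)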

But after refactoring I get the following error:

ValueError: Invalid file path or buffer object type: <class 'dict'>

What am I missing about how to pass a function and its named arguments to another Python function for evaluation?

Best answer

You probably want to refactor to:

def get_events():
    csv_path = os.path.join(INPUT_CSV_PATH, DATA_NAME + '.csv')
    events = _time_function_call('reading events', pd.read_csv, filepath_or_buffer=csv_path, dtype=DTYPES)
    return events

def _time_function_call(message, func, *args, **kwds):
    start_time = datetime.now()
    print(f'Starting {message} at {start_time}')
    result = func(*args, **kwds)
    end_time = datetime.now()
    duration = end_time - start_time
    print(f'Finished {message} at {end_time} ({duration})')
    return result

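This way Python takes care of handling arbitrary argument lists: *args collects any extra positional arguments and **kwds collects any keyword arguments, and both are forwarded unchanged to func.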
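To see why the version in the question fails, here is a minimal sketch. The helper names (show_args, call_with_star_only, call_with_both) and the sample option values are made up for illustration and are not part of pandas. A parameter written as *kwargs is still just a tuple of positional arguments, whatever it is called, so the dict ends up as the first positional argument of the target function. That is consistent with read_csv complaining about a dict being used as the file path.

```python
def show_args(*args, **kwargs):
    """Return what was received, so the argument packing is visible."""
    return args, kwargs


def call_with_star_only(func, *kwargs):
    # Despite its name, `kwargs` is a tuple of positional arguments here,
    # so the dict below is forwarded as ONE positional argument.
    return func(*kwargs)


def call_with_both(func, *args, **kwargs):
    # Positional and keyword arguments are collected and forwarded separately.
    return func(*args, **kwargs)


# Hypothetical option values, standing in for the read_csv arguments.
options = {"filepath_or_buffer": "events.csv", "dtype": "str"}

broken = call_with_star_only(show_args, options)
fixed = call_with_both(show_args, **options)

print(broken)  # the dict shows up inside the positional tuple
print(fixed)   # the dict entries show up as keyword arguments
```

If the options already live in a dict, unpacking it at the call site with ** (as in the last call) has the same effect as writing the keyword arguments out by hand.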

I'd suggest using context managers and the logging module, because code written that way composes more nicely, for example:

from time import perf_counter
import logging

logger = logging.getLogger(__name__)

class log_timer:
    def __init__(self, message):
        self.message = message

    def __enter__(self):
        logger.info(f"{self.message} started")
        # call to perf_counter() should be the last statement in method
        self.start_time = perf_counter()

    def __exit__(self, exc_type, exc_value, traceback):
        # perf_counter() call should be first statement
        secs = perf_counter() - self.start_time
        state = 'finished' if exc_value is None else 'failed'
        logger.info(f"{self.message} {state} after {secs * 1000:.2f}ms")

It can be used like this:

from time import sleep

logging.basicConfig(
    format='%(asctime)s %(levelname)s %(message)s',
    level=0,
)

with log_timer("sleep"):
    sleep(1)

That way you don't have to worry about wrapping arbitrary bits of code in functions and passing state between them.
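As a rough illustration of that point, the sketch below times several steps inside one function. It uses a generator-based variant of the timer built with contextlib.contextmanager, which is an alternative spelling and not the class from the answer. The function load_and_sum and its data are invented for the example. Note that the local variable doubled stays in scope across the with blocks, so nothing has to be passed in or returned just to be timed.

```python
import logging
from contextlib import contextmanager
from time import perf_counter

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
logger = logging.getLogger(__name__)


@contextmanager
def log_timer(message):
    """Log when the wrapped block starts and how long it took."""
    logger.info("%s started", message)
    start = perf_counter()
    try:
        yield
    except BaseException:
        # Log the failure, then let the exception propagate unchanged.
        logger.info("%s failed after %.2fms", message, (perf_counter() - start) * 1000)
        raise
    else:
        logger.info("%s finished after %.2fms", message, (perf_counter() - start) * 1000)


def load_and_sum(values):
    # Hypothetical workload: each step is timed without becoming its own function.
    with log_timer("whole job"):
        with log_timer("doubling"):
            doubled = [v * 2 for v in values]
        with log_timer("summing"):
            total = sum(doubled)
    return total


print(load_and_sum([1, 2, 3]))
```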

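Also, using datetime the way you were isn't great for timing small pieces of code. The time module provides perf_counter, which gives access to a more suitable (higher-resolution) OS/CPU timer.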
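A small sketch of measuring with perf_counter is below. The workload (summing a range) is arbitrary and only there to have something to time. time.get_clock_info reports the properties of the underlying clock on the current platform, so the printed resolution will differ between machines.

```python
from time import get_clock_info, perf_counter

# Inspect the clock behind perf_counter on this machine.
info = get_clock_info("perf_counter")
print(f"implementation={info.implementation} "
      f"resolution={info.resolution} monotonic={info.monotonic}")

# perf_counter() values are only meaningful as differences between two calls.
start = perf_counter()
total = sum(range(100_000))  # arbitrary workload to time
elapsed = perf_counter() - start

print(f"summed to {total} in {elapsed * 1000:.3f}ms")
```

Unlike datetime.now(), which reads the wall clock and can jump if the system time is adjusted, perf_counter is monotonic, so the measured difference is never negative.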

This question, "python - Passing a pandas read_csv call and its named arguments to another function", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/56734480/
