python - Pandas to_csv progress bar with tqdm

Reposted. Author: 行者123. Updated: 2023-12-04 11:57:02

As the title suggests, I am trying to display a progress bar while pandas.to_csv runs.
I have the following script:

import pandas as pd
from tqdm import tqdm

# Note: BAR_DEFAULT_VIEW is not defined in this snippet.
def filter_pileup(pileup, output, lists):
    tqdm.pandas(desc='Reading, filtering, exporting', bar_format=BAR_DEFAULT_VIEW)
    # Reading files
    pileup_df = pd.read_csv(pileup, sep='\t', header=None).progress_apply(lambda x: x)
    lists_df = pd.read_csv(lists, sep='\t', header=None).progress_apply(lambda x: x)
    # Filtering pileup
    intersection = pd.merge(pileup_df, lists_df, on=[0, 1]).progress_apply(lambda x: x)
    intersection.columns = list(range(len(intersection.columns)))
    intersection = intersection.loc[:, 0:5]
    # Exporting filtered pileup
    intersection.to_csv(output, header=False, index=False, sep='\t')
For the earlier lines I found a way to integrate a progress bar, but that approach does not work for the last line. How can I do this?

Best answer

You can split the DataFrame into chunks of rows and write it to the CSV file chunk by chunk, using mode='w' for the first chunk and mode='a' for the rest:
Example:

import numpy as np
import pandas as pd
from tqdm import tqdm

df = pd.DataFrame(data=range(10_000_000), columns=["integer"])

print(df.head(10))

# Split the index into 100 chunks (100,000 rows each for this DataFrame)
chunks = np.array_split(df.index, 100)

for chunk_number, subset in enumerate(tqdm(chunks)):
    if chunk_number == 0:
        # First chunk: create or overwrite the file and write the header
        df.loc[subset].to_csv('data.csv', mode='w', index=True)
    else:
        # Remaining chunks: append without repeating the header
        df.loc[subset].to_csv('data.csv', header=False, mode='a', index=True)
Output:
   integer
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9

100%|██████████| 100/100 [00:12<00:00, 8.12it/s]
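
The chunked approach above can be wrapped in a reusable function. The following is a minimal sketch, not part of pandas or tqdm: the name to_csv_with_progress and its chunksize parameter are hypothetical, chosen here for illustration. It assumes the DataFrame fits in memory, slices by position with iloc so it does not depend on index labels, and writes every chunk to one open file handle instead of reopening the file in append mode.

```python
import pandas as pd
from tqdm import tqdm


def to_csv_with_progress(df, path, chunksize=100_000, **to_csv_kwargs):
    """Write df to path in row chunks, advancing a tqdm bar after each chunk.

    Hypothetical helper for illustration; extra keyword arguments are
    forwarded to DataFrame.to_csv (for example sep, index, header).
    """
    # Remember the caller's header setting; only the first chunk may use it.
    header = to_csv_kwargs.pop("header", True)
    with open(path, "w", newline="") as handle, \
            tqdm(total=len(df), unit="rows") as bar:
        # max(..., 1) runs the loop once for an empty frame so that the
        # header is still written.
        for start in range(0, max(len(df), 1), chunksize):
            chunk = df.iloc[start:start + chunksize]
            chunk.to_csv(
                handle,
                header=header if start == 0 else False,
                **to_csv_kwargs,
            )
            bar.update(len(chunk))
```

Under these assumptions, the last line of the script in the question could become to_csv_with_progress(intersection, output, sep='\t', header=False, index=False). A smaller chunksize makes the bar update more smoothly, but it adds per-call overhead.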

Regarding python - Pandas to_csv progress bar with tqdm, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64695352/
