
python - Optimizing a Python loop for millions of rows

Reposted | Author: 太空宇宙 | Updated: 2023-11-03 15:33:28

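I'm trying to mock a test dataset with Python-Faker. The goal is to have a few million records for my use case. Below is the code I'm using to populate 5 data elements for 1 million records.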

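import pandas as pd
from faker import Faker

fake = Faker()
df = pd.DataFrame()

# Each iteration appends one female and one male row: 1,000,000 rows in total.
# Note: DataFrame.append was removed in pandas 2.0, and every call copies the
# whole DataFrame, so the loop's total work grows quadratically with row count.
for i in range(500000):
    df = df.append(
        {'COL1': fake.first_name_female(),
         'COL2': fake.last_name_female(),
         'COL3': 'F',
         'COL4': fake.street_address(),
         'COL5': fake.zipcode_in_state()
         }, ignore_index=True)
    df = df.append(
        {'COL1': fake.first_name_male(),
         'COL2': fake.last_name_male(),
         'COL3': 'M',
         'COL4': fake.street_address(),
         'COL5': fake.zipcode_in_state()
         }, ignore_index=True)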

Running this took nearly 8 hours. How can I optimize this loop so it runs faster?
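Most of those 8 hours are likely spent in `df.append` itself rather than in Faker: each call builds a new DataFrame containing every existing row, so n appends copy on the order of n²/2 rows in total. The sketch below makes that visible at a small size. It is a minimal illustration under stated assumptions: it uses `pd.concat` in place of `DataFrame.append` (removed in pandas 2.0), and `make_row` is a hypothetical stand-in for the Faker calls so the code runs without Faker.

```python
import time

import pandas as pd

COLUMNS = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5']


def make_row(i):
    # Hypothetical stand-in for the Faker calls so the sketch runs without Faker.
    sex = 'F' if i % 2 == 0 else 'M'
    return [f'first{i}', f'last{i}', sex, f'{i} Main St', f'{10000 + i % 90000}']


def grow_row_by_row(n):
    # Row-by-row growth (pd.concat here, since DataFrame.append is gone in
    # pandas 2.0): every step copies all existing rows, so total work is ~n**2.
    df = pd.DataFrame([make_row(0)], columns=COLUMNS)
    for i in range(1, n):
        row = pd.DataFrame([make_row(i)], columns=COLUMNS)
        df = pd.concat([df, row], ignore_index=True)
    return df


def build_once(n):
    # Collect plain Python lists first, then build the DataFrame a single time.
    return pd.DataFrame([make_row(i) for i in range(n)], columns=COLUMNS)


if __name__ == '__main__':
    for build in (grow_row_by_row, build_once):
        start = time.perf_counter()
        build(1000)
        print(f'{build.__name__}: {time.perf_counter() - start:.3f} s')
```

Both functions return the same DataFrame; only `grow_row_by_row` is quadratic, so the gap between them widens as `n` grows. Absolute timings depend on the machine, so none are claimed here.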

Best answer

import pandas as pd
from time import time
from faker import Faker

fake = Faker()

def fake_row(i):
    # Even indices produce a female row, odd indices a male row.
    if i % 2 == 0:
        row = [fake.first_name_female(), fake.last_name_female(), 'F', fake.street_address(), fake.zipcode_in_state()]
    else:
        row = [fake.first_name_male(), fake.last_name_male(), 'M', fake.street_address(), fake.zipcode_in_state()]
    return row

start = time()
# Build all rows as plain lists first, then create the DataFrame once.
# Note: this produces 500,000 rows; use range(1000000) to match the question's 1M.
fake_data = [fake_row(i) for i in range(500000)]
df = pd.DataFrame(fake_data, columns=['COL1', 'COL2', 'COL3', 'COL4', 'COL5'])
print('[TIME]', time() - start)
# Reported output: [TIME] 171.82 secs
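Even with a single `DataFrame` constructor call, the answer still makes five Faker calls per row, which likely accounts for most of the remaining 171 seconds. If repeated values are acceptable in the test data, one swapped-in technique (not from the original answer) is to call Faker only a few thousand times to build pools of values, then sample all rows from those pools with NumPy. This is a sketch under assumptions: the pool key names are hypothetical, and the trade-off is that names, streets, and zip codes repeat across rows.

```python
import numpy as np
import pandas as pd


def rows_from_pools(pools, n, seed=0):
    """Build n rows by sampling, with replacement, from pre-generated value pools.

    `pools` is a dict of lists of strings; the keys used below are hypothetical
    names chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    is_female = np.arange(n) % 2 == 0  # even rows 'F', odd rows 'M', as in fake_row

    def draw(key):
        # One vectorized call draws all n values for a column.
        return rng.choice(np.asarray(pools[key], dtype=object), size=n)

    # Draws both female and male candidates, then keeps one per row:
    # simple, at the cost of sampling twice as many names as strictly needed.
    return pd.DataFrame({
        'COL1': np.where(is_female, draw('female_first'), draw('male_first')),
        'COL2': np.where(is_female, draw('female_last'), draw('male_last')),
        'COL3': np.where(is_female, 'F', 'M'),
        'COL4': draw('street'),
        'COL5': draw('zipcode'),
    })
```

To use it with Faker, fill each pool once, for example `pools['female_first'] = [fake.first_name_female() for _ in range(5000)]` (and likewise `fake.zipcode_in_state()` for `'zipcode'`), then call `rows_from_pools(pools, 1_000_000)`. The `seed` argument makes the output reproducible.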

Need it faster? Use deco, which runs the row generation in parallel worker processes:

import pandas as pd
from time import time
from faker import Faker
from deco import concurrent, synchronized

fake = Faker()

@concurrent  # each call runs as a task in deco's worker process pool
def fake_row(i):
    if i % 2 == 0:
        return [fake.first_name_female(), fake.last_name_female(), 'F', fake.street_address(), fake.zipcode_in_state()]
    return [fake.first_name_male(), fake.last_name_male(), 'M', fake.street_address(), fake.zipcode_in_state()]

@synchronized  # waits for the concurrent calls made in the body to finish
def run(size):
    res = []
    for i in range(size):
        res.append(fake_row(i))
    return pd.DataFrame(res, columns=['COL1', 'COL2', 'COL3', 'COL4', 'COL5'])

# The guard is needed on platforms where worker processes re-import this script.
if __name__ == '__main__':
    start = time()
    df = run(500000)
    print('[TIME]', time() - start)
    # Reported output: [TIME] 88.11 secs
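In the deco version every row is a separate `@concurrent` task, and generating one row takes only microseconds, so a large share of the time is likely spent dispatching tasks and returning results between processes rather than generating data. A common alternative, swapped in here and not part of deco, is to give each worker a whole chunk of rows using the standard library's `concurrent.futures.ProcessPoolExecutor`. In this sketch `stand_in_row` is a hypothetical replacement for the Faker calls so that it runs without Faker; with Faker you would create a `Faker()` inside `make_chunk` and could seed it per chunk with `seed_instance(start)`. The default chunk size of 50,000 is an arbitrary assumption.

```python
import random
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

COLUMNS = ['COL1', 'COL2', 'COL3', 'COL4', 'COL5']


def stand_in_row(rng, i):
    # Hypothetical replacement for the Faker calls; even i -> 'F', odd i -> 'M'.
    sex = 'F' if i % 2 == 0 else 'M'
    return [f'first{rng.randrange(1000)}', f'last{rng.randrange(1000)}', sex,
            f'{rng.randrange(1, 10000)} Main St', f'{rng.randrange(10000, 100000)}']


def make_chunk(start, count):
    # One task builds a whole block of rows, so dispatch overhead is paid once
    # per chunk instead of once per row. Seeding with `start` gives each chunk
    # its own reproducible random stream.
    rng = random.Random(start)
    return [stand_in_row(rng, i) for i in range(start, start + count)]


def build_parallel(n, chunk_size=50_000, max_workers=None):
    starts = range(0, n, chunk_size)
    counts = [min(chunk_size, n - s) for s in starts]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() yields chunks in submission order, so row order is preserved.
        chunks = pool.map(make_chunk, starts, counts)
        rows = [row for chunk in chunks for row in chunk]
    return pd.DataFrame(rows, columns=COLUMNS)
```

Call `build_parallel(1_000_000)` from inside an `if __name__ == '__main__':` block, as with deco, because platforms that start workers with spawn (Windows, macOS) re-import the script in each worker. With 50,000-row chunks, one million rows become 20 tasks instead of one million.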

A similar question, "python - Optimizing a Python loop for millions of rows", can be found on Stack Overflow: https://stackoverflow.com/questions/56429752/
