
python - Optimization for a large dataset

Reposted. Author: 行者123, last updated: 2023-12-01 07:35:59

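I have posted the code for review here. So far it has not received a proper response, which I suspect is because the code is too long, so I'll get straight to the point. Suppose we have the following lists: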

t0=[('Albania','Angola','Germany','UK'),('UK','France','Italy'),('Austria','Bahamas','Brazil','Chile'),('Germany','UK'),('US')]
t1=[('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'), ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2=[('Angola:UK'), ('Germany:UK'), ('UK:France'), ('UK:Italy'), ('France:Italy'), ('Austria:Bahamas')]

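The goal: for each pair in t1, go through t0, and wherever a tuple contains both elements of the pair, replace those two elements with the corresponding element of t2. This can be done with the following code: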

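from pprint import pprint

result = []
for v1, v2 in zip(t1, t2):
    out = []
    for i in t0:
        common = set(v1).intersection(i)
        # The pair is present only if both of its elements occur in the tuple
        if set(v1) == common:
            # Drop the pair's elements and append the merged label, e.g. 'Angola:UK'
            out.append(tuple(list(set(i) - common) + [v2]))
        else:
            # Pair not present: keep the tuple unchanged
            out.append(tuple(i))
    result.append(out)

pprint(result, width=100)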

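This gives the following (the remaining elements of each modified tuple come from a set, so their order may differ between runs):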

[[('Albania', 'Germany', 'Angola:UK'),
('UK', 'France', 'Italy'),
('Austria', 'Bahamas', 'Brazil', 'Chile'),
('Germany', 'UK'),
('U', 'S')],
[('Albania', 'Angola', 'Germany:UK'),
('UK', 'France', 'Italy'),
('Austria', 'Bahamas', 'Brazil', 'Chile'),
('Germany:UK',),
('U', 'S')],
[('Albania', 'Angola', 'Germany', 'UK'),
('Italy', 'UK:France'),
('Austria', 'Bahamas', 'Brazil', 'Chile'),
('Germany', 'UK'),
('U', 'S')],
[('Albania', 'Angola', 'Germany', 'UK'),
('France', 'UK:Italy'),
('Austria', 'Bahamas', 'Brazil', 'Chile'),
('Germany', 'UK'),
('U', 'S')],
[('Albania', 'Angola', 'Germany', 'UK'),
('UK', 'France:Italy'),
('Austria', 'Bahamas', 'Brazil', 'Chile'),
('Germany', 'UK'),
('U', 'S')],
[('Albania', 'Angola', 'Germany', 'UK'),
('UK', 'France', 'Italy'),
('Brazil', 'Chile', 'Austria:Bahamas'),
('Germany', 'UK'),
('U', 'S')]]
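The last tuple of every sublist prints as ('U', 'S'). The likely cause is that ('US') in t0 is a plain string in parentheses, not a one-element tuple, so tuple(i) and set(i) iterate over its characters. A minimal check (the variable names are only for illustration):

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('US')
one_tuple = ('US',)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# Iterating a string yields its characters, so tuple() splits it up.
print(tuple(not_a_tuple))  # ('U', 'S')
print(tuple(one_tuple))    # ('US',)
```

With ('US',) in t0, the original loop would keep that entry unchanged and print it as ('US',).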

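This list has length 6, matching the 6 elements of t1 and t2, and each sublist has 5 elements, matching the number of elements in t0. The code is fast as it stands, but in my case t0 has about 48,000 elements and t1 about 30,000, and the run takes practically forever. How can I perform the same operation faster?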
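To put these sizes in perspective: the nested loop visits every combination of a pair in t1 and a tuple in t0, and result stores one entry per combination, so both the work and the output grow with the product of the two lengths. Assuming a 64-bit CPython build, where each list slot is an 8-byte pointer, the list slots alone come to roughly:

```latex
\begin{aligned}
30\,000 \times 48\,000 &= 1.44 \times 10^{9} \quad \text{(pair, tuple) combinations}\\
1.44 \times 10^{9} \times 8\ \text{B} &\approx 11.5\ \text{GB} \quad \text{of list slots, before any tuple objects}
\end{aligned}
```

This suggests that at this size, avoiding the full product matters more than shaving the cost of each individual check.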

Best answer

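You can use a nested (double) list comprehension. This code runs about 3.47 times faster (13.3 µs vs. 46.2 µs). Note that it converts t0 and t1 to sets, so each sublist of result contains sets rather than tuples, and element order is not preserved.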

t0=[('Albania','Angola','Germany','UK'),('UK','France','Italy'),('Austria','Bahamas','Brazil','Chile'),('Germany','UK'),('US')]
t1=[('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'), ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2=[('Angola:UK'), ('Germany:UK'), ('UK:France'), ('UK:Italy'), ('France:Italy'), ('Austria:Bahamas')]

# Convert the lists of tuples to lists of sets for easier and faster computation
t0 = [set(x) for x in t0]
t1 = [set(x) for x in t1]

# Return a copy of set_ with the elements of to_remove removed and to_add added
def add_remove(set_, to_remove, to_add):
    result_temp = set_.copy()
    for element in to_remove:
        result_temp.remove(element)
    result_temp.add(to_add)
    return result_temp

# Do the computation with a nested (double) list comprehension:
# for each pair x with merged label z, rewrite every set y in t0 that contains x
result = [[add_remove(y, x, z) if x.issubset(y) else y for y in t0]
          for x, z in zip(t1, t2)]
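Both versions still compare every pair in t1 against every tuple in t0. A different technique, not taken from this thread and offered only as a hedged sketch, is an inverted index: map each country to the indices of the t0 tuples that contain it, then intersect the index sets of the countries in a pair. Only those tuples need rewriting, so the work per pair depends on how many tuples contain its countries rather than on len(t0). The sketch assumes ('US',) was meant as a one-element tuple, keeps the original element order instead of going through a set, and returns only the changes (one {index: new tuple} dict per pair) instead of the full nested list. build_index and changed_tuples are hypothetical helper names.

```python
from collections import defaultdict

# Sample data from the question; ('US',) is assumed to be the intended one-element tuple.
t0 = [('Albania', 'Angola', 'Germany', 'UK'), ('UK', 'France', 'Italy'),
      ('Austria', 'Bahamas', 'Brazil', 'Chile'), ('Germany', 'UK'), ('US',)]
t1 = [('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'),
      ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2 = ['Angola:UK', 'Germany:UK', 'UK:France', 'UK:Italy',
      'France:Italy', 'Austria:Bahamas']


def build_index(groups):
    """Hypothetical helper: map each item to the set of indices of the groups containing it."""
    index = defaultdict(set)
    for idx, group in enumerate(groups):
        for item in group:
            index[item].add(idx)
    return index


def changed_tuples(groups, pairs, labels):
    """Hypothetical helper: for each (non-empty) pair, return {index: new tuple}
    for only those groups that contain every element of the pair."""
    index = build_index(groups)
    result = []
    for pair, label in zip(pairs, labels):
        # Indices of groups containing all elements of the pair (set intersection).
        hits = set.intersection(*(index.get(item, set()) for item in pair))
        # Keep the group's original order, drop the pair's elements, append the label.
        result.append({
            i: tuple(x for x in groups[i] if x not in pair) + (label,)
            for i in sorted(hits)
        })
    return result


changes = changed_tuples(t0, t1, t2)
for change in changes:
    print(change)

# A full sublist can still be rebuilt on demand from t0 and one change dict:
first_full = [changes[0].get(i, tup) for i, tup in enumerate(t0)]
print(first_full)
```

Rebuilding sublists only when they are needed avoids holding all 30,000 full copies in memory at once; whether that fits depends on how result is consumed downstream.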

For python - Optimization for a large dataset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56988171/
