
python - List of lists: replacing and adding up items of sublists

Reposted. Author: 行者123. Updated: 2023-12-01 03:38:34

I have a list of lists, say:

tripInfo_csv = [['1','2',6,2], ['a','h',4,2], ['1','4',6,1], ['1','8',18,3], ['a','8',2,1]]

Think of the sublists as trips: [start point, end point, number of adults, number of children]

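My goal is to get a list in which trips with the same start and end points have their third and fourth values summed. Start and end values should always be numbers from 1 to 8. If they are letters, they should be replaced by the corresponding number (a=1, b=2, and so on).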

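This is my code. It works, but I'm sure it can be improved. The main issue for me is performance: I have a lot of lists like this one, with many more sublists.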

dicPoints = {'a':'1','b':'2','c':'3', 'd':'4', 'e':'5', 'f':'6', 'g':'7', 'h':'8'}
def getTrips (trips):
    okTrips = []
    for trip in trips:
        if not trip[0].isdigit():
            trip[0] = dicPoints[trip[0]]
        if not trip[1].isdigit():
            trip[1] = dicPoints[trip[1]]

        if len(okTrips) == 0:
            okTrips.append(trip)
        else:
            for i, stop in enumerate(okTrips):
                if stop[0] == trip[0] and stop[1] == trip[1]:
                    stop[2] += trip[2]
                    stop[3] += trip[3]
                    break
                else:
                    if i == len(okTrips)-1:
                        okTrips.append(trip)
    return okTrips

As eguaio pointed out, the code above has a bug. It should be like this:

import datetime

def getTrips (trips):
    okTrips = []
    print datetime.datetime.now()
    for trip in trips:
        if not trip[0].isdigit():
            trip[0] = dicPoints[trip[0]]
        if not trip[1].isdigit():
            trip[1] = dicPoints[trip[1]]

        if len(okTrips) == 0:
            okTrips.append(trip)
        else:
            flag = 0
            for i, stop in enumerate(okTrips):
                if stop[0] == trip[0] and stop[1] == trip[1]:
                    stop[2] += trip[2]
                    stop[3] += trip[3]
                    flag = 1
                    break

            if flag == 0:
                okTrips.append(trip)
    return okTrips
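
The flag in the corrected version can also be written with Python's for/else clause, whose else branch runs only when the loop ends without hitting break. The following is only a sketch in Python 3 (the code in this post is Python 2). The names get_trips_for_else and DIC_POINTS are made up for illustration, and unlike the original it copies each trip so the input list is left alone. Each trip is still matched by a linear scan, so this does not address the performance concern.

```python
DIC_POINTS = {'a': '1', 'b': '2', 'c': '3', 'd': '4',
              'e': '5', 'f': '6', 'g': '7', 'h': '8'}

def get_trips_for_else(trips):
    ok_trips = []
    for trip in trips:
        # Work on a copy so the caller's sublists are not modified
        # (an assumption of this sketch; the original mutates them).
        trip = list(trip)
        for pos in (0, 1):
            if not trip[pos].isdigit():
                trip[pos] = DIC_POINTS[trip[pos]]
        for stop in ok_trips:
            if stop[0] == trip[0] and stop[1] == trip[1]:
                stop[2] += trip[2]
                stop[3] += trip[3]
                break
        else:
            # No break happened: this (start, end) pair is new.
            ok_trips.append(trip)
    return ok_trips

trip_info = [['1', '2', 6, 2], ['a', 'h', 4, 2], ['1', '4', 6, 1],
             ['1', '8', 18, 3], ['a', '8', 2, 1]]
print(get_trips_for_else(trip_info))
# [['1', '2', 6, 2], ['1', '8', 24, 6], ['1', '4', 6, 1]]
```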

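Thanks to eguaio's answer, I ended up with an improved version that I would like to share. This is the script I wrote based on his answer. My data and requirements are now more complex than I was first told, so I made some changes.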

The CSV files look like this:

LineT;Line;Route;Day;Start_point;End_point;Adults;Children;First_visit
SM55;5055;3;Weekend;15;87;21;4;0
SM02;5002;8;Weekend;AF3;89;5;0;1
...

The script:

import os, csv, psycopg2

folder = "F:/route_project/routes"

# Day type
dicDay = {'Weekday':1,'Weekend':2,'Holiday':3}

# Dictionary with the start and end points of each route
# built from a PostgreSQL table (with columns: line_route, start, end)
conn = psycopg2.connect (database="test", user="test", password="test", host="###.###.#.##")
cur = conn.cursor()
cur.execute('select id_linroute, start_p, end_p from route_ends')
recs = cur.fetchall()
dicPoints = {rec[0]: rec[1:] for rec in recs}

# When point labels are text, replace them with a number label in dicPoints
# Text is not important: they are special text labels for start and end
# of routes (for athletes), so we replace them with labels for start or
# the end of each route
def convert_point(line, route, point, i):
    if point.isdigit():
        return point
    else:
        return dicPoints["%s_%s" % (line,route)][i]

# Points with text labels mean athletes made the whole or part of this route,
# we keep them as adults but also keep this number as an extra value
# for further purposes
def num_athletes(start_p, end_p, adults):
    if not start_p.isdigit() or not end_p.isdigit():
        return adults
    else:
        return 0

# Data is taken from CSV files in subfolders
for root, dirs, files in os.walk(folder):
    for file in files:
        if file.endswith(".csv"):
            file_path = os.path.join(root, file)
            with open(file_path, 'rb') as csvfile:
                rows = csv.reader(csvfile, delimiter=';', quotechar='"')
                # Skips the CSV header row
                rows.next()
                # linT is not used, yet it's found in every CSV file
                # There's an unused last column in every file; I take advantage of it
                # to store the number of athletes in the generator
                gen = ((lin, route, dicDay[tday],
                        convert_point(lin, route, s_point, 0),
                        convert_point(lin, route, e_point, 1),
                        adults, children,
                        num_athletes(s_point, e_point, adults))
                       for linT, lin, route, tday, s_point, e_point, adults, children, athletes in rows)
                dicCSV = {}
                for lin, route, tday, s_point, e_point, adults, children, athletes in gen:
                    # Composite key: (line_route_start, line_route_end, day type)
                    key = ("%s_%s_%s" % (lin,route,s_point), "%s_%s_%s" % (lin,route,e_point), tday)
                    visitors = dicCSV.get(key, (0, 0, 0))
                    dicCSV[key] = (visitors[0] + int(adults), visitors[1] + int(children), visitors[2] + int(athletes))

                for k,v in dicCSV.iteritems():
                    print k, v
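
To try the aggregation step without a database or a folder of files, it could be reduced to something like the following Python 3 sketch. Everything here beyond the column names and the two sample rows is an assumption: ROUTE_ENDS is a made-up stand-in for the route_ends table (its values are invented), the third CSV row is invented so that two rows merge, and aggregate is a hypothetical name.

```python
import csv
import io

DAY_CODES = {'Weekday': 1, 'Weekend': 2, 'Holiday': 3}

# Hypothetical stand-in for the route_ends table: "line_route" -> (start, end).
ROUTE_ENDS = {'5002_8': ('1', '89')}

# The first two data rows come from the post; the third is invented for the demo.
SAMPLE_CSV = """LineT;Line;Route;Day;Start_point;End_point;Adults;Children;First_visit
SM55;5055;3;Weekend;15;87;21;4;0
SM02;5002;8;Weekend;AF3;89;5;0;1
SM02;5002;8;Weekend;1;89;2;1;0
"""

def aggregate(csv_text):
    totals = {}
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=';')
    for row in reader:
        route_key = '%s_%s' % (row['Line'], row['Route'])
        start, end = row['Start_point'], row['End_point']
        adults = int(row['Adults'])
        # A text label on either end marks these adults as athletes.
        athletes = 0 if (start.isdigit() and end.isdigit()) else adults
        if not start.isdigit():
            start = ROUTE_ENDS[route_key][0]
        if not end.isdigit():
            end = ROUTE_ENDS[route_key][1]
        key = ('%s_%s' % (route_key, start),
               '%s_%s' % (route_key, end),
               DAY_CODES[row['Day']])
        prev = totals.get(key, (0, 0, 0))
        totals[key] = (prev[0] + adults,
                       prev[1] + int(row['Children']),
                       prev[2] + athletes)
    return totals

for key, value in aggregate(SAMPLE_CSV).items():
    print(key, value)
```

With the sample above, the two 5002/8 rows end up under one key, with the athletes counted only from the row that carried a text label.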

Best answer

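For large lists with a lot of merging, the following gives much better times than yours: 2 seconds versus 1 minute for tripInfo_csv*500000. We use a dictionary to look up keys in constant time, which gives almost linear complexity overall. It is also more elegant, IMHO. Note that tg is a generator, so creating it takes no significant time or memory.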

def newGetTrips(trips):

    def convert(l):
        return l if l.isdigit() else dicPoints[l]

    tg = ((convert(a), convert(b), c, d) for a, b, c, d in trips)
    okt = {}
    for a, b, c, d in tg:
        # a trick to get (0,0) as default if (a,b) is not a key of the dictionary yet
        t = okt.get((a,b), (0,0))
        okt[(a,b)] = (t[0] + c, t[1] + d)
    return [[a,b,c,d] for (a,b), (c,d) in okt.iteritems()]
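
For readers on Python 3, where dict.iteritems() no longer exists, the same idea might look like the sketch below. It is not the answerer's code: new_get_trips_py3 and DIC_POINTS are hypothetical names, and collections.defaultdict is swapped in for the okt.get((a,b), (0,0)) default trick.

```python
from collections import defaultdict

DIC_POINTS = {'a': '1', 'b': '2', 'c': '3', 'd': '4',
              'e': '5', 'f': '6', 'g': '7', 'h': '8'}

def new_get_trips_py3(trips):
    def convert(label):
        return label if label.isdigit() else DIC_POINTS[label]

    # Each new (start, end) key starts at [0 adults, 0 children].
    totals = defaultdict(lambda: [0, 0])
    for start, end, adults, children in trips:
        key = (convert(start), convert(end))
        totals[key][0] += adults
        totals[key][1] += children
    return [[start, end, adults, children]
            for (start, end), (adults, children) in totals.items()]

trip_info = [['1', '2', 6, 2], ['a', 'h', 4, 2], ['1', '4', 6, 1],
             ['1', '8', 18, 3], ['a', '8', 2, 1]]
print(new_get_trips_py3(trip_info))
```

The input sublists are only read, never assigned to, so trip_info keeps its letter labels after the call.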

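Also, as a side effect you are modifying the trips list, whereas this function leaves it untouched. In addition, you have a bug: the first item considered for each (start, end) pair is summed twice (except for the very first pair). I could not find the reason, but when running the example with your getTrips I get: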

[['1', '2', 6, 2], ['1', '8', 28, 8], ['1', '4', 12, 2]]

With newGetTrips I get:

[['1', '8', 24, 6], ['1', '2', 6, 2], ['1', '4', 6, 1]]
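
One likely explanation for the doubled values, from tracing the question's first getTrips by hand: okTrips.append(trip) runs inside the for loop over okTrips, so the loop goes on to visit the element it has just appended, finds that it matches trip (it is the same list object), and adds the trip's values to itself. The very first trip escapes this because it is appended while okTrips is still empty, before any loop runs. A reduced Python 3 sketch of that mechanism, with made-up names:

```python
def append_while_iterating(items, new_item):
    # Mirrors the shape of the first getTrips: the append happens inside
    # the loop over the very list being appended to.
    for i, current in enumerate(items):
        if current[0] == new_item[0]:
            current[1] += new_item[1]
            break
        elif i == len(items) - 1:
            # The loop has not ended yet: the next iteration sees new_item.
            items.append(new_item)
    return items

result = append_while_iterating([['x', 1]], ['y', 4])
print(result)  # [['x', 1], ['y', 8]]
```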

Regarding python - List of lists: replacing and adding up items of sublists, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40020663/
