
python - Evaluating and removing duplicate dictionaries from a Python list

Reposted. Author: 行者123. Last updated: 2023-11-28 17:40:39

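Business problem: I have a list of dictionaries representing a given student's academic history: the classes they took, when they took them, what grade they received (a blank grade means the class is in progress), and so on. I need to find any repeated attempts at a given class and keep only the attempt with the highest grade.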

What I have tried so far:

acad_hist = [{'crse_id': u'GRG 302P0', 'grade': u''}, {'crse_id': u'URB 3010', 'grade': u'B+'},
             {'crse_id': u'GRG 302P0', 'grade': u'D'}]

grade_list = ['CR', 'D-', 'D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
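  1. At first, I tried looping over the acad_hist list and adding any class I had not yet seen to a "seen" list. The plan was that when I hit a class already in the "seen" list, I would go back to the acad_hist list, fetch that class's details (e.g. the "grade"), compare the grades, and remove the lower-graded class from acad_hist. The problem is that I have a hard time going back and "grabbing" the earlier class from the "seen" list, and an even harder time pointing at it correctly once I know it has to be removed from acad_hist. The code is messy, but this is what I have so far: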

    key = 'crse_id'
    seen = []
    for index, course in enumerate(acad_hist[:]):
        if course[key] not in seen:
            seen.append(course[key])
        else:
            logger.info('found duplicate {0} at index {1}'.format(course[key], index))
            # < not sure what to do here… >

    Output:

    found duplicate GRG 302P0 at index 11
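  2. Then I thought I could use set() to cull the list for me, but the problem is that I need to choose which instance of the class to keep, and set() does not seem to give me a way to do that.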

    names = set(d['compressed_hist_crse_id'] for d in acad_hist_condensed)
    logger.info('TEST names: {0}'.format(names))

    Output:

    TEST names: set([u'GRG 302P0', u'URB 3010'])
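  3. To build on #2 above, I figured I would go "belt-and-suspenders" and loop over the set() output in "names" to collect the grades. It runs, but I won't pretend to fully understand what it is doing, and it does not really let me do the processing I need.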

    new_dicts = []
    for name in names:
        d = dict(name=name)
        d['grade'] = max(d['grade'] for d in acad_hist if d['crse_id'] == name)
        new_dicts.append(d)
    logger.info('TEST new_dicts: {0}'.format(new_dicts))

    Output:

    TEST new_dicts: [{'grade': u'', 'name': u'GRG 302P0'}, {'grade': u'B+', 'name': u'URB 3010'}]
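
One reason attempt 3 is hard to trust: max() applied to the grade strings compares them alphabetically, not by academic rank, so 'D' sorts above 'B+'. A minimal sketch of the same idea with a rank lookup passed as `key` follows. The names `GRADE_RANK` and `best_grade_per_course` are illustrative and not from the original post, and blank or unknown grades are assumed to rank lowest here.

```python
GRADE_ORDER = ['CR', 'D-', 'D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
# Map each grade to its position so grades compare by rank, not alphabetically.
GRADE_RANK = {grade: rank for rank, grade in enumerate(GRADE_ORDER)}


def best_grade_per_course(acad_hist):
    """Return {crse_id: best grade}. Blank/unknown grades rank lowest (assumption)."""
    best = {}
    for name in {d['crse_id'] for d in acad_hist}:
        grades = [d['grade'] for d in acad_hist if d['crse_id'] == name]
        # key= makes max() pick by rank instead of by string order.
        best[name] = max(grades, key=lambda g: GRADE_RANK.get(g, -1))
    return best


acad_hist = [
    {'crse_id': 'GRG 302P0', 'grade': ''},
    {'crse_id': 'URB 3010', 'grade': 'B+'},
    {'crse_id': 'GRG 302P0', 'grade': 'D'},
]
result = best_grade_per_course(acad_hist)
print(result)
```

This still returns only the grades, not the full records, so it does not by itself decide which dictionary to delete.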

Can anyone provide the missing piece, or suggest a better approach?

Update: the solution I ended up with (adapting the idea I got from the accepted answer)

import logging


def scrub_for_duplicate_courses(acad_hist_condensed, acad_hist_list):
    """
    Looks for duplicate courses that may have been taken, and if any are found, will look for the one with the highest
    grade and keep that one, deleting the other course from the lists before returning them.
    """

    # -------------------------------------------
    # set logging params
    # -------------------------------------------
    logger = logging.getLogger(__name__)

    # -----------------------------------------------------------------------------------------------------
    # the grade_list is in order of ascending priority/value...a blank grade indicates "in-progress", and
    # will therefore replace any class instance that has a grade.
    # -----------------------------------------------------------------------------------------------------
    grade_list = ['CR', 'D-', 'D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+', '']
    # converting the grade_list into a more efficient, weighted dict
    grade_list = dict(zip(grade_list, range(len(grade_list))))

    seen_courses = {}

    for course in acad_hist_condensed[:]:
        # -----------------------------------------------------------------------------------------------------
        # one of the two keys checked for below should exist in the list, but not both
        # -----------------------------------------------------------------------------------------------------
        key = ''
        if 'compressed_hist_crse_id' in course:
            key = 'compressed_hist_crse_id'
        elif 'compressed_ovrd_crse_id' in course:
            key = 'compressed_ovrd_crse_id'

        cid = course[key]
        grade = course['grade']

        if cid not in seen_courses:
            seen_courses[cid] = grade
        else:
            # ---------------------------------------------------------------------------------------------------------
            # if we get here, a duplicate course_id has been found in the acad_hist_condensed list, so now we'll want
            # to determine which one has the lowest grade, and remove that course instance from both lists.
            # ---------------------------------------------------------------------------------------------------------
            if grade_list.get(seen_courses[cid], 0) < grade_list.get(grade, 0):
                # the new attempt ranks higher, so remember the old grade for removal before overwriting it
                grade_for_rec_to_remove = seen_courses[cid]
                seen_courses[cid] = grade  # this will overlay the grade for the record already in seen_courses
                crse_id_for_rec_to_remove = cid
            else:
                grade_for_rec_to_remove = grade
                crse_id_for_rec_to_remove = cid

            # -----------------------------------------------------------------------------------------------------
            # find the rec in acad_hist_condensed that needs removal
            # -----------------------------------------------------------------------------------------------------
            for rec in acad_hist_condensed:
                # .get() avoids a KeyError when a record uses the other course-id key
                if rec.get(key) == crse_id_for_rec_to_remove and rec['grade'] == grade_for_rec_to_remove:
                    acad_hist_condensed.remove(rec)
                    break  # remove one record only, and stop iterating the list we just modified
            for rec in acad_hist_list:
                if rec == crse_id_for_rec_to_remove:
                    acad_hist_list.remove(rec)
                    break  # just want to remove one occurrence

    return acad_hist_condensed, acad_hist_list
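
The function above removes records from its input lists while searching them. As an alternative, here is a hedged sketch that builds a new list instead and keeps the whole record of the best attempt per course. It assumes, as the comments above state, that a blank grade means in-progress and outranks any real grade. The function name `keep_best_attempts` and the `id_keys` parameter are made up for illustration, and the sketch does not handle a second list such as `acad_hist_list`.

```python
GRADE_ORDER = ['CR', 'D-', 'D', 'D+', 'C-', 'C', 'C+',
               'B-', 'B', 'B+', 'A-', 'A', 'A+', '']  # '' = in-progress, ranked highest
GRADE_RANK = {grade: rank for rank, grade in enumerate(GRADE_ORDER)}


def keep_best_attempts(records,
                       id_keys=('compressed_hist_crse_id', 'compressed_ovrd_crse_id')):
    """Return a new list with one record per course id: the highest-ranked attempt.

    Hypothetical helper. Unknown grades rank below everything (assumption).
    """
    best = {}  # course id -> best record seen so far
    for rec in records:
        # Use whichever id key this record carries; raises StopIteration if none is present.
        key = next(k for k in id_keys if k in rec)
        cid = rec[key]
        rank = GRADE_RANK.get(rec['grade'], -1)
        if cid not in best or rank > GRADE_RANK.get(best[cid]['grade'], -1):
            best[cid] = rec
    return list(best.values())


history = [
    {'compressed_hist_crse_id': 'GRG 302P0', 'grade': 'D'},
    {'compressed_hist_crse_id': 'URB 3010', 'grade': 'B+'},
    {'compressed_hist_crse_id': 'GRG 302P0', 'grade': 'C'},
]
cleaned = keep_best_attempts(history)
print(cleaned)
```

Because the input is never modified, there is no need to iterate over a copy or to search the list a second time for the record to delete.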

Best answer

A simple solution is to walk through each student's class history and work out the highest grade for each class...

acad_hist = [{'crse_id': u'GRG 302P0', 'grade': u''}, {'crse_id': u'URB 3010', 'grade': u'B+'}, {'crse_id': u'GRG 302P0', 'grade': u'D'}]

grade_list = ['CR', 'D-', 'D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
#let's turn grade_list into something more efficient:
grade_list = dict(zip(grade_list, range(len(grade_list)))) # 'CR' == 0, 'D-' == 1

courses = {} # keys will be crse_id, values will be grade.
for course in acad_hist:
    cid = course['crse_id']
    g = course['grade']
    if cid not in courses:
        courses[cid] = g
    else:
        if grade_list.get(courses[cid], 0) < grade_list.get(g, 0):
            courses[cid] = g

The output will be:

{u'GRG 302P0': u'D', u'URB 3010': u'B+'}

This can be rewritten back into the original form if needed.
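
A possible way to do that rewrite is a list comprehension over the result. This is only a sketch: `courses` is copied from the output shown above, the key names match the original `acad_hist`, and `deduped` is an illustrative name.

```python
courses = {'GRG 302P0': 'D', 'URB 3010': 'B+'}

# Rebuild the original list-of-dicts shape, one entry per course.
deduped = [{'crse_id': cid, 'grade': grade} for cid, grade in courses.items()]
print(deduped)
```

Any other fields the original records carried are not recovered this way, since `courses` only stores the grade.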

Regarding "python - Evaluating and removing duplicate dictionaries from a Python list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24888770/
