
python - Convert a Python list to columnar data

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:06:04

I have a scraped list of strings that I want to group and then reshape into columnar data. However, not every group contains every variable title.

My list is named complist and looks like this:

[u'Intake Received Date:',
u'9/11/2012',
u'Intake ID:',
u'CA00325127',
u'Allegation Category:',
u'Infection Control',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'5/14/2012',
u'Intake ID:',
u'CA00310421',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'8/15/2011',
u'Intake ID:',
u'CA00279396',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Sub Categories:',
u'Screening',
u'Investigation Finding:',
u'Unsubstantiated',]

My goal is to make it look like this:

'Intake Received Date', 'Intake ID', 'Allegation Category', 'Sub Categories', 'Investigation Finding'
'9/11/2012', 'CA00325127', 'Infection Control', '', 'Substantiated'
'5/14/2012', 'CA00310421', 'Quality of Care/Treatment', '', 'Substantiated'
'8/15/2011', 'CA00279396', 'Quality of Care/Treatment', 'Screening', 'Unsubstantiated'

The first thing I did was split the list into blocks based on the starting element, Intake Received Date:

import re
from itertools import groupby

compgroup = []
for k, g in groupby(complist, key=lambda x: re.search(r'Intake Received Date', x)):
    if not k:
        compgroup.append(list(g))

# Intake Received Date was removed, so insert it back at the beginning of each list:
for c in compgroup:
    c.insert(0, u'Intake Received Date')

# Create a list of dicts to map the preceding titles to their respective data elements:
dic = []
for c in compgroup:
    dic.append(dict(zip(*[iter(c)]*2)))

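The next step would be to turn the list of dicts into columnar data, but at this point my approach feels overly complicated, and I must be missing something more elegant. Any guidance would be appreciated.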
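
The pairing step at the end of the code above, dict(zip(*[iter(c)]*2)), is compact but hard to read. As a minimal sketch of what it does, the same idiom can be spelled out with a named iterator. The helper name pairs_to_dict is hypothetical and used only for illustration:

```python
def pairs_to_dict(flat):
    """Pair up consecutive items: [k1, v1, k2, v2] -> {k1: v1, k2: v2}."""
    it = iter(flat)
    # zip pulls from the same iterator twice per tuple, so items are
    # consumed two at a time: (flat[0], flat[1]), (flat[2], flat[3]), ...
    return dict(zip(it, it))


block = ['Intake ID:', 'CA00325127', 'Investigation Finding:', 'Substantiated']
record = pairs_to_dict(block)
print(record['Intake ID:'])
```

Note that zip stops at the shortest input, so a trailing title with no value after it is silently dropped rather than raising an error.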

Best answer

Given:

data=[u'Intake Received Date:',
u'9/11/2012',
u'Intake ID:',
u'CA00325127',
u'Allegation Category:',
u'Infection Control',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'5/14/2012',
u'Intake ID:',
u'CA00310421',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'8/15/2011',
u'Intake ID:',
u'CA00279396',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Sub Categories:',
u'Screening',
u'Investigation Finding:',
u'Unsubstantiated',]

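Your approach is actually pretty good. I edited it a bit. You don't need a regular expression, and you don't need a separate pass to re-insert Intake Received Date.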

Try:

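from itertools import groupby

headers = ['Intake Received Date:', 'Intake ID:', 'Allegation Category:', 'Sub Categories:', 'Investigation Finding:']
sep = 'Intake Received Date:'

# Split data at each separator; groupby drops the separator, so prepend it again.
compgroup = []
for k, g in groupby(data, key=lambda x: x == sep):
    if not k:
        compgroup.append([sep] + list(g))

# Header row, with the trailing colon stripped from each title.
print(', '.join(e[0:-1] for e in headers))

# Pair up titles and values in each block, then emit one row per block.
for di in [dict(zip(*[iter(c)]*2)) for c in compgroup]:
    line = []
    for h in headers:
        try:
            line.append(di[h])
        except KeyError:
            line.append('*')  # placeholder for a title missing from this block
    print(', '.join(line))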

Prints:

Intake Received Date, Intake ID, Allegation Category, Sub Categories, Investigation Finding
9/11/2012, CA00325127, Infection Control, *, Substantiated
5/14/2012, CA00310421, Quality of Care/Treatment, *, Substantiated
8/15/2011, CA00279396, Quality of Care/Treatment, Screening, Unsubstantiated
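
If the rows are meant to end up in a CSV file, one possible variation (a sketch, not part of the answer above) is to build the records in a single pass and let csv.DictWriter fill in missing columns through its restval argument. This assumes Python 3; the function names to_records and to_csv and the shortened sample list are hypothetical and for illustration only:

```python
import csv
import io

HEADERS = ['Intake Received Date:', 'Intake ID:', 'Allegation Category:',
           'Sub Categories:', 'Investigation Finding:']


def to_records(flat, sep='Intake Received Date:'):
    """Group a flat [title, value, title, value, ...] list into dicts,
    starting a new record each time the title equals `sep`."""
    records = []
    it = iter(flat)
    for title, value in zip(it, it):  # consume the list two items at a time
        if title == sep or not records:
            records.append({})
        records[-1][title] = value
    return records


def to_csv(records, headers=HEADERS, missing=''):
    """Render records as CSV text; titles absent from a record get `missing`."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=headers, restval=missing,
                            lineterminator='\n')
    # Header row with the trailing colons stripped.
    writer.writerow({h: h.rstrip(':') for h in headers})
    writer.writerows(records)
    return buf.getvalue()


# Shortened, made-up sample in the same shape as the data above.
sample = ['Intake Received Date:', '9/11/2012', 'Intake ID:', 'CA00325127',
          'Intake Received Date:', '8/15/2011', 'Sub Categories:', 'Screening']
print(to_csv(to_records(sample)))
```

With its default settings, csv.DictWriter raises ValueError if a record contains a title that is not listed in fieldnames, so an unexpected title in the scraped data surfaces as an error instead of being dropped silently.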

Regarding python - Convert a Python list to columnar data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21963431/
