
python - Pandas - Extracting multiple values from a column

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:58:26

One column of my DataFrame contains data in the following format:

['com.atlassian.greenhopper.service.sprint.Sprint@339ba62[id=001,rapidViewId=24,state=CLOSED,name=proj_a,goal=,startDate=2015-01-01T04:00:26.231Z,endDate=2015-01-13T14:36:00.000Z,completeDate=2015-02-13T14:07:09.739Z,sequence=001]

I am trying to extract the id value from this column, and I can do that with the following:

df['id'] = df['value'].astype(str).str.split('id').str[1]
df['id'] = df['id'].str.split(',').str[0]
df['id'] = df['id'].str.split('=').str[1]

I have now run into a problem: the same field can hold several of these values, like this:

['com.atlassian.greenhopper.service.sprint.Sprint@339ba62[id=001,rapidViewId=24,
state=CLOSED,name=proj_a,goal=,startDate=2015-01-01T04:00:26.231Z,endDate=2015-01-13T14:36:00.000Z,
completeDate=2015-02-13T14:07:09.739Z,sequence=001]',
'com.atlassian.greenhopper.service.sprint.Sprint@10b316d8[id=002,rapidViewId=24,
state=CLOSED,name=proj_b,goal=,startDate=2016-01-01T04:00:26.231Z,
endDate= 2016-01-13T14:36:00.000Z,completeDate= 2016-02-13T14:07:09.739Z,sequence=002]',
'com.atlassian.greenhopper.service.sprint.Sprint@2a13ba77[id=003,
rapidViewId=24,state=CLOSED,name=proj_c,goal=,
startDate= 2017-01-01T04:00:26.231Z,endDate= 2017-01-13T14:36:00.000Z,
completeDate= 2017-02-13T14:07:09.739Z,sequence=003]',
'com.atlassian.greenhopper.service.sprint.Sprint@76d3dba0[id=004,rapidViewId=24,
state=CLOSED,name=proj_d,goal=,startDate=2018-01-01T04:00:26.231Z,
endDate= 2018-01-13T14:36:00.000Z,completeDate= 2018-02-13T14:07:09.739Z,sequence=004]', 'com.atlassian.greenhopper.service.sprint.Sprint@307a51a2[id=005,
rapidViewId=24,state=CLOSED,name=proj_e,goal=,startDate=2019-01-01T04:00:26.231Z,
endDate= 2019-01-13T14:36:00.000Z,completeDate= 2019-02-13T14:07:09.739Z,sequence=005]']

Expected output:

001,002,003,004,005

I am trying to extract every value that corresponds to id and store them all in a single field.

Best answer

Use str.findall.

For example:

import pandas as pd

df = pd.DataFrame({"value": ['com.atlassian.greenhopper.service.sprint.Sprint@339ba62[id=001,rapidViewId=24,state=CLOSED,name=proj_a,goal=,startDate=2015-01-01T04:00:26.231Z,endDate=2015-01-13T14:36:00.000Z,completeDate=2015-02-13T14:07:09.739Z,sequence=001]', 'com.atlassian.greenhopper.service.sprint.Sprint@10b316d8[id=002,rapidViewId=24,state=CLOSED,name=proj_b,goal=,startDate=2016-01-01T04:00:26.231Z,endDate= 2016-01-13T14:36:00.000Z,completeDate= 2016-02-13T14:07:09.739Z,sequence=002]', 'com.atlassian.greenhopper.service.sprint.Sprint@2a13ba77[id=003,rapidViewId=24,state=CLOSED,name=proj_c,goal=,startDate= 2017-01-01T04:00:26.231Z,endDate= 2017-01-13T14:36:00.000Z,completeDate= 2017-02-13T14:07:09.739Z,sequence=003]', 'com.atlassian.greenhopper.service.sprint.Sprint@76d3dba0[id=004,rapidViewId=24,state=CLOSED,name=proj_d,goal=,startDate=2018-01-01T04:00:26.231Z,endDate= 2018-01-13T14:36:00.000Z,completeDate= 2018-02-13T14:07:09.739Z,sequence=004]', 'com.atlassian.greenhopper.service.sprint.Sprint@307a51a2[id=005,rapidViewId=24,state=CLOSED,name=proj_e,goal=,startDate=2019-01-01T04:00:26.231Z,endDate= 2019-01-13T14:36:00.000Z,completeDate= 2019-02-13T14:07:09.739Z,sequence=005]']})
# findall returns, for each row, a list of every id captured by the pattern
df["id"] = df["value"].str.findall(r"id=(\d+),")
print(df)

Output:

                                               value     id
0 com.atlassian.greenhopper.service.sprint.Sprin... [001]
1 com.atlassian.greenhopper.service.sprint.Sprin... [002]
2 com.atlassian.greenhopper.service.sprint.Sprin... [003]
3 com.atlassian.greenhopper.service.sprint.Sprin... [004]
4 com.atlassian.greenhopper.service.sprint.Sprin... [005]
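Because str.findall keeps every match, each cell in the id column is a list. If each row is known to hold exactly one id and a plain string is wanted, str.extract is one possible alternative. The following is only a sketch: the two sample rows are shortened stand-ins for the data above, and the `\b` word boundary in the pattern is an added assumption, not part of the original answer.

```python
import pandas as pd

# Two shortened sample rows in the same format as the question (abbreviated for readability).
df = pd.DataFrame({"value": [
    "com.atlassian.greenhopper.service.sprint.Sprint@339ba62[id=001,rapidViewId=24,state=CLOSED]",
    "com.atlassian.greenhopper.service.sprint.Sprint@10b316d8[id=002,rapidViewId=24,state=CLOSED]",
]})

# str.findall keeps every match, so each cell is a list such as ['001'].
as_lists = df["value"].str.findall(r"\bid=(\d+)")

# str.extract keeps only the first match; expand=False returns a Series of strings.
df["id"] = df["value"].str.extract(r"\bid=(\d+)", expand=False)
print(df["id"].tolist())
```

Note that str.extract silently drops any further matches in the same row, so it only suits the one-id-per-row case.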

If your DataFrame holds all of the values as a single list in one cell, use:

df = pd.DataFrame({"value": [['com.atlassian.greenhopper.service.sprint.Sprint@339ba62[id=001,rapidViewId=24,state=CLOSED,name=proj_a,goal=,startDate=2015-01-01T04:00:26.231Z,endDate=2015-01-13T14:36:00.000Z,completeDate=2015-02-13T14:07:09.739Z,sequence=001]', 'com.atlassian.greenhopper.service.sprint.Sprint@10b316d8[id=002,rapidViewId=24,state=CLOSED,name=proj_b,goal=,startDate=2016-01-01T04:00:26.231Z,endDate= 2016-01-13T14:36:00.000Z,completeDate= 2016-02-13T14:07:09.739Z,sequence=002]', 'com.atlassian.greenhopper.service.sprint.Sprint@2a13ba77[id=003,rapidViewId=24,state=CLOSED,name=proj_c,goal=,startDate= 2017-01-01T04:00:26.231Z,endDate= 2017-01-13T14:36:00.000Z,completeDate= 2017-02-13T14:07:09.739Z,sequence=003]', 'com.atlassian.greenhopper.service.sprint.Sprint@76d3dba0[id=004,rapidViewId=24,state=CLOSED,name=proj_d,goal=,startDate=2018-01-01T04:00:26.231Z,endDate= 2018-01-13T14:36:00.000Z,completeDate= 2018-02-13T14:07:09.739Z,sequence=004]', 'com.atlassian.greenhopper.service.sprint.Sprint@307a51a2[id=005,rapidViewId=24,state=CLOSED,name=proj_e,goal=,startDate=2019-01-01T04:00:26.231Z,endDate= 2019-01-13T14:36:00.000Z,completeDate= 2019-02-13T14:07:09.739Z,sequence=005]']]})
# join each list into one string, find every id in it, then join the ids with commas
df["id"] = df["value"].apply(",".join).str.findall(r"id=(\d+),").apply(",".join)
print(df)

Output:

                                               value                   id
0 [com.atlassian.greenhopper.service.sprint.Spri... 001,002,003,004,005
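The join-then-findall approach above assumes every cell is a list of strings. A variant that searches each list element separately may be easier to adapt, for instance when some elements have no id field. This is a minimal sketch under that assumption: `join_ids` and `ID_PATTERN` are hypothetical names introduced here, and the sample strings are shortened versions of the question's data.

```python
import re
import pandas as pd

# Hypothetical helper pattern: an "id=" key at a word boundary followed by digits.
ID_PATTERN = re.compile(r"\bid=(\d+)")

def join_ids(entries):
    """Return the ids found in a list of sprint strings as one comma-separated string."""
    ids = []
    for entry in entries:
        match = ID_PATTERN.search(entry)
        if match:  # skip entries that carry no id field
            ids.append(match.group(1))
    return ",".join(ids)

# One row whose cell is a list of shortened sample strings.
df = pd.DataFrame({"value": [[
    "com.atlassian.greenhopper.service.sprint.Sprint@339ba62[id=001,rapidViewId=24,state=CLOSED]",
    "com.atlassian.greenhopper.service.sprint.Sprint@10b316d8[id=002,rapidViewId=24,state=CLOSED]",
    "com.atlassian.greenhopper.service.sprint.Sprint@2a13ba77[id=003,rapidViewId=24,state=CLOSED]",
]]})

df["id"] = df["value"].apply(join_ids)
print(df["id"].iloc[0])
```

Searching element by element also avoids relying on a trailing comma after the id, which the joined-string pattern needs.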

This post on python - Pandas - extracting multiple values from a column is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/56654968/
