python - Using apply to insert values extracted from one (JSON-type) column into another column

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:52:41

I have this dataset:

userid  sub_id  event
1       NaN     {'score':25, 'sub_id':5}
1       5       {'score':1}

When sub_id is NaN, I want to extract it from the event column with the following code:

df['sub_id'] = df.apply(lambda row: 
row['event'].split('sub_id')[1]
if pd.isnull(row['sub_id'])
else row['sub_id'])

However, I get this error: KeyError: ('sub_id', u'occurred at index index')

I am trying to get this DataFrame:

userid  sub_id  event
1       5       {'score':25, 'sub_id':5}
1       5       {'score':1}

Any idea what causes the error, or any suggestion for a different solution?

Update

I need to extract a value from a nested dictionary element:

event
{u'POST': {u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}, u'GET': {}}

I am using this code:

df['POST'] = df['event'].apply(pd.Series)['POST']

which creates the following column:

POST
{u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}

However, I need the overall_feedback value, and because of the format of the POST field the following code does not work:

df['POST'].apply(pd.Series)['overall_feedback']

It raises this error: KeyError: 'overall_feedback'

Any ideas?

Best answer

You can use combine_first or fillna:

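import numpy as np
import pandas as pd

df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [{'score': 25, 'sub_id': 5}, {'score': 1}]})
print(type(df.loc[0, 'event']))
<class 'dict'>

# fill the NaNs in sub_id with the 'sub_id' key of each event dict
# (dict.get returns None, which becomes NaN, when the key is missing)
df['sub_id'] = df['sub_id'].combine_first(df['event'].apply(lambda x: x.get('sub_id')))
#df['sub_id'] = df['sub_id'].fillna(df['event'].apply(lambda x: x.get('sub_id')))
print(df)
   userid  sub_id                       event
0       1     5.0  {'score': 25, 'sub_id': 5}
1       1     5.0                {'score': 1}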
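The KeyError in the question most likely comes from DataFrame.apply itself: with the default axis=0 it passes each column, not each row, to the lambda, so row['sub_id'] looks for a row label named 'sub_id'. In addition, event holds dicts rather than strings, so .split would fail too. A minimal row-wise sketch of the question's own approach, assuming every event value is a dict:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [{'score': 25, 'sub_id': 5}, {'score': 1}]})

# axis=1 passes each row (as a Series) to the lambda, so row['sub_id'] works;
# dict.get avoids a KeyError for events that have no 'sub_id' key
df['sub_id'] = df.apply(
    lambda row: row['event'].get('sub_id') if pd.isnull(row['sub_id']) else row['sub_id'],
    axis=1)
print(df)
```

The vectorized combine_first version is usually preferable on large frames, since apply with axis=1 builds a Series for every row.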

Edit: if the dicts are nested, a faster solution is to call the DataFrame constructor twice, and a slower one is to use apply with pd.Series twice:

import numpy as np
import pandas as pd

df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [{'post': {'score': 25, 'sub_id': 5}}, {'post': {'score': 1}}]})

print(df)
   userid  sub_id                                 event
0       1     NaN  {'post': {'score': 25, 'sub_id': 5}}
1       1     5.0                {'post': {'score': 1}}

# faster: DataFrame constructor on the outer dicts, then on the inner 'post' dicts
s = pd.DataFrame(pd.DataFrame(df['event'].values.tolist())['post'].values.tolist())['sub_id']
print(s)
0    5.0
1    NaN
Name: sub_id, dtype: float64

# slower: expand each nesting level with apply(pd.Series)
s = df['event'].apply(pd.Series)['post'].apply(pd.Series)['sub_id']
print(s)
0    5.0
1    NaN
Name: sub_id, dtype: float64

df['sub_id'] = df['sub_id'].combine_first(s)
print(df)
   userid  sub_id                                 event
0       1     5.0  {'post': {'score': 25, 'sub_id': 5}}
1       1     5.0                {'post': {'score': 1}}
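Another option for regular nesting is pandas.json_normalize (a top-level function since pandas 1.0), which flattens the nested dicts into dotted column names in one call. A sketch on the same nested example data; the column name 'post.sub_id' follows from json_normalize's default separator '.':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [{'post': {'score': 25, 'sub_id': 5}}, {'post': {'score': 1}}]})

# flatten {'post': {'sub_id': ...}} into a 'post.sub_id' column; missing keys become NaN
flat = pd.json_normalize(df['event'].tolist())
print(flat.columns.tolist())

# json_normalize returns a fresh RangeIndex, so align it with df before combining
flat.index = df.index
df['sub_id'] = df['sub_id'].combine_first(flat['post.sub_id'])
print(df)
```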

Edit 1:

If the event values are strings, you can convert them to dicts with:

import ast

import numpy as np
import pandas as pd
import yaml

df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [{'post': {'score': 25, 'sub_id': 5}}, {'post': {'score': 1}}]})

# simulate a column of strings, e.g. after reading the data from a CSV file
df['event'] = df['event'].astype(str)
print(type(df.loc[0, 'event']))
<class 'str'>

# ast.literal_eval safely evaluates Python literals such as "{'post': {...}}"
df['event'] = df['event'].apply(ast.literal_eval)
#df['event'] = df['event'].apply(yaml.safe_load)
print(df)
   userid  sub_id                                 event
0       1     NaN  {'post': {'score': 25, 'sub_id': 5}}
1       1     5.0                {'post': {'score': 1}}

print(type(df.loc[0, 'event']))
<class 'dict'>
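Strings read from a file may be either JSON (double quotes, true/false/null) or Python reprs (single quotes, as produced by astype(str)). json.loads accepts only the former and ast.literal_eval only the latter. A hedged sketch of a helper, to_dict (a hypothetical name, not a pandas function), that tries JSON first and falls back to ast.literal_eval:

```python
import ast
import json

import pandas as pd


def to_dict(value):
    """Hypothetical helper: parse a JSON or Python-literal string; pass dicts through."""
    if isinstance(value, dict):
        return value
    try:
        return json.loads(value)          # '{"a": 1}'-style JSON
    except json.JSONDecodeError:
        return ast.literal_eval(value)    # "{'a': 1}"-style Python repr


events = pd.Series(['{"post": {"score": 1, "ok": true}}',
                    "{'post': {'score': 25, 'sub_id': 5}}"])
parsed = events.apply(to_dict)
print(parsed.tolist())
```

Usage: df['event'] = df['event'].apply(to_dict). Values that are neither valid JSON nor a Python literal still raise an error from ast.literal_eval.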

Edit 2:

import numpy as np
import pandas as pd
import yaml

d = {u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}
d1 = {u'{"options_selected":{"Ideas":"2"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_2"}': [u'']}

df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [d, d1]})

# 1st safe_load: string repr of the outer dict -> dict
# 2nd safe_load: its single key (a JSON document) -> dict
df['event'] = (df['event'].astype(str)
                          .apply(yaml.safe_load)
                          .apply(lambda x: yaml.safe_load(list(x.keys())[0])))

print(type(df.event.iloc[0]))
<class 'dict'>

print(df.event.apply(pd.Series)['overall_feedback'])
0    Feedback_text_goes_here_1
1    Feedback_text_goes_here_2
Name: overall_feedback, dtype: object
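When event already holds dicts shaped like the one in the Update, the string round trip is not strictly needed: the only key of event['POST'] is itself a JSON document, so json.loads can parse it directly. A sketch assuming event['POST'] always has exactly one key and that key is valid JSON; get_overall_feedback is a hypothetical helper name:

```python
import json

import numpy as np
import pandas as pd

# events shaped like the one in the question: the only key of event['POST']
# is itself a JSON document
key1 = '{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}'
key2 = '{"options_selected":{"Ideas":"2"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_2"}'
df = pd.DataFrame({'userid': [1, 1],
                   'sub_id': [np.nan, 5],
                   'event': [{'POST': {key1: ['']}, 'GET': {}},
                             {'POST': {key2: ['']}, 'GET': {}}]})


def get_overall_feedback(event):
    """Hypothetical helper: parse the single key of event['POST'] as JSON."""
    post_key = next(iter(event['POST']))
    return json.loads(post_key).get('overall_feedback')


df['overall_feedback'] = df['event'].apply(get_overall_feedback)
print(df['overall_feedback'].tolist())
```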

A similar question, "python - Using apply to insert values extracted from one (JSON-type) column into another column", can be found on Stack Overflow: https://stackoverflow.com/questions/43304092/
