
python - I need to unnest JSON array elements and ensure correct mapping to the 'ID' column

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:44:19

The input DataFrame "df" looks like this (note the values in the 'id' column):

| id    | name                                                                                  |
|-------|---------------------------------------------------------------------------------------|
| a1xy | [ { "event": "sports", "start": "100"}, { "event": "lunch", "start": "121" } ] |
| a7yz | [ { "event": "lunch", "start": "109"}, { "event": "movie", "start": "97" } ] |
| bx4y | [ { "event": "dinner", "start": "78"}, { "event": "sleep", "start": "25" } ] |

I want to flatten the JSON array elements so that the resulting output is:

| id    | name.event | name.start |
|-------|------------|------------|
| a1xy | sports | 100 |
| a1xy | lunch | 121 |
| a7yz | lunch | 109 |
| a7yz | movie | 97 |
| bx4y | dinner | 78 |
| bx4y | sleep | 25 |

Each flattened row must keep the "id" value of the row it came from. How can I do this in Python?

I tried:

k = df.name.map(json.loads).apply(pd.DataFrame).tolist()
final_df = pd.concat(k)

But this loses the values of the "id" column, so I cannot map them onto the flattened rows.

Best answer

You can use a flattening list comprehension that adds the id value to each dictionary, and then call the DataFrame constructor:

df['name'] = df['name'].map(json.loads)

df = pd.DataFrame([dict(y, id=i) for i, x in zip(df['id'],df['name']) for y in x])
print (df)
    event    id start
0  sports  a1xy   100
1   lunch  a1xy   121
2   lunch  a7yz   109
3   movie  a7yz    97
4  dinner  bx4y    78
5   sleep  bx4y    25
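
For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the 'name' column holds JSON strings, as in the question. The names rows and flat are only illustrative, and the final rename to name.event / name.start is an addition to match the column names the question asks for.

```python
import json

import pandas as pd

# Sample input from the question; each 'name' cell is a JSON string.
df = pd.DataFrame({
    "id": ["a1xy", "a7yz", "bx4y"],
    "name": [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})

# One output row per array element; the outer loop carries the id along,
# so every element keeps the id of the row it came from.
rows = [
    {"id": row_id, **event}
    for row_id, raw in zip(df["id"], df["name"])
    for event in json.loads(raw)
]

# Fix the column order explicitly, then rename to the requested headers.
flat = pd.DataFrame(rows, columns=["id", "event", "start"])
flat = flat.rename(columns={"event": "name.event", "start": "name.start"})
print(flat)
```

Note that the "start" values stay strings here, because they are strings in the JSON; convert them with astype(int) if numbers are needed.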

But if the input is JSON, it is better to use json_normalize.
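
The answer does not show the json_normalize call, so the following is only a sketch of how it might look for this data. It assumes pandas 1.0 or later, where the function is available as pd.json_normalize, and it assumes the 'name' column holds JSON strings. The names records and result are illustrative.

```python
import json

import pandas as pd

# Sample input from the question; each 'name' cell is a JSON string.
df = pd.DataFrame({
    "id": ["a1xy", "a7yz", "bx4y"],
    "name": [
        '[{"event": "sports", "start": "100"}, {"event": "lunch", "start": "121"}]',
        '[{"event": "lunch", "start": "109"}, {"event": "movie", "start": "97"}]',
        '[{"event": "dinner", "start": "78"}, {"event": "sleep", "start": "25"}]',
    ],
})

# Build one nested record per row: the id plus the parsed array.
records = [
    {"id": row_id, "name": json.loads(raw)}
    for row_id, raw in zip(df["id"], df["name"])
]

# record_path selects the array to unnest, meta repeats 'id' for each
# element, and record_prefix yields the 'name.event' / 'name.start' headers.
result = pd.json_normalize(
    records, record_path="name", meta=["id"], record_prefix="name."
)

# json_normalize puts meta columns last; move 'id' to the front.
result = result[["id", "name.event", "name.start"]]
print(result)
```

This form tends to pay off when the records are nested more deeply than in this example, since record_path and meta can also take lists of keys.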

Timings:

df=pd.DataFrame([
['a1xy',[{ "event": "sports", "start": "100"}, { "event": "lunch", "start": "121" } ]],
['a7yz',[{ "event": "lunch", "start": "109"}, { "event": "movie", "start": "97" } ]],
['bx4y',[{ "event": "dinner", "start": "78"}, { "event": "sleep", "start": "25" } ]]],
columns=['id','name'])
print (df)

#3k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [276]: %%timeit
...: pd.DataFrame([dict(y, id=i) for i, x in zip(df['id'],df['name']) for y in x])
9.49 ms ± 230 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [277]: %%timeit
...: finalArray=[]
...: df.apply(lambda x: addtoArray(x,finalArray),axis=1)
...: pd.DataFrame(finalArray,columns=['col1','event','start'])
...:
1.81 s ± 33.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The list comprehension solution is roughly 190x faster (9.49 ms vs. 1.81 s per loop on the 3,000-row input).

Regarding "python - I need to unnest JSON array elements and ensure correct mapping to the 'ID' column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49962533/
