python - How to create a pandas.DataFrame from a list of JSON

Reposted by: 太空宇宙 | Updated: 2023-11-03 21:21:12

I have a pandas DataFrame loaded from a CSV (gist with small sample):

|  title   |                       genres               |
--------------------------------------------------------
| %title1% |[{id: 1, name: '...'}, {id: 2, name: '...'}]|
| %title2% |[{id: 2, name: '...'}, {id: 4, name: '...'}]|
...
| %title9% |[{id: 3, name: '...'}, {id: 9, name: '...'}]|

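Each title can be associated with a varying number of genres (one or more).

The task is to turn the arrays in the genres column into columns, putting a 1 (or True) in the column of each genre the title has: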

|  title   | genre_1 | genre_2 | genre_3 | ... | genre_9 |
---------------------------------------------------------
| %title1% |    1    |    1    |    0    | ... |    0    |
| %title2% |    0    |    1    |    0    | ... |    0    |
...
| %title9% |    0    |    0    |    1    | ... |    1    |

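The genres form a fixed set (about 20 items).

The naive approach is:

Build the set of all genres
Create a column for each genre and fill it with 0
For each row of the DataFrame, check which genres appear in its genres column and set the columns of those genres to 1.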
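The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the sample rows and genre names are made up, the genres column is assumed to hold valid JSON strings (double-quoted keys and values), and the genre_<id> column names follow the target table in the question.

```python
import json

import pandas as pd

# Hypothetical sample data; the genre names are invented for illustration.
df = pd.DataFrame({
    "title": ["title1", "title2", "title9"],
    "genres": [
        '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Comedy"}]',
        '[{"id": 2, "name": "Comedy"}, {"id": 4, "name": "Drama"}]',
        '[{"id": 3, "name": "Horror"}, {"id": 9, "name": "Western"}]',
    ],
})

# Parse each JSON string once and keep only the genre ids of each row.
ids_per_row = [{g["id"] for g in json.loads(s)} for s in df["genres"]]

# Step 1: build the set of all genre ids.
all_ids = sorted(set().union(*ids_per_row))

# Step 2: create one column per genre, filled with 0.
for genre_id in all_ids:
    df[f"genre_{genre_id}"] = 0

# Step 3: for each row, set the columns of its genres to 1.
for index, ids in zip(df.index, ids_per_row):
    for genre_id in ids:
        df.loc[index, f"genre_{genre_id}"] = 1

result = df.drop(columns=["genres"])
print(result)
```

Because the columns are created up front, every cell already holds 0 or 1 and no NaN values need to be cleaned up afterwards.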

This approach looks a bit clumsy.

I suspect pandas has a more suitable way to do this.

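Best answer

As far as I know, there is no way to deserialize JSON in a pandas DataFrame in a vectorized fashion. One way that should work is .iterrows(), which lets you do it in a single loop (although it is slower than most built-in pandas operations).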

import json

# df = ...  (your DataFrame; the 'genres' column must hold valid JSON strings)

for index, row in df.iterrows():
    # deserialize the JSON string
    json_data = json.loads(row['genres'])

    # add a new column for each of the genres (Pandas is okay with it being sparse)
    for genre in json_data:
        df.loc[index, genre['name']] = 1  # update the row in the df itself

df.drop(['genres'], axis=1, inplace=True)

Note that empty cells are filled with NaN, not 0; use .fillna() to change that. A short example with a vaguely similar DataFrame looks like

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([{'title': 'hello', 'json': '{"foo": "bar"}'},
   ...:                    {'title': 'world', 'json': '{"foo": "bar", "baz": "boo"}'}])

In [3]: df.head()
Out[3]:
                           json  title
0                {"foo": "bar"}  hello
1  {"foo": "bar", "baz": "boo"}  world

In [4]: import json
   ...: for index, row in df.iterrows():
   ...:     data = json.loads(row['json'])
   ...:     for k, v in data.items():
   ...:         df.loc[index, k] = v
   ...: df.drop(['json'], axis=1, inplace=True)

In [5]: df.head()
Out[5]:
   title  foo  baz
0  hello  bar  NaN
1  world  bar  boo
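To get from the NaN cells to the 0/1 table asked for in the question, the .fillna() step mentioned above could look something like this. It is a minimal sketch, not part of the original answer: the two rows and the genre names are hypothetical, and the columns are named genre_<id> to match the question's target table rather than by genre name.

```python
import json

import pandas as pd

# Hypothetical data in the same spirit as the short example above.
df = pd.DataFrame([
    {"title": "hello", "genres": '[{"id": 1, "name": "Action"}]'},
    {"title": "world", "genres": '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Comedy"}]'},
])

for index, row in df.iterrows():
    for genre in json.loads(row["genres"]):
        # Name the columns genre_<id>, as in the question's target table.
        df.loc[index, f"genre_{genre['id']}"] = 1

df = df.drop(columns=["genres"])

# Cells never assigned in the loop are NaN; replace them with 0 and use integers.
genre_cols = [c for c in df.columns if c.startswith("genre_")]
df[genre_cols] = df[genre_cols].fillna(0).astype(int)
print(df)
```

Restricting .fillna() to the genre columns leaves any missing values in other columns, such as title, untouched.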

Regarding python - How to create a pandas.DataFrame from a list of JSON, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54234033/
