
python - How to split a Pandas column containing lists of dictionaries into separate columns per key

Reposted · Author: 行者123 · Updated: 2023-12-04 08:18:01

I'm analyzing political ads from Facebook, a dataset published here by ProPublica.

Here's what I mean. I have an entire column of targets to analyze, but its format is very hard to work with for someone at my skill level.

This is from just 1 cell: [{"target": "NAge", "segment": "21 and older"}, {"target": "MinAge", "segment": "21"}, {"target": "Retargeting", "segment": "people who may be similar to their customers"}, {"target": "Region", "segment": "the United States"}]

And another: [{"target": "NAge", "segment": "18 and older"}, {"target": "Location Type", "segment": "HOME"}, {"target": "Interest", "segment": "Hispanic culture"}, {"target": "Interest", "segment": "Republican Party (United States)"}, {"target": "Location Granularity", "segment": "country"}, {"target": "Country", "segment": "the United States"}, {"target": "MinAge", "segment": 18}]

What I need to do is separate each "target" item so that it becomes a column label, with each corresponding "segment" becoming a possible value in that column.

Or would the solution be to create a function that calls each dictionary key in each row in order to compute frequencies?

Best Answer

  • The column is lists of dicts.
    • Each dict in a list can be moved to a separate row with pandas.DataFrame.explode().
    • Convert the column of dicts to a dataframe with pandas.json_normalize(), where the keys become the column headers and the values are the observations, then .join() this back to df.
  • Use .drop() to remove the unneeded column.
  • If the column contains lists of dicts as strings (e.g. "[{key: value}]"), see this solution to Splitting dictionary/list inside a Pandas Column into Separate Columns, and use:
    • df.col2 = df.col2.apply(literal_eval), with from ast import literal_eval.
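To see why the literal_eval step matters, here is a minimal sketch (using the col2 column name from the sample below): a column read from CSV arrives as strings that merely look like lists, and literal_eval parses them into real Python objects that explode can work with.

```python
from ast import literal_eval

import pandas as pd

# a cell loaded from CSV is a string, not a list of dicts
df = pd.DataFrame({'col2': ['[{"target": "NAge", "segment": "21 and older"}]']})

# parse each string into a real list of dicts
df.col2 = df.col2.apply(literal_eval)

print(type(df.col2[0]))  # <class 'list'>
```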
import pandas as pd

# create sample dataframe
df = pd.DataFrame({'col1': ['x', 'y'], 'col2': [[{"target": "NAge", "segment": "21 and older"}, {"target": "MinAge", "segment": "21"}, {"target": "Retargeting", "segment": "people who may be similar to their customers"}, {"target": "Region", "segment": "the United States"}], [{"target": "NAge", "segment": "18 and older"}, {"target": "Location Type", "segment": "HOME"}, {"target": "Interest", "segment": "Hispanic culture"}, {"target": "Interest", "segment": "Republican Party (United States)"}, {"target": "Location Granularity", "segment": "country"}, {"target": "Country", "segment": "the United States"}, {"target": "MinAge", "segment": 18}]]})

# display(df)
col1 col2
0 x [{'target': 'NAge', 'segment': '21 and older'}, {'target': 'MinAge', 'segment': '21'}, {'target': 'Retargeting', 'segment': 'people who may be similar to their customers'}, {'target': 'Region', 'segment': 'the United States'}]
1 y [{'target': 'NAge', 'segment': '18 and older'}, {'target': 'Location Type', 'segment': 'HOME'}, {'target': 'Interest', 'segment': 'Hispanic culture'}, {'target': 'Interest', 'segment': 'Republican Party (United States)'}, {'target': 'Location Granularity', 'segment': 'country'}, {'target': 'Country', 'segment': 'the United States'}, {'target': 'MinAge', 'segment': 18}]

# use explode to give each dict in a list a separate row
df = df.explode('col2', ignore_index=True)

# normalize the column of dicts, join back to the remaining dataframe columns, and drop the unneeded column
df = df.join(pd.json_normalize(df.col2)).drop(columns=['col2'])

display(df)

   col1                target                                       segment
0     x                  NAge                                  21 and older
1     x                MinAge                                            21
2     x           Retargeting  people who may be similar to their customers
3     x                Region                             the United States
4     y                  NAge                                  18 and older
5     y         Location Type                                          HOME
6     y              Interest                              Hispanic culture
7     y              Interest              Republican Party (United States)
8     y  Location Granularity                                       country
9     y               Country                             the United States
10    y                MinAge                                            18

Getting the counts

  • If the goal is to get a count for each 'target' and associated 'segment':
counts = df.groupby(['target', 'segment']).count()
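As a quick sanity check on the same idea (not part of the original answer), pandas.DataFrame.value_counts computes the same pair frequencies in one call, sorted descending; a minimal sketch with a hand-built frame in the shape produced by the explode/normalize steps above:

```python
import pandas as pd

# small frame in the same target/segment shape produced above
df = pd.DataFrame({
    'target': ['NAge', 'NAge', 'Interest', 'Interest'],
    'segment': ['18 and older', '18 and older',
                'Hispanic culture', 'Republican Party (United States)'],
})

# frequency of each (target, segment) pair, most common first
counts = df[['target', 'segment']].value_counts()
print(counts[('NAge', '18 and older')])  # 2
```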

Updated

  • This update is implemented for the full file:
import pandas as pd
from ast import literal_eval

# load the file
df = pd.read_csv('en-US.csv')

# replace NaNs with '[]', otherwise literal_eval will error
df.targets = df.targets.fillna('[]')

# replace null with None, otherwise literal_eval will error
df.targets = df.targets.str.replace('null', 'None')

# convert the strings to lists of dicts
df.targets = df.targets.apply(literal_eval)

# use explode to give each dict in a list a separate row
df = df.explode('targets', ignore_index=True)

# fillna with {} is required for json_normalize
df.targets = df.targets.fillna({i: {} for i in df.index})

# normalize the column of dicts, join back to the remaining dataframe columns, and drop the unneeded column
normalized = pd.json_normalize(df.targets)

# get the counts
counts = normalized.groupby(['target', 'segment']).segment.count().reset_index(name='counts')
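Once the counts frame exists, the most common targeting criteria can be read off by sorting on the counts column produced by reset_index(name='counts'); a sketch with hypothetical values standing in for the real file's output:

```python
import pandas as pd

# hypothetical counts frame in the shape produced by reset_index(name='counts')
counts = pd.DataFrame({
    'target': ['Region', 'Interest', 'MinAge'],
    'segment': ['the United States', 'Hispanic culture', '18'],
    'counts': [120, 45, 300],
})

# most frequent target/segment pairs first
top = counts.sort_values('counts', ascending=False).head(2)
print(top.iloc[0]['target'])  # MinAge
```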

Regarding "python - How to split a Pandas column containing lists of dictionaries into separate columns per key", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65621510/
