
python - How to normalize nested JSON keys into a pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-04 08:15:48

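I'm new to Python and to APIs in general, so this is probably a basic question with a simple answer. I'm trying to pull data on congressional representatives from ProPublica's API using Python. I can get the REST API call to run, but I'm having trouble structuring the resulting JSON data properly as a DataFrame. I think this is because the data has multiple levels of nesting. I tried normalizing the data, but I can only get it to work for the first level of nesting.
Here is my code. Note that I have removed my API key, but you can get one quickly and easily here.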

# Import libraries
import pandas as pd
from pandas.io.json import json_normalize
import requests
import json
import time
import csv

### Index 0

# Requesting data through the API
payload = {'X-API-Key': 'a876543211234'}
terms = '"trade war"AND"China"'
index = str(0) # 440 is last offset for this call

response = requests.get('https://api.propublica.org/congress/v1/116/house/members.json', headers=payload)
print(response.status_code)

# Decoding the JSON response
json_data = json.loads(response.content.decode("utf-8"))

# Writing Data as String
json_string = json.dumps(json_data)

# Creating Stage 1 dataframe
jdata = json.loads(json_string)
df = pd.DataFrame(jdata)
df2 = pd.DataFrame(df.results)

# Normalizing Data - converts nested data into a regular looking dataframe
normal_data_0 = json_normalize(data=df['results'])
This is what the JSON data looks like. Note that all of the representatives' data is nested under 'results' and 'members':
{'status': 'OK',
'copyright': ' Copyright (c) 2021 Pro Publica Inc. All Rights Reserved.',
'results': [{'congress': '116',
'chamber': 'House',
'num_results': 451,
'offset': 0,
'members': [{'id': 'A000374',
'title': 'Representative',
'short_title': 'Rep.',
'api_uri': 'https://api.propublica.org/congress/v1/members/A000374.json',
'first_name': 'Ralph',
'middle_name': None,
'last_name': 'Abraham',
'suffix': None,
'date_of_birth': '1954-09-16',
'gender': 'M',
'party': 'R',
'leadership_role': '',
'twitter_account': 'RepAbraham',
'facebook_account': 'CongressmanRalphAbraham',
'youtube_account': None,
'govtrack_id': '412630',
'cspan_id': '76236',
'votesmart_id': '155414',
'icpsr_id': '21522',
'crp_id': 'N00036633',
'google_entity_id': '/m/012dwd7_',
'fec_candidate_id': 'H4LA05221',
'url': 'https://abraham.house.gov',
'rss_url': 'https://abraham.house.gov/rss.xml',
'contact_form': None,
'in_office': False,
'cook_pvi': 'R+15',
'dw_nominate': 0.541,
'ideal_point': None,
'seniority': '6',
'next_election': '2020',
'total_votes': 954,
'missed_votes': 377,
'total_present': 0,
'last_updated': '2020-12-31 18:30:50 -0500',
'ocd_id': 'ocd-division/country:us/state:la/cd:5',
'office': '417 Cannon House Office Building',
'phone': '202-225-8490',
'fax': None,
'state': 'LA',
'district': '5',
'at_large': False,
'geoid': '2205',
'missed_votes_pct': 39.52,
'votes_with_party_pct': 94.93,
'votes_against_party_pct': 4.9},
{'id': 'A000370',
'title': 'Representative',
...
This is what my "dataset" looks like. All of the JSON data is stored as a string in the 'members' column of the only row:
normal_data_0

congress chamber num_results offset members
0 116 House 451 0 [{'id': 'A000374', 'title': 'Representative', ...
I have tried running the data through json_normalize twice, and also passing both keys, [results, members], as arguments. Nothing I have tried has worked.
Any suggestions?

Best answer

  • The 'results' key holds a one-element list, so 'members' can be normalized by selecting the 'members' key from the dict at index 0.

  • import pandas as pd
    import requests

    # Requesting data through the API
    payload = {'X-API-Key': '...'}
    terms = '"trade war"AND"China"'
    index = str(0) # 440 is last offset for this call

    response = requests.get('https://api.propublica.org/congress/v1/116/house/members.json', headers=payload)

    # extract the json data from the response
    json_data = response.json()

    # normalize only members
    members = pd.json_normalize(data=json_data['results'][0]['members'])

    # alternatively: normalize members and the preceding keys
    members = pd.json_normalize(data=json_data['results'][0], record_path=['members'], meta=['congress', 'chamber', 'num_results', 'offset'])
    display(members)
            id           title short_title                                                      api_uri first_name middle_name  last_name suffix date_of_birth gender party leadership_role  twitter_account         facebook_account youtube_account govtrack_id cspan_id votesmart_id icpsr_id     crp_id google_entity_id fec_candidate_id                          url                                         rss_url contact_form  in_office cook_pvi  dw_nominate ideal_point seniority next_election  total_votes  missed_votes  total_present               last_updated                                  ocd_id                                office         phone   fax state  district  at_large geoid  missed_votes_pct  votes_with_party_pct  votes_against_party_pct
    0 A000374 Representative Rep. https://api.propublica.org/congress/v1/members/A000374.json Ralph None Abraham None 1954-09-16 M R RepAbraham CongressmanRalphAbraham None 412630 76236 155414 21522 N00036633 /m/012dwd7_ H4LA05221 https://abraham.house.gov https://abraham.house.gov/rss.xml None False R+15 0.541 None 6 2020 954.0 377.0 0.0 2020-12-31 18:30:50 -0500 ocd-division/country:us/state:la/cd:5 417 Cannon House Office Building 202-225-8490 None LA 5 False 2205 39.52 94.93 4.90
    1 A000370 Representative Rep. https://api.propublica.org/congress/v1/members/A000370.json Alma None Adams None 1946-05-27 F D None RepAdams CongresswomanAdams None 412607 76386 5935 21545 N00035451 /m/02b45d H4NC12100 https://adams.house.gov https://adams.house.gov/rss.xml None False D+18 -0.465 None 8 2020 954.0 26.0 0.0 2020-12-31 18:30:55 -0500 ocd-division/country:us/state:nc/cd:12 2436 Rayburn House Office Building 202-225-1510 None NC 12 False 3712 2.73 99.24 0.65
    2 A000055 Representative Rep. https://api.propublica.org/congress/v1/members/A000055.json Robert B. Aderholt None 1965-07-22 M R None Robert_Aderholt RobertAderholt RobertAderholt 400004 45516 441 29701 N00003028 /m/024p03 H6AL04098 https://aderholt.house.gov https://aderholt.house.gov/rss.xml None False R+30 0.369 None 24 2020 954.0 71.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:al/cd:4 1203 Longworth House Office Building 202-225-4876 None AL 4 False 0104 7.44 93.60 6.29
    3 A000371 Representative Rep. https://api.propublica.org/congress/v1/members/A000371.json Pete None Aguilar None 1979-06-19 M D None reppeteaguilar reppeteaguilar None 412615 79994 70114 21506 N00033997 /m/0jwv0xf H2CA31125 https://aguilar.house.gov https://aguilar.house.gov/rss.xml None False D+8 -0.291 None 6 2020 954.0 9.0 0.0 2020-12-31 18:30:52 -0500 ocd-division/country:us/state:ca/cd:31 109 Cannon House Office Building 202-225-3201 None CA 31 False 0631 0.94 97.45 2.44
    4 A000372 Representative Rep. https://api.propublica.org/congress/v1/members/A000372.json Rick None Allen None 1951-11-07 M R None reprickallen CongressmanRickAllen None 412625 62545 136062 21516 N00033720 /m/0127y9dk H2GA12121 https://allen.house.gov None None False R+9 0.679 None 6 2020 954.0 15.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:ga/cd:12 2400 Rayburn House Office Building 202-225-2823 None GA 12 False 1312 1.57 92.26 7.63
    5 A000376 Representative Rep. https://api.propublica.org/congress/v1/members/A000376.json Colin None Allred None 1983-04-15 M D None RepColinAllred None None 412828 None 177357 None N00040989 /m/03d066b H8TX32098 https://allred.house.gov None None False R+5 NaN None 2 2020 954.0 29.0 0.0 2020-12-31 18:30:52 -0500 ocd-division/country:us/state:tx/cd:32 328 Cannon House Office Building 202-225-2231 None TX 32 False 4832 3.04 97.72 2.17
    6 A000367 Representative Rep. https://api.propublica.org/congress/v1/members/A000367.json Justin None Amash None 1980-04-18 M I justinamash repjustinamash repjustinamash 412438 1033767 105566 21143 N00031938 /m/0c00p_n https://amash.house.gov https://amash.house.gov/rss.xml None False R+6 NaN None 10 2020 524.0 0.0 10.0 2020-12-31 18:30:47 -0500 ocd-division/country:us/state:mi/cd:3 None None None MI 3 False 2603 0.00 58.49 41.51
    7 A000367 Representative Rep. https://api.propublica.org/congress/v1/members/A000367.json Justin None Amash None 1980-04-18 M R justinamash repjustinamash repjustinamash 412438 1033767 105566 21143 N00031938 /m/0c00p_n H0MI03126 https://amash.house.gov https://amash.house.gov/rss.xml None False None 0.654 None 10 2020 430.0 0.0 5.0 2020-12-28 21:04:36 -0500 ocd-division/country:us/state:mi/cd:3 106 Cannon House Office Building 202-225-3831 None MI 3 False 2603 0.00 61.97 37.79
    8 A000369 Representative Rep. https://api.propublica.org/congress/v1/members/A000369.json Mark None Amodei None 1958-06-12 M R None MarkAmodeiNV2 MarkAmodeiNV2 markamodeinv2 412500 62817 12537 21196 N00031177 /m/03bzdkn H2NV02395 https://amodei.house.gov https://amodei.house.gov/rss/news-releases.xml None False R+7 0.384 None 10 2020 954.0 36.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:nv/cd:2 104 Cannon House Office Building 202-225-6155 None NV 2 False 3202 3.77 92.63 7.26
    9 A000377 Representative Rep. https://api.propublica.org/congress/v1/members/A000377.json Kelly None Armstrong None 1976-10-08 M R None RepArmstrongND None None 412794 None 139338 None N00042868 /g/11hcszksh3 H8ND00096 https://armstrong.house.gov None None False R+16 NaN None 2 2020 954.0 33.0 0.0 2020-12-31 18:30:49 -0500 ocd-division/country:us/state:nd/cd:1 1004 Longworth House Office Building 202-225-2611 None ND At-Large True 3800 3.46 93.31 6.58
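
Both calls above can be tried without an API key on a hand-made payload. The following is only a minimal sketch: `sample` is a hypothetical, heavily trimmed stand-in for the real response (a few of the fields shown in the question, with `num_results` adjusted to match), not actual API output.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the API response;
# the real payload carries many more fields per member.
sample = {
    "status": "OK",
    "results": [
        {
            "congress": "116",
            "chamber": "House",
            "num_results": 2,
            "offset": 0,
            "members": [
                {"id": "A000374", "first_name": "Ralph", "last_name": "Abraham", "party": "R"},
                {"id": "A000370", "first_name": "Alma", "last_name": "Adams", "party": "D"},
            ],
        }
    ],
}

# 'results' is a one-element list, so index 0 gives the dict that holds 'members'.
first_result = sample["results"][0]

# Option 1: one row per member, member fields only.
members_only = pd.json_normalize(first_result["members"])

# Option 2: one row per member, with the parent keys repeated on every row.
members_with_meta = pd.json_normalize(
    first_result,
    record_path=["members"],
    meta=["congress", "chamber", "num_results", "offset"],
)

print(members_only.shape)
print(list(members_with_meta.columns))
```

In the second frame the member columns come first and the `meta` columns are appended after them, which makes it easy to tell which fields belong to the member and which to the enclosing result.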

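
If a response ever carried more than one entry under 'results', indexing with `[0]` would silently drop the rest. One possible alternative is to pass a nested `record_path`, so that `pd.json_normalize` walks every element of 'results'; parent keys are then given as paths in `meta` and come out as dotted column names such as `results.chamber`. The two-entry payload below, including the `X000001` id, is invented for illustration; the endpoint in the question returns a single result.

```python
import pandas as pd

# Hypothetical payload with two entries under 'results' (invented for illustration).
payload = {
    "status": "OK",
    "results": [
        {
            "congress": "116",
            "chamber": "House",
            "members": [
                {"id": "A000374", "party": "R"},
                {"id": "A000370", "party": "D"},
            ],
        },
        {
            "congress": "116",
            "chamber": "Senate",
            "members": [{"id": "X000001", "party": "I"}],
        },
    ],
}

# Walk results -> members; each meta entry is a path from the top of the payload.
all_members = pd.json_normalize(
    payload,
    record_path=["results", "members"],
    meta=["status", ["results", "congress"], ["results", "chamber"]],
)

print(all_members)
```

Each member row keeps the chamber and congress of the result it came from, so no separate loop over 'results' is needed.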
    Regarding "python - How to normalize nested JSON keys into a pandas DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65710084/
