
python - Pandas read_html produces an empty df with tuple column names

Reposted. Author: 行者123. Updated: 2023-12-04 08:55:51

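I want to retrieve the tables on the following website and store them in a Pandas DataFrame: https://www.acf.hhs.gov/orr/resource/ffy-2012-13-state-of-colorado-orr-funded-programs
However, the third table on the page comes back as an empty DataFrame, with all of the table's data stored in tuples as the column headers: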

Empty DataFrame
Columns: [(Service Providers, State of Colorado), (Cuban - Haitian Program, $0), (Refugee Preventive Health Program, $150,000.00), (Refugee School Impact, $450,000), (Services to Older Refugees Program, $0), (Targeted Assistance - Discretionary, $0), (Total FY, $600,000)]
Index: []
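Is there a way to "flatten" the tuple headers into header + value, and then append the result to a DataFrame made up of all four tables? My code is below. It has worked on other, similar pages, but it keeps breaking on this one because of how this table is formatted. Thanks!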
import pandas as pd
import requests
from bs4 import BeautifulSoup

funds_df = pd.DataFrame()
url = 'https://www.acf.hhs.gov/programs/orr/resource/ffy-2011-12-state-of-colorado-orr-funded-programs'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
year = url.split('ffy-')[1].split('-orr')[0]
tables = page.content
df_list = pd.read_html(tables)
for df in df_list:
    df['URL'] = url
    df['YEAR'] = year
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    funds_df = pd.concat([funds_df, df])
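
A side note on the YEAR column: the split-based expression in the code above keeps everything between 'ffy-' and '-orr', so the state name comes along with the year. The sketch below assumes that only the fiscal-year digits are wanted, which the question does not actually state. The helper name fiscal_year is hypothetical and used purely for illustration.

```python
import re

url = 'https://www.acf.hhs.gov/programs/orr/resource/ffy-2011-12-state-of-colorado-orr-funded-programs'

# The split-based expression from the question keeps everything up to '-orr'.
year_from_split = url.split('ffy-')[1].split('-orr')[0]


def fiscal_year(page_url):
    """Return the 'YYYY-YY' part that follows 'ffy-', or None if it is absent."""
    match = re.search(r'ffy-(\d{4}-\d{2})', page_url)
    return match.group(1) if match else None


print(year_from_split)   # 2011-12-state-of-colorado
print(fiscal_year(url))  # 2011-12
```

If the longer string is what the YEAR column is meant to hold, the original expression can stay as it is.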

Best answer

  • For this site, neither beautifulsoup nor requests is needed.
  • pandas.read_html creates a list of DataFrames, one for each <table> at the URL.

  • import pandas as pd

    url = 'https://www.acf.hhs.gov/orr/resource/ffy-2012-13-state-of-colorado-orr-funded-programs'

    # read the url
    dfl = pd.read_html(url)

    # see each dataframe in the list; there are 4 in this case
    for i, d in enumerate(dfl):
        print(i)
        display(d)  # display works in Jupyter, otherwise use print
        print('\n')
  • dfl[0]

  •    Service Providers  Cash and Medical Assistance*  Refugee Social Services Program  Targeted Assistance Program       TOTAL
    0  State of Colorado                    $7,140,000                       $1,896,854                     $503,424  $9,540,278
  • dfl[1]
  •      WF-CMA 2         RSS     TAG-F  CMA Mandatory 3       TOTAL
    0  $3,309,953  $1,896,854  $503,424       $7,140,000  $9,540,278
  • dfl[2]
  •    Service Providers  Refugee School Impact  Targeted Assistance - Discretionary  Services to Older Refugees Program  Refugee Preventive Health Program  Cuban - Haitian Program     Total
    0  State of Colorado               $430,000                                   $0                            $100,000                           $150,000                       $0  $680,000
  • dfl[3]
  •    Volag                             Affiliate Name  Projected ORR MG Funding                                                                  Director
    0    CWS  Ecumenical Refugee & Immigration Services                  $127,600   Ferdi Mevlani 1600 Downing St., Suite 400 Denver, CO 80218 303-860-0128
    1   ECDC              ECDC African Community Center                  $308,000      Jennifer Guddiche 5250 Leetsdale Drive Denver, CO 80246 303-399-4500
    2    EMM                Ecumenical Refugee Services                  $191,400   Ferdi Mevlani 1600 Downing St., Suite 400 Denver, CO 80218 303-860-0128
    3   LIRS   Lutheran Family Services Rocky Mountains                  $121,000    Floyd Preston 132 E Las Animas Colorado Springs, CO 80903 719-314-0223
    4   LIRS   Lutheran Family Services Rocky Mountains                  $365,200  James Horan 1600 Downing Street, Suite 600 Denver, CO 80218 303-980-5400
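
The answer above shows the tables parsing cleanly for the 2012-13 page, but it does not cover the "flatten" step the question asks about. The following is a minimal sketch of one way to do it, assuming the broken table arrives exactly as in the question's output: no rows, and every column label a (header, value) tuple. The helper names flatten_tuple_header and combine_tables are hypothetical, and the two frames are hand-built stand-ins (values copied from the output shown in this post) so the example runs without fetching the page.

```python
import pandas as pd


def flatten_tuple_header(df: pd.DataFrame) -> pd.DataFrame:
    """If df has no rows and every column label is a (header, value) tuple,
    return a one-row frame with the headers as columns; otherwise return df."""
    labels = list(df.columns)
    is_pairs = all(isinstance(c, tuple) and len(c) == 2 for c in labels)
    if len(df) > 0 or not labels or not is_pairs:
        return df
    headers = [header for header, _ in labels]
    values = [value for _, value in labels]
    return pd.DataFrame([values], columns=headers)


def combine_tables(tables, url, year):
    """Flatten each table where needed, tag it with URL and YEAR, and stack them."""
    tagged = [flatten_tuple_header(t).assign(URL=url, YEAR=year) for t in tables]
    return pd.concat(tagged, ignore_index=True)


# Stand-in for the broken third table: empty, with tuple column labels.
broken = pd.DataFrame(columns=pd.MultiIndex.from_tuples([
    ("Service Providers", "State of Colorado"),
    ("Refugee School Impact", "$450,000"),
    ("Total FY", "$600,000"),
]))

# Stand-in for a table that parsed normally.
normal = pd.DataFrame({"Service Providers": ["State of Colorado"],
                       "TOTAL": ["$9,540,278"]})

funds_df = combine_tables(
    [normal, broken],
    url="https://www.acf.hhs.gov/orr/resource/ffy-2012-13-state-of-colorado-orr-funded-programs",
    year="2012-13",
)
print(funds_df)
```

With real data, the list returned by pd.read_html(url) would take the place of [normal, broken]. Columns that exist in only some of the tables are filled with NaN by pd.concat, so the combined frame is wide and sparse; whether that layout is acceptable depends on how the data is used afterwards.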

    Regarding python - Pandas read_html produces an empty df with tuple column names, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63834594/
