
python - Creating a table in Python using BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-12-01 09:04:09

I'm new to Python and have a question about building a table from scratch with Beautiful Soup. This is the code I'm using:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.opensecrets.org/lobby/lobbyist.php?id=Y0000008510L&year=2018")
soup = BeautifulSoup(page.content, 'lxml')
table = soup.find('table', {'id': 'lobbyist_summary'})
for row in table:
    cells = row.find_all('a')
    rn = cells[0].get_text()

The error is:

AttributeError: 'NavigableString' object has no attribute 'find_all'

The output of print(table) looks like this:

[<a href="firmsum.php?id=D000037635&amp;year=2018">Ballard Partners</a>, <a href="clientsum.php?id=F203227&amp;year=2018">Advanced Roofing Inc</a>, <a href="clientsum.php?id=F214670&amp;year=2018">Africell Holding</a>, <a href="clientsum.php?id=D000023883&amp;year=2018">Amazon.com</a>, ...]

I would (eventually) like to end up with a table in which each element of interest is in its own column, so that it looks like:

[[firmsum, D000037635, 2018, Ballard Partners], [clientsum, F203227, 2018, Advanced Roofing Inc], [clientsum, F214670, 2018, Africell Holding], [clientsum, D000023883, 2018, Amazon.com], ...]

Best answer

Create four empty lists:

col1List = list()
col2List = list()
col3List = list()
col4List = list()

First, let's get the values for the fourth column:

trs = table.find_all('tr')[1]
tds = trs.find_all('a')

for i in range(len(tds)):
    col4List.append(tds[i].get_text())

This gives:

['Ballard Partners', 'Advanced Roofing Inc', 'Africell Holding', ...]
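
For context on the original AttributeError: iterating directly over a Tag walks its direct children, and those include the whitespace text nodes between elements, which are NavigableString objects with no find_all method. Calling find_all on the table, as above, returns only Tag objects. The following is a minimal sketch on a made-up one-row snippet, not the real opensecrets.org page; the HTML string and the variable names child_types and names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a single row with a single link.
html = """<table id="lobbyist_summary">
<tr><td><a href="firmsum.php?id=D000037635&amp;year=2018">Ballard Partners</a></td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "lobbyist_summary"})

# Direct children of the table: the newlines around <tr> show up as text nodes.
child_types = [type(child).__name__ for child in table]

# find_all returns only Tag objects, so find_all can safely be called on each one.
names = [a.get_text() for tr in table.find_all("tr") for a in tr.find_all("a")]
```

Here child_types contains 'NavigableString' entries next to 'Tag', which is what the loop in the question stumbled over.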

Now, let's extract the values for the first three columns from the href:

hrefVal = trs.find_all('a')

for i in hrefVal:
    hVal = i.get('href')               # e.g. 'firmsum.php?id=D000037635&year=2018'
    col11 = hVal.split('.php?id=', 1)  # ['firmsum', 'D000037635&year=2018']
    col1 = col11[0]
    col1List.append(col1)
    col22 = col11[1].split('&', 1)     # ['D000037635', 'year=2018']
    col2 = col22[0]
    col2List.append(col2)
    col33 = col22[1].split('=', 1)     # ['year', '2018']
    col3 = col33[1]
    col3List.append(col3)
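
As an alternative to chained split calls, the same three fields can be pulled out with the standard library's urllib.parse. This is only a sketch, assuming every href has the form name.php?id=...&year=... as in the sample output; parse_href is a hypothetical helper name, not something from the original answer.

```python
from urllib.parse import urlsplit, parse_qs

def parse_href(href, text):
    """Split an href such as 'firmsum.php?id=D000037635&year=2018' into one row.

    Hypothetical helper: assumes both the 'id' and 'year' query parameters
    are present, and raises KeyError otherwise.
    """
    parts = urlsplit(href)
    page = parts.path.rsplit('/', 1)[-1]   # 'firmsum.php'
    if page.endswith('.php'):
        page = page[:-len('.php')]         # 'firmsum'
    query = parse_qs(parts.query)          # each value is a list of strings
    return [page, query['id'][0], query['year'][0], text]

row = parse_href('firmsum.php?id=D000037635&year=2018', 'Ballard Partners')
```

Because the query string is parsed by name, this version does not depend on id appearing before year in the link.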

Now, let's put all the lists into a DataFrame so the result looks tidy:

import pandas as pd

df = pd.DataFrame()
df['Col1'] = col1List
df['Col2'] = col2List
df['Col3'] = col3List
df['Col4'] = col4List

If I print the first few rows, it looks the way you wanted:

Col1       Col2        Col3  Col4
firmsum    D000037635  2018  Ballard Partners
clientsum  F203227     2018  Advanced Roofing Inc
clientsum  F214670     2018  Africell Holding
clientsum  D000023883  2018  Amazon.com
clientsum  D000000192  2018  American Health Care Assn
clientsum  D000021839  2018  American Road & Transport Builders Assn
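
The question asked for a list of rows rather than a DataFrame. If that nested-list shape is what is needed, the four column lists can be combined with zip. A small sketch follows; the list contents here are just the first two sample rows copied from the output above, and rows is an assumed variable name.

```python
# Sample values copied from the first two rows of the output shown above.
col1List = ['firmsum', 'clientsum']
col2List = ['D000037635', 'F203227']
col3List = ['2018', '2018']
col4List = ['Ballard Partners', 'Advanced Roofing Inc']

# zip pairs up the i-th entry of every column; list() turns each tuple into a row.
rows = [list(row) for row in zip(col1List, col2List, col3List, col4List)]
```

If the DataFrame has already been built, df.values.tolist() should produce the same nested list.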

Regarding "python - Creating a table in Python using BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52212754/
