
python - Web scraping: How to extract just the information that I need


I have to scrape some information from the congress.gov website ( https://www.congress.gov/search?q=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22115%22%2C%22type%22%3A%22bills%22%7D&page=113 ). I am unable to extract the information about the sponsors.

import os
import requests
import csv
from bs4 import BeautifulSoup
import re

x = 0
y = 0
index = 0
mydirectory = '/Users/Antonio/Desktop/statapython assignment'
congress115 = os.path.join(mydirectory, '115congress.csv')
headers = {'User-Agent': 'Make_America_Great_Again',
           'From': 'Donald'}

with open('115congress.csv', 'w') as f:
    fwriter = csv.writer(f, delimiter=';')
    fwriter.writerow(['Spons'])
    for j in range(1, 114):
        hrurl = 'https://www.congress.gov/search?q=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22115%22%2C%22type%22%3A%22bills%22%7D&page=' + str(j)
        hrpage = requests.get(hrurl, headers=headers)
        data = hrpage.text
        soup = BeautifulSoup(data, 'lxml')
        #index=0;
        for q in soup.findAll('span', {'class': 'result-item'}):
            for a in q.findAll('a', href=True, text=True, target='_blank'):
                if a == y:
                    continue
                y = a
                Spons = a['href']
                print(Spons)

What I get is something like this (for brevity, I'll report only one of the 7401 results):

/member/michael-enzi/E000285

when what I need is

Sen. Enzi, Michael B. [R-WY] 

I apologize if I got something wrong, but this is my first question here. Any help would be greatly appreciated.

Best answer

Just extract the text from the <a> tag (instead of the href attribute):

...
Spons = a.text
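
To illustrate the difference, here is a minimal sketch on a made-up search-result snippet (the HTML below is a hypothetical simplification of what congress.gov returns): `a['href']` gives the link target, while `a.text` gives the visible sponsor name the asker wants.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of one search result, simplified for illustration
html = '''<span class="result-item">
  <a href="/member/michael-enzi/E000285" target="_blank">Sen. Enzi, Michael B. [R-WY]</a>
</span>'''

soup = BeautifulSoup(html, 'html.parser')
a = soup.find('a', href=True, target='_blank')

print(a['href'])  # the link target: /member/michael-enzi/E000285
print(a.text)     # the visible text: Sen. Enzi, Michael B. [R-WY]
```

So in the loop above, replacing `Spons = a['href']` with `Spons = a.text` yields the sponsor's name rather than the member URL.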

Regarding "python - Web scraping: How to extract just the information that I need", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56934525/
