python-3.x - How do I select a specific row from a table with BeautifulSoup?

Reposted. Author: 行者123. Last updated: 2023-12-04 10:01:34

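So I have a question related to my previous one: I realized I need to go one level deeper to get the 11-digit NDC codes instead of the 10-digit ones. Rather than converting them later, I figured I could capture them from the start. Here is the link to the previous question: Is there a way to parse data from multiple pages from a parent webpage? What I want to do is follow the links here (this is the second level, by the way):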
2nd level with 10-digit NDC codes

and then get the 11-digit NDC codes on the next page:

3rd level containing 11-digit NDC codes

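I was able to write code that reaches that page, but I'm not sure how to select the number. It sits in a <td> tag inside a <tr> tag, but I only want one specific row of the table, so I thought I could get it by index. Instead, the whole list I get back is nothing but None and td. Here is my code: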

import requests
from bs4 import BeautifulSoup
url = 'https://ndclist.com/?s=Trospium'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for a in soup.select('[data-title="NDC"] a[href]'):
    link_url = a['href']
    print('Processing link {}...'.format(link_url))

    soup2 = BeautifulSoup(requests.get(link_url).content, 'html.parser')
    for b in soup2.select('#product-packages a'):
        link_url2 = b['href']
        print('Processing link {}... '.format(link_url2))
        soup3 = BeautifulSoup(requests.get(link_url2).content, 'html.parser')
        # findAll(...)[1] is a single <tr> Tag; looping over it visits its children,
        # i.e. whitespace strings (whose .name is None) and <td> tags, not rows.
        for link in soup3.findAll('tr', limit=7)[1]:
            print(link.name)
            all_data.append(link.name)

print('Trospium')
print(all_data)
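The None and td values come from the innermost loop of the code above: findAll('tr', limit=7)[1] is one <tr> Tag, and looping over a Tag yields its direct children, namely the whitespace strings between cells (whose .name is None) and the <td> tags themselves. The sketch below reproduces this offline on a small, made-up table (an assumption; the real ndclist.com markup may differ) and then reads a specific row by indexing its cells instead.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a package page: each row holds a label cell and a value cell.
html = """
<table>
  <tr>
    <td>NDC Package Code</td>
    <td>0574-0118-30</td>
  </tr>
  <tr>
    <td>11-Digit NDC Billing Format</td>
    <td>00574011830</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Iterating over a <tr> Tag walks its children: whitespace strings (name None) and <td> Tags.
row = soup.find_all("tr")[1]
child_names = [child.name for child in row]
print(child_names)

# To read a specific row, index its <td> cells and take the value cell's text.
cells = row.find_all("td")
billing_code = cells[1].get_text(strip=True)
print(billing_code)
```

Indexing by position works only as long as the row order on the page never changes; the answer below avoids that by locating the row through its label text.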

Best answer

Only a small change to your code is needed:

import requests
from bs4 import BeautifulSoup
url = 'https://ndclist.com/?s=Trospium'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for a in soup.select('[data-title="NDC"] a[href]'):
    link_url = a['href']
    print('Processing link {}...'.format(link_url))

    soup2 = BeautifulSoup(requests.get(link_url).content, 'html.parser')
    for b in soup2.select('#product-packages a'):
        link_url2 = b['href']
        print('\tProcessing link {}... '.format(link_url2))
        soup3 = BeautifulSoup(requests.get(link_url2).content, 'html.parser')
        # Find the <td> whose text contains the label, then take the <td> right after it ("+ td").
        # Newer soupsieve releases deprecate ':contains' in favor of ':-soup-contains'.
        ndc_billing_format = soup3.select_one('td:contains("11-Digit NDC Billing Format") + td').contents[0].strip()
        print('\t\t{}'.format(ndc_billing_format))
        all_data.append(ndc_billing_format)

print('Trospium')
print(all_data)
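The key line in the answer finds the <td> whose text contains "11-Digit NDC Billing Format" and uses the CSS adjacent-sibling combinator (+ td) to take the cell right after it, so the row is located by its label rather than by its position. A minimal sketch of the same lookup as a reusable function, assuming the label and value are sibling <td> cells (as the answer's selector implies); billing_ndc_from_html is a hypothetical name, the sample HTML is made up, and :-soup-contains requires soupsieve 2.1 or newer. Unlike the one-liner, it returns None instead of raising AttributeError when a page lacks the row.

```python
from typing import Optional

from bs4 import BeautifulSoup

LABEL = "11-Digit NDC Billing Format"


def billing_ndc_from_html(html: str) -> Optional[str]:
    """Return the first text in the cell that follows the label cell, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # td:-soup-contains(...) matches the label cell by its text (newer spelling of ':contains');
    # the '+ td' combinator then selects the <td> immediately after it in the same row.
    cell = soup.select_one(f'td:-soup-contains("{LABEL}") + td')
    if cell is None:
        return None  # label row not on this page; skip instead of raising AttributeError
    # Like the answer's .contents[0].strip(), keep only the first piece of text in the cell.
    return next(cell.stripped_strings, None)


sample = """
<table>
  <tr><td>11-Digit NDC Billing Format</td><td> 00574011830 </td></tr>
</table>
"""
print(billing_ndc_from_html(sample))
print(billing_ndc_from_html("<table><tr><td>Other</td><td>x</td></tr></table>"))
```

In the answer's inner loop, ndc_billing_format = billing_ndc_from_html(requests.get(link_url2).text) could replace the select_one line, with an if check to skip pages where it returns None.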

Running the modified script prints:
Processing link https://ndclist.com/ndc/0574-0118...
    Processing link https://ndclist.com/ndc/0574-0118/package/0574-0118-30...
        00574011830
Processing link https://ndclist.com/ndc/0574-0145...
    Processing link https://ndclist.com/ndc/0574-0145/package/0574-0145-60...
        00574014560
Processing link https://ndclist.com/ndc/0591-3636...
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-05...
        00591363605
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-30...
        00591363630
    Processing link https://ndclist.com/ndc/0591-3636/package/0591-3636-60...
        00591363660
Processing link https://ndclist.com/ndc/23155-530...
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-02...
        23155053002
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-05...
        23155053005
    Processing link https://ndclist.com/ndc/23155-530/package/23155-530-06...
        23155053006
Processing link https://ndclist.com/ndc/42291-846...
    Processing link https://ndclist.com/ndc/42291-846/package/42291-846-60...
        42291084660
Processing link https://ndclist.com/ndc/60429-098...
    Processing link https://ndclist.com/ndc/60429-098/package/60429-098-30...
        60429009830
Processing link https://ndclist.com/ndc/60505-3454...
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-5...
        60505345405
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-6...
        60505345406
    Processing link https://ndclist.com/ndc/60505-3454/package/60505-3454-8...
        60505345408
Processing link https://ndclist.com/ndc/68001-228...
    Processing link https://ndclist.com/ndc/68001-228/package/68001-228-04...
        68001022804
Processing link https://ndclist.com/ndc/68462-461...
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-05...
        68462046105
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-30...
        68462046130
    Processing link https://ndclist.com/ndc/68462-461/package/68462-461-60...
        68462046160
Processing link https://ndclist.com/ndc/69097-912...
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-02...
        69097091202
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-03...
        69097091203
    Processing link https://ndclist.com/ndc/69097-912/package/69097-912-15...
        69097091215
Processing link https://ndclist.com/ndc/69150-258...
    Processing link https://ndclist.com/ndc/69150-258/package/69150-258-06...
        69150025806
Processing link https://ndclist.com/ndc/76282-336...
    Processing link https://ndclist.com/ndc/76282-336/package/76282-336-60...
        76282033660
Trospium
['00574011830', '00574014560', '00591363605', '00591363630', '00591363660', '23155053002', '23155053005', '23155053006', '42291084660', '60429009830', '60505345405', '60505345406', '60505345408', '68001022804', '68462046105', '68462046130', '68462046160', '69097091202', '69097091203', '69097091215', '69150025806', '76282033660']
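The question considered converting the 10-digit codes afterwards instead of visiting the third level. Every pair in the output above follows the usual padding rule: a 10-digit NDC in the 4-4-2, 5-3-2 or 5-4-1 layout becomes the 5-4-2 billing format by adding one leading zero to its short segment, so 0574-0118-30 becomes 00574011830 and 60505-3454-5 becomes 60505345405. A minimal sketch of that conversion, assuming every package code is a hyphenated 10-digit NDC; ndc10_to_ndc11 is a hypothetical helper, and scraping the billing format directly, as the answer does, avoids relying on that assumption.

```python
def ndc10_to_ndc11(ndc10: str) -> str:
    """Convert a hyphenated 10-digit NDC to the 11-digit 5-4-2 billing format."""
    labeler, product, package = ndc10.split("-")
    if len(labeler + product + package) != 10:
        raise ValueError(f"expected a 10-digit NDC, got {ndc10!r}")
    # Left-pad each segment to 5, 4 and 2 digits: for the 4-4-2, 5-3-2 and 5-4-1
    # layouts exactly one segment gains a leading zero.
    ndc11 = labeler.zfill(5) + product.zfill(4) + package.zfill(2)
    if len(ndc11) != 11 or not ndc11.isdigit():
        raise ValueError(f"unsupported NDC layout: {ndc10!r}")
    return ndc11


for code in ["0574-0118-30", "23155-530-02", "60505-3454-5"]:
    print(code, "->", ndc10_to_ndc11(code))
```

This skips one HTTP request per package, at the cost of trusting the padding rule rather than the value the site publishes.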

Regarding "python-3.x - How do I select a specific row from a table with BeautifulSoup?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61801700/
