
Python -- Beautiful Soup -- Return information whether the tag attribute is empty or has a value

Reposted. Author: 行者123. Updated: 2023-12-04 09:36:52

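I decided to learn Python because I have more time now (due to the pandemic), and I have been teaching myself.
I am trying to scrape tax rates from a website and can get almost everything I need. Below are a snippet from my soup variable and the relevant part of my Python code.
Where I am stuck: I am finding the option tags whose data-alias attribute is empty (""). However, if you look at the markup below, some options have a non-empty data-alias (see UAE or the United Kingdom), which lists alternative names for the country.
What I am after is the data-url and the country name from each of these.
Since I am missing some of the required information, how do I write the code so that it picks up both the empty and the non-empty tags?
Thanks,
Seth
My code: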

import requests
from bs4 import BeautifulSoup
import re

l=[]
r = requests.get("https://taxsummaries.pwc.com/")
c=r.content
soup = BeautifulSoup(c, "html.parser")
all = soup.find_all("option", {"data-alias":""})

HTML from the site:

<option data-alias="" data-id="c9ddd85e-f3dc-4661-a4cb-8101f4644871" data-url="https://taxsummaries.pwc.com:443/uganda">Uganda</option>
<option data-alias="" data-id="d21e8abe-784c-4617-a90e-5369b49a202f" data-url="https://taxsummaries.pwc.com:443/ukraine">Ukraine</option>
<option data-alias="UAE" data-id="9e3f5e7b-f110-47dd-95d8-3d8160466e4a" data-url="https://taxsummaries.pwc.com:443/united-arab-emirates">United Arab Emirates</option>
<option data-alias="Great Britain
UK
Britain
Whales
Northern Ireland
England" data-id="3c42b2a9-7ed6-4b19-821d-5d78ef6f2b5d" data-url="https://taxsummaries.pwc.com:443/united-kingdom">United Kingdom</option>

Best answer

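You need to use {"data-alias": True}, which matches every option that has the attribute, whether its value is empty or not. You can try: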

import requests
from bs4 import BeautifulSoup

l = []
r = requests.get("https://taxsummaries.pwc.com/")
c = r.content
soup = BeautifulSoup(c, "html.parser")
# True matches any <option> that has a data-alias attribute, empty or not.
options = soup.find_all('option', {"data-alias": True})
for each in options:
    print("country_name : " + str(each.text), " data-url : " + str(each['data-url']))
The output will be:
country_name : Albania  data-url : https://taxsummaries.pwc.com:443/albania
country_name : Algeria data-url : https://taxsummaries.pwc.com:443/algeria
country_name : Angola data-url : https://taxsummaries.pwc.com:443/angola
country_name : Argentina data-url : https://taxsummaries.pwc.com:443/argentina
country_name : Armenia data-url : https://taxsummaries.pwc.com:443/armenia
country_name : Australia data-url : https://taxsummaries.pwc.com:443/australia
country_name : Austria data-url : https://taxsummaries.pwc.com:443/austria
country_name : Azerbaijan data-url : https://taxsummaries.pwc.com:443/azerbaijan
country_name : Bahrain data-url : https://taxsummaries.pwc.com:443/bahrain
country_name : Barbados data-url : https://taxsummaries.pwc.com:443/barbados
country_name : Belarus data-url : https://taxsummaries.pwc.com:443/belarus
country_name : Belgium data-url : https://taxsummaries.pwc.com:443/belgium
country_name : Bermuda data-url : https://taxsummaries.pwc.com:443/bermuda
country_name : Bolivia data-url : https://taxsummaries.pwc.com:443/bolivia
country_name : Bosnia and Herzegovina data-url : https://taxsummaries.pwc.com:443/bosnia-and-herzegovina
country_name : Botswana data-url : https://taxsummaries.pwc.com:443/botswana
country_name : Brazil data-url : https://taxsummaries.pwc.com:443/brazil
country_name : Bulgaria data-url : https://taxsummaries.pwc.com:443/bulgaria


and so on ......
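To see the attribute filter in isolation, without a network request, here is a minimal sketch that runs against an inline sample. SAMPLE_HTML is an assumption modeled on the markup quoted in the question (data-id omitted, plus one hypothetical placeholder option that has no data-alias at all); it is not the live page.

```python
from bs4 import BeautifulSoup

# Inline sample modeled on the markup quoted in the question.
# The last <option> is a hypothetical placeholder with no data-alias attribute.
SAMPLE_HTML = """
<select>
<option data-alias="" data-url="https://taxsummaries.pwc.com:443/uganda">Uganda</option>
<option data-alias="UAE" data-url="https://taxsummaries.pwc.com:443/united-arab-emirates">United Arab Emirates</option>
<option>Placeholder without data-alias</option>
</select>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# True means "the attribute is present", whatever its value (including "").
with_alias_attr = soup.find_all("option", {"data-alias": True})
names = [opt.get_text(strip=True) for opt in with_alias_attr]

# If the empty/non-empty distinction matters, split afterwards in plain Python.
empty = [opt.get_text(strip=True) for opt in with_alias_attr if opt["data-alias"] == ""]
non_empty = [opt.get_text(strip=True) for opt in with_alias_attr if opt["data-alias"] != ""]

print(names)
print(empty)
print(non_empty)
```

Filtering on presence first and comparing values in Python keeps the query simple and makes both groups available from a single pass over the document.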
To get a list:
for each in options:
    l.append(str(each.text) + " : " + str(each['data-url']))
print(l)
The output will be:
['Albania : https://taxsummaries.pwc.com:443/albania', 'Algeria : https://taxsummaries.pwc.com:443/algeria', 'Angola : https://taxsummaries.pwc.com:443/angola', 'Argentina : https://taxsummaries.pwc.com:443/argentina', 'Armenia : https://taxsummaries.pwc.com:443/armenia', 'Australia : https://taxsummaries.pwc.com:443/australia', 'Austria : https://taxsummaries.pwc.com:443/austria', 'Azerbaijan : https://taxsummaries.pwc.com:443/azerbaijan', 'Bahrain : https://taxsummaries.pwc.com:443/bahrain', 'Barbados : https://taxsummaries.pwc.com:443/barbados', 'Belarus : https://taxsummaries.pwc.com:443/belarus', 'Belgium : https://taxsummaries.pwc.com:443/belgium', 'Bermuda : https://taxsummaries.pwc.com:443/bermuda', 'Bolivia : https://taxsummaries.pwc.com:443/bolivia', 'Bosnia and Herzegovina : https://taxsummaries.pwc.com:443/bosnia-and-herzegovina', 'Botswana : https://taxsummaries.pwc.com:443/botswana', 'Brazil : https://taxsummaries.pwc.com:443/brazil', 'Bulgaria : https://taxsummaries.pwc.com:443/bulgaria', 'Cabo Verde : https://taxsummaries.pwc.com:443/cabo-verde', 'Cambodia : https://taxsummaries.pwc.com:443/cambodia', 'Cameroon, Republic of : https://taxsummaries.pwc.com:443/republic-of-cameroon',


and so on............]
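If the aliases themselves are also of interest, one possible variation is to collect a dict per option instead of a formatted string. This is only a sketch: parse_options and SAMPLE_HTML are hypothetical names introduced here, and the sample is a shortened version of the markup quoted in the question, where aliases are separated by newlines.

```python
from bs4 import BeautifulSoup


def parse_options(html):
    """Return one dict per <option> that carries a data-alias attribute."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for opt in soup.find_all("option", {"data-alias": True}):
        # Aliases are newline-separated in the quoted markup; drop blank entries.
        aliases = [a.strip() for a in opt["data-alias"].splitlines() if a.strip()]
        records.append({
            "name": opt.get_text(strip=True),
            "url": opt.get("data-url"),  # None if the attribute is missing
            "aliases": aliases,
        })
    return records


# Shortened sample based on the question's markup (not the live page).
SAMPLE_HTML = '''<option data-alias="" data-url="https://taxsummaries.pwc.com:443/ukraine">Ukraine</option>
<option data-alias="Great Britain
UK
England" data-url="https://taxsummaries.pwc.com:443/united-kingdom">United Kingdom</option>'''

records = parse_options(SAMPLE_HTML)
for record in records:
    print(record)
```

An empty data-alias simply yields an empty aliases list, so empty and non-empty options go through the same code path.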

Regarding Python -- Beautiful Soup -- Return information whether the tag attribute is empty or has a value, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62533888/
