
python - Using BS, looping through tags when a row is missing in some iterations (Python 3)

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:01:56

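I am trying to use BS to select items matched by the selector span[name*=nv]. However, each item has at most 2 such span rows (I want the second one), and when the second row is missing my loop raises an IndexError instead of just skipping the missing row.

How do I tell my loop to skip the missing rows and return the value only when the index exists?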

https://www.imdb.com/list/ls047677021/


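The following works. However, if I change the index from 0 to 1 (which is what I need), I get an IndexError. I understand why I get the error, but I don't know how to work around it.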

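import urllib.request

import bs4 as bs  # alias implied by the bs.BeautifulSoup call below

# Download the IMDb list page and parse it with the lxml parser.
sauce = urllib.request.urlopen('https://www.imdb.com/list/ls047677021/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

gross = []

# Index 0 works: every item has at least one matching span.
for div in soup.find_all('div', class_='lister-item mode-detail'):
    money = div.select('span[name*=nv]')[0]['data-value']
    gross.append(money)

gross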
Changing the index to 1:

gross = []

# Index 1 fails for any item that has only one matching span.
for div in soup.find_all('div', class_='lister-item mode-detail'):
    money = div.select('span[name*=nv]')[1]['data-value']
    gross.append(money)

gross

Error:

IndexError                                Traceback (most recent call last)
<ipython-input-42-67c1c65a2cce> in <module>
      2 
      3 for div in soup.find_all('div', class_='lister-item mode-detail'):
----> 4     money = div.select('span[name*=nv]')[1]['data-value']
      5     gross.append(money)
      6 

IndexError: list index out of range


I should be getting a list of values to put in a dictionary. Instead, I get either the error or, if I change the index, the wrong data.

Best answer

EDIT: This uses Beautiful Soup 4.7+.

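Personally, I would narrow your selection further. You can use a more specific selector that targets only the element you need, and then call select_one, which returns just that element. You can then check whether the result is None (meaning nothing matched) to see whether you got it.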

There are a couple of ways to do this. First, I'll show a newer feature from CSS Selectors Level 4, :nth-child(an+b of s). This selects the second span that matches your selector. https://facelessuser.github.io/soupsieve/selectors/#:nth-child

CSS specification: https://drafts.csswg.org/selectors-4/#the-nth-child-pseudo

gross = []

for div in soup.find_all('div', class_='lister-item mode-detail'):
    # Second sibling that matches span[name*=nv]; None if there is no such span.
    el = div.select_one(':nth-child(2 of span[name*=nv])')
    if el is not None:
        gross.append(el['data-value'])

print(gross)
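
To try the selector without fetching the live page, here is a minimal self-contained sketch. The HTML is made up for illustration and only assumes the structure described in the question (one item with two matching spans, one item with a single span); it uses the built-in html.parser rather than lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the question:
# the first item has a votes span and a gross span, the second has votes only.
HTML = """
<div class="lister-item mode-detail">
  <p>
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="100">100</span>
    <span class="text-muted">Gross:</span>
    <span name="nv" data-value="1,000">$1,000</span>
  </p>
</div>
<div class="lister-item mode-detail">
  <p>
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="200">200</span>
  </p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

gross = []
for div in soup.find_all("div", class_="lister-item mode-detail"):
    # select_one returns None when no second matching span exists,
    # so the item is skipped instead of raising IndexError.
    el = div.select_one(":nth-child(2 of span[name*=nv])")
    if el is not None:
        gross.append(el["data-value"])

print(gross)
```

Only the first item contributes a value; the second is skipped because it has no second matching span.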

Or you can use the custom :contains pseudo-class to target the span that comes right after the span containing Gross:. https://facelessuser.github.io/soupsieve/selectors/#:contains

gross = []

for div in soup.find_all('div', class_='lister-item mode-detail'):
    # The span[name*=nv] immediately following the span whose text contains "Gross:".
    el = div.select_one('span:contains("Gross:") + span[name*=nv]')
    if el is not None:
        gross.append(el['data-value'])

print(gross)
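
As far as I know, newer Soup Sieve releases (2.1 and later) spell this pseudo-class :-soup-contains and treat the bare :contains form as deprecated. The sketch below assumes such a release is installed and runs against made-up HTML that mirrors the structure described in the question, so treat it as an illustration rather than a drop-in replacement.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first item has a "Gross:" label and value.
HTML = """
<div class="lister-item mode-detail">
  <p>
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="100">100</span>
    <span class="text-muted">Gross:</span>
    <span name="nv" data-value="1,000">$1,000</span>
  </p>
</div>
<div class="lister-item mode-detail">
  <p>
    <span class="text-muted">Votes:</span>
    <span name="nv" data-value="200">200</span>
  </p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

gross = []
for div in soup.find_all("div", class_="lister-item mode-detail"):
    # "+" is the adjacent-sibling combinator: the nv span must directly
    # follow the span whose text contains "Gross:".
    el = div.select_one('span:-soup-contains("Gross:") + span[name*=nv]')
    if el is not None:
        gross.append(el["data-value"])

print(gross)
```

Matching on the label rather than the position keeps working even if an item were to have a gross value but no votes span.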

Output:

['678,815,482', '700,059,566', '324,591,735', '417,719,760', '145,443,742', '213,515,506', '220,159,104', '137,690,172', '608,581,744', '216,210,777', '216,648,740', '213,767,512', '188,024,361', '51,024,708', '159,555,901', '215,288,866', '117,443,149', '159,342,015', '139,377,762', '120,634,935', '32,732,301', '57,421,715', '46,874,505', '174,532,921', '44,069,456', '335,061,807', '53,542,417', '99,345,950', '59,185,715', '102,084,362', '50,072,235', '69,001,013', '18,095,701', '100,407,760', '44,936,545', '48,686,605', '67,796,355', '54,547,470', '30,014,539', '69,086,325', '17,839,115', '81,903,458', '100,478,608', '99,215,042', '59,839,515', '2,474,044', '167,510,016', '21,704,844', '44,947,622', '115,715,889', '36,108,758', '28,780,744', '11,871,365', '48,795,601', '45,495,662', '1,214,525', '40,826,341', '40,717,020', '32,015,231', '21,023,275', '270,620,950', '33,562,069', '29,819,114', '35,851,379', '34,017,028', '30,824,628', '58,032,443', '50,316,123', '36,343,858', '201,089,881', '31,445,012', '42,402,632', '54,858,851', '171,956,231', '30,569,484', '26,020,957', '14,841,338', '127,195,589', '42,469,946', '30,617,396', '2,523,610', '20,706,452', '6,708,147', '9,227,130', '67,347,895', '52,856,061', '115,253,424', '68,549,695', '77,339,130', '68,566,296']

For Beautiful Soup 4.6 and lower, you can install Beautiful Soup's new selector library separately and use it, even though it is not integrated into 4.6. Just install it with pip: pip install soupsieve

import soupsieve as sv

gross = []

for div in soup.find_all('div', class_='lister-item mode-detail'):
    # Same selector, applied through soupsieve directly: sv.select_one(selector, tag).
    el = sv.select_one(':nth-child(2 of span[name*=nv])', div)
    if el is not None:
        gross.append(el['data-value'])

print(gross)
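
If the newer selectors are not available, a plain length check on the original select call is another way to avoid the IndexError. This is a sketch, not part of the answer above: the helper name second_nv_value and the sample HTML are hypothetical. It also shows a variant that stores None for missing values, which may matter if the list has to stay aligned with other per-item lists before going into a dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item has no second span[name*=nv].
HTML = """
<div class="lister-item mode-detail">
  <p>
    <span name="nv" data-value="100">100</span>
    <span name="nv" data-value="1,000">$1,000</span>
  </p>
</div>
<div class="lister-item mode-detail">
  <p>
    <span name="nv" data-value="200">200</span>
  </p>
</div>
"""


def second_nv_value(div):
    """Return data-value of the second span[name*=nv] in div, or None."""
    spans = div.select("span[name*=nv]")
    if len(spans) < 2:
        return None
    return spans[1]["data-value"]


soup = BeautifulSoup(HTML, "html.parser")
items = soup.find_all("div", class_="lister-item mode-detail")

# Variant 1: skip items without a second span.
gross_skipped = [v for v in map(second_nv_value, items) if v is not None]

# Variant 2: keep one entry per item so positions line up with other lists.
gross_aligned = [second_nv_value(div) for div in items]

print(gross_skipped)
print(gross_aligned)
```

Whether to skip or to keep a None placeholder depends on how the values are combined with the other fields later.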

About python - Using BS, looping through tags when a row is missing in some iterations (Python 3): we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55659551/
