
python - Need insight into why BeautifulSoup can't query elements by class

Reposted. Author: 行者123. Updated: 2023-12-04 07:39:25

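For this simple BeautifulSoup experiment, I'm trying to scrape some basic data from the IMDb page https://www.imdb.com/title/tt7069210/. The problem is that I can't get the elements with the class rec_item. I have tried many selectors, but every time the result is an empty list.
Here is why I find this strange: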

  • The elements with rec_item are not inside any iframe.
  • The elements are visible in the browser via View Page Source, so as far as I understand, they are not loaded by JavaScript after the page loads.

  • Here is the repl.it link to the code.

Question: Can anyone help me understand why the list of rec_item elements is empty?

Additional information

Here is the code:
    from bs4 import BeautifulSoup
    import requests


    def extract(url):
        res = requests.get(url)
        bsoup = BeautifulSoup(res.text, 'html.parser')
        the_title = bsoup.select('meta[name="title"]')[0].attrs['content']
        print('Title: ' + the_title) # This works fine

        long_text = bsoup.select('#titleStoryLine .inline.canwrap span')[0].string.strip()
        print('Description: ' + long_text) # this too works fine

        similar_movies = bsoup.select('.rec_item')
        print(similar_movies) # blank array :(


    extract('https://www.imdb.com/title/tt7069210/')
[Screenshot: Browser's View Page Source]
[Screenshot: Code output from repl.it]
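View Page Source shows the HTML that the browser received, which is not necessarily what requests received. One way to check, sketched here assuming requests and bs4 are installed, is to fetch the page once with the default headers and once with a browser-like User-Agent, then count rec_item elements in each raw response. The helper names count_class and compare and the BROWSER_UA string are hypothetical, and the outcome depends on how IMDb treats the request at the time it is run.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.imdb.com/title/tt7069210/"
# Hypothetical browser-like User-Agent, used only for the comparison.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/89.0.4389.86 Safari/537.36"
)


def count_class(html: str, class_name: str) -> int:
    """Return how many elements in html have class_name in their class list."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select("." + class_name))


def compare(url: str) -> None:
    """Fetch url with and without a browser User-Agent and report what came back."""
    # Without explicit headers, requests identifies itself as "python-requests/<version>".
    print("Default User-Agent:", requests.utils.default_user_agent())
    for label, headers in [("default", None), ("browser", {"User-Agent": BROWSER_UA})]:
        res = requests.get(url, headers=headers, timeout=10)
        # If the raw text has no rec_item elements, no CSS selector can find them.
        print(label, res.status_code, len(res.text), count_class(res.text, "rec_item"))


# Usage (performs two live HTTP requests):
# compare(URL)
```

If the "default" row reports zero matches while the "browser" row does not, the selector was never the problem; the server simply sent different markup.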

Best answer

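You have to send request headers (in particular a browser-like user-agent) to get the proper HTML, rather than the bare-bones markup the site returns to clients it takes for bots.
Here's how to do it: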

    import requests
    from bs4 import BeautifulSoup

    headers = {
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.86 YaBrowser/21.3.0.740 Yowser/2.5 Safari/537.36"
    }


    def extract(url):
        res = requests.get(url, headers=headers)
        soup = BeautifulSoup(res.text, 'html.parser')
        the_title = soup.select('meta[name="title"]')[0].attrs['content']
        print('Title: ' + the_title) # This works fine

        long_text = soup.select('#titleStoryLine .inline.canwrap span')[0].string.strip()
        print('Description: ' + long_text) # this too works fine

        similar_movies = soup.select('.rec_item img')
        print([i["title"] for i in similar_movies]) # works now :)


    extract('https://www.imdb.com/title/tt7069210/')
Output:
    Title: The Conjuring 3: The Devil Made Me Do It (2021) - IMDb
    Description: A chilling story of terror, murder and unknown evil that shocked even experienced real-life paranormal investigators Ed and Lorraine Warren. One of the most sensational cases from their files, it starts with a fight for the soul of a young boy, then takes them beyond anything they'd ever seen before, to mark the first time in U.S. history that a murder suspect would claim demonic possession as a defense.
    ['The Conjuring 2', 'The Conjuring 2 Remake', 'The Conjuring', 'The Maiden', 'Conjuring the Devil', 'Billie Eilish: Bury a Friend', 'Oxygen', 'The Curse of La Llorona', 'Annabelle Comes Home', 'Shang-Chi and the Legend of the Ten Rings', 'Malignant', 'The Nun']
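The code above indexes [0] into select() results and reads i["title"] directly, so it raises IndexError or KeyError whenever the server again returns a page without those elements. A slightly more defensive variant is sketched below, assuming requests and bs4 are installed: the headers are set once on a requests.Session, the request has a timeout, the status is checked, and select_one (which returns None when nothing matches) replaces [0]. The selectors are copied from the answer and depend on IMDb's markup at the time; fetch_html, parse_movie_page, and HEADERS are hypothetical names, and the shortened User-Agent string is assumed to work as well as the one in the answer.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent (shortened from the answer's string; assumed sufficient).
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/89.0.4389.86 Safari/537.36"
    )
}


def fetch_html(url: str, session: requests.Session) -> str:
    """Download url with the session's headers; raise on 4xx/5xx instead of parsing an error page."""
    res = session.get(url, timeout=10)
    res.raise_for_status()
    return res.text


def parse_movie_page(html: str) -> dict:
    """Extract title, description, and similar titles; missing parts become None or []."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.select_one('meta[name="title"]')
    story = soup.select_one("#titleStoryLine .inline.canwrap span")
    return {
        "title": title_tag.get("content") if title_tag else None,
        # get_text also works when the span has several children, unlike .string
        "description": story.get_text(" ", strip=True) if story else None,
        # .get() skips <img> tags without a title attribute instead of raising KeyError
        "similar": [img.get("title") for img in soup.select(".rec_item img") if img.get("title")],
    }


# Usage (performs a live HTTP request):
# with requests.Session() as session:
#     session.headers.update(HEADERS)
#     print(parse_movie_page(fetch_html("https://www.imdb.com/title/tt7069210/", session)))
```

Keeping parsing separate from fetching also makes it possible to test the selectors against a saved copy of the page without hitting the network.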

Regarding this question (python - Need insight into why BeautifulSoup can't query elements by class), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67572744/
