python - Webscraping - the text part of the HTML code is not shown

Reposted. Author: 行者123. Updated: 2023-11-28 18:07:02
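I am running into a problem while scraping a website from Python with the Selenium library. I want to collect some information about the songs listed on this site: https://bandcamp.com/?g=all&s=top&p=0&gn=0&f=all&w=0 .

However, when I try to extract the text from the corresponding HTML elements, the process returns an empty list.

If I inspect the HTML in my browser (Chrome), I can see the text, but when I look at the same elements from Python, the text does not appear.

Here is my code: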

from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://bandcamp.com/?g=all&s=top&p=0&gn=0&f=all&w=0")

name_song = browser.find_elements_by_css_selector("a.item-title")
name_artist = browser.find_elements_by_css_selector("a.item-artist")

genre = browser.find_elements_by_class_name("item-genre")
print(name_song, name_artist, genre)

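When I print these three variables I get the HTML elements, but I cannot extract any text from them. How can I solve this? Thank you very much for your help.

My goal is to assign "Apocalypticists", "Kriegsmaschine" and "metal" each to a separate variable.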

That's the webpage of the site and the corresponding html code

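Best answer

You were pretty close. You just need to induce WebDriverWait for the desired elements to become visible, store the WebElements in three separate lists, and iterate over them to print the required text. You can use the following solution:

  • Code block: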

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    browser = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    browser.get("https://bandcamp.com/?g=all&s=top&p=0&gn=0&f=all&w=0")
    name_song = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.item-title")))
    name_artist = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"a.item-artist")))
    genre = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH,"//a[@class='item-artist']//following::span[1]")))
    for song, artist, gen in zip(name_song, name_artist, genre):
        print("%s song is by %s and is of %s genre" % (song.text, artist.text, gen.text))
  • Console output:

    Apocalypticists song is by Kriegsmaschine and is of metal genre
    The Path song is by Carbon Based Lifeforms and is of ambient genre
    Christmas Time Is Here (N & S America Edition) song is by Khruangbin and is of funk genre
    Christmas Time Is Here (Excluding N & S America) song is by Khruangbin and is of funk genre
    Snailchan Adventure song is by Ujico*/Snail's House and is of electronic genre
    O God who avenges, shine forth. Rise up, Judge of the Earth; pay back to the proud what they deserve. song is by the body and is of metal genre
    T-Rex EP song is by Ben Prunty and is of soundtrack genre
    Woodland Womp (24bit 96kHz) song is by Kalya Scintilla and is of electronic genre

This post is based on a similar question found on Stack Overflow, "python - Webscraping - the text part of the HTML code is not shown": https://stackoverflow.com/questions/52954472/
