gpt4 book ai didi

python - 使用 python 中的 selenium 抓取 youtube 上的所有评论及其回复

转载 作者:行者123 更新时间:2023-12-01 07:47:38 34 4
gpt4 key购买 nike

我正在尝试抓取 YouTube 视频评论及其回复、评论喜欢、评论不喜欢、评论计数、回复计数。

首先,我尝试根据 id 在 python 中使用 selenium google drivers 来抓取评论等文本数据及其回复。

我只能抓取页面中可用的评论,而不是其回复。

回复无法实现。

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=AJesAlohO6I&t="


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)


comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)

write_file = "output_testing.csv"
with open(write_file, "w") as output:
for line in all_comments:
output.write(line + '\n')

通过上面的代码,我可以只抓取评论。如何在 python 中使用 selenium 抓取这些评论的回复、喜欢、不喜欢、这些评论的日期。

谁能帮我指出我哪里出错了。

更新的代码(空数组)

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU"


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)

driver.find_elements_by_xpath('//div[@id="replies"]/ytd-comment-replies-renderer/ytd-expander/paper-button[@id="more"]')

comment_elems = driver.find_elements_by_xpath('//div[@id="loaded-replies"]//yt-formatted-string[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)

write_file = "output_31may.csv"
with open(write_file, "w") as output:
for line in all_comments:
output.write(line + '\n')

我更新的代码:(1-05-2019)

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU"


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)


comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
#print(all_comments)

replies_elems =driver.find_elements_by_xpath('//*[@id="replies"]')
all_replies = [elem.text for elem in replies_elems]
print(all_replies)

write_file = "output_replies.csv"
with open(write_file, "w") as output:
for line in all_replies:
output.write(line + '\n')

我的实际输出:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 39 replies', '', '', 'View 2 replies', '', '', '', 'View reply', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', '', 'View reply', '', '', 'View reply', '', '', '', '', 'View 43 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 2 replies', '', '', '', '', '', 'View 17 replies', '', '', '', '', 'View 13 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View reply', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 5 replies', '', '', '', '', '', 'View reply', '', 'View 28 replies', '', '', 'View 27 replies', '', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 9 replies', 'View reply', '', '', '', 'View reply', '', 'View 13 replies', '', '', '', 'View reply', 'View 9 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 11 replies', '', '', '', '', 'View 2 replies', '', '', '', '', '', 'View reply', '', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', 'View reply', '', '', '', 'View 2 replies', '', '', '', '']

我获取回复内容消息的预期输出。但我只能得到回复数。

最佳答案

您需要点击“查看重播”才能抓取评论回复。

要点击它,您可以执行以下操作:

driver.find_elements_by_xpath("//ytd-button-renderer[@id='more-replies']/a/paper-button[@id="button"]").click()

然后是抓取回复

driver.find_elements_by_xpath("//div[@id='loaded-replies']/ytd-comment-renderer//yt-formatted-string[@id='content-text']") 

关于python - 使用 python 中的 selenium 抓取 youtube 上的所有评论及其回复,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/56378004/

34 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com