python - Scraping all comments and their replies on YouTube using Selenium in Python

Reposted by: 行者123  Updated: 2023-12-01 07:47:38

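I am trying to scrape YouTube video comments together with their replies, comment likes, comment dislikes, comment count, and reply count.

First, I tried using Selenium with the Google Chrome driver in Python to scrape text data such as comments and their replies, locating the elements by id.

I can only scrape the comments that are available on the page, not their replies.

I have not been able to get the replies.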

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=AJesAlohO6I&t=" 


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)  


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)  
html.send_keys(Keys.PAGE_DOWN)  
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)


comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)

write_file = "output_testing.csv"
with open(write_file, "w") as output:
    for line in all_comments:
        output.write(line + '\n')

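With the code above I can scrape only the comments. How can I use Selenium in Python to scrape the replies to these comments, their likes and dislikes, and the dates they were posted?

Can anyone help point out where I am going wrong?

Updated code (returns an empty array)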

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU" 


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)  


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)  
html.send_keys(Keys.PAGE_DOWN)  
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)

driver.find_elements_by_xpath('//div[@id="replies"]/ytd-comment-replies-renderer/ytd-expander/paper-button[@id="more"]')

comment_elems = driver.find_elements_by_xpath('//div[@id="loaded-replies"]//yt-formatted-string[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)

write_file = "output_31may.csv"
with open(write_file, "w") as output:
    for line in all_comments:
        output.write(line + '\n')

My updated code: (1-05-2019)

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU" 


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)  


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)  
html.send_keys(Keys.PAGE_DOWN)  
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)


comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
#print(all_comments)

replies_elems =driver.find_elements_by_xpath('//*[@id="replies"]')
all_replies = [elem.text for elem in replies_elems]
print(all_replies)

write_file = "output_replies.csv"
with open(write_file, "w") as output:
    for line in all_replies:
        output.write(line + '\n')

My actual output:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 39 replies', '', '', 'View 2 replies', '', '', '', 'View reply', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', '', 'View reply', '', '', 'View reply', '', '', '', '', 'View 43 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 2 replies', '', '', '', '', '', 'View 17 replies', '', '', '', '', 'View 13 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View reply', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 5 replies', '', '', '', '', '', 'View reply', '', 'View 28 replies', '', '', 'View 27 replies', '', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 9 replies', 'View reply', '', '', '', 'View reply', '', 'View 13 replies', '', '', '', 'View reply', 'View 9 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 11 replies', '', '', '', '', 'View 2 replies', '', '', '', '', '', 'View reply', '', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', 'View reply', '', '', '', 'View 2 replies', '', '', '', '']

My expected output is the text of the replies, but I only get the reply counts.
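Since the question also asks for reply counts, the labels in the output above are already usable for that. The following is only a minimal sketch: the helper name `reply_count` is hypothetical, and it assumes the YouTube interface is in English so the labels read "View N replies" or "View reply".

```python
import re

# Matches the toggle labels seen in the output above,
# e.g. "View 39 replies" or "View reply" (English UI assumed).
_REPLY_LABEL = re.compile(r"^View (?:(\d[\d,]*) replies|reply)$")


def reply_count(label):
    """Return the number of replies announced by a toggle label, or 0 if none."""
    match = _REPLY_LABEL.match(label.strip())
    if match is None:
        return 0  # empty string: the comment has no replies
    if match.group(1) is None:
        return 1  # "View reply" carries no number and means one reply
    return int(match.group(1).replace(",", ""))


sample = ['', 'View 39 replies', '', 'View 2 replies', 'View reply']
print([reply_count(label) for label in sample])  # prints [0, 39, 0, 2, 1]
```

Applied to `all_replies` from the code above, this would give one count per `#replies` element, in page order.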

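Best answer

You need to click "View replies" before you can scrape the replies to a comment. Until that button is clicked, the reply text is not loaded into the page, which is why only the "View N replies" labels show up in your output.

To click it, you can do the following: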

for button in driver.find_elements_by_xpath("//ytd-button-renderer[@id='more-replies']/a/paper-button[@id='button']"):
    button.click()  # find_elements_* returns a list, so click each button in turn
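The click step can also be wrapped in a small function. This is a sketch, not a tested scraper: the name `expand_all_replies` and the `pause` parameter are hypothetical, it assumes the Selenium 3 `find_elements_by_xpath` method used in the question (on Selenium 4 the equivalent call is `driver.find_elements(By.XPATH, ...)`), and it assumes the XPath still matches YouTube's markup, which changes over time. It clicks through JavaScript rather than `button.click()`, which is a swapped-in technique meant to avoid "element not clickable" errors for buttons that are off screen.

```python
import time

# XPath from the answer, with single quotes used consistently inside the string.
MORE_REPLIES_XPATH = (
    "//ytd-button-renderer[@id='more-replies']/a/paper-button[@id='button']"
)


def expand_all_replies(driver, pause=1.0):
    """Click every "View replies" toggle on the page; return how many were clicked."""
    clicked = 0
    for button in driver.find_elements_by_xpath(MORE_REPLIES_XPATH):
        # Click via JavaScript so the button does not need to be in view.
        driver.execute_script("arguments[0].click();", button)
        clicked += 1
        time.sleep(pause)  # give the replies time to load
    return clicked
```

Calling `expand_all_replies(driver)` after the scrolling loop has finished would be the natural place for it, since only toggles already present on the page are found.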

Then scrape the replies:

reply_elems = driver.find_elements_by_xpath("//div[@id='loaded-replies']/ytd-comment-renderer//yt-formatted-string[@id='content-text']")
all_replies = [elem.text for elem in reply_elems]
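To get the reply text into a file, something along these lines may help. It is a sketch under assumptions: `collect_replies` and `save_rows` are hypothetical names, the Selenium 3 `find_elements_by_xpath` method from the question is assumed, and the "View replies" toggles must already have been clicked. It uses the `csv` module instead of `output.write(line + '\n')` because a reply that contains a line break would otherwise be split across several rows.

```python
import csv

# XPath from the answer: the text of replies that have already been expanded.
REPLY_TEXT_XPATH = (
    "//div[@id='loaded-replies']/ytd-comment-renderer"
    "//yt-formatted-string[@id='content-text']"
)


def collect_replies(driver):
    """Return the text of every reply currently loaded in the page."""
    return [elem.text for elem in driver.find_elements_by_xpath(REPLY_TEXT_XPATH)]


def save_rows(path, texts):
    """Write one text per CSV row; csv quoting keeps multi-line replies intact."""
    with open(path, "w", newline="", encoding="utf-8") as output:
        writer = csv.writer(output)
        writer.writerow(["reply"])  # header row
        writer.writerows([text] for text in texts)
```

For example, `save_rows("output_replies.csv", collect_replies(driver))` would replace the manual write loop in the question.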

Regarding "python - Scraping all comments and their replies on YouTube using Selenium in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56378004/
