python - Collecting all movie reviews from IMDb for certain TV series

Reposted by: 行者123. Updated: 2023-12-04 07:25:11

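I am trying to collect data from IMDb with Python, but I cannot get all of the reviews. The following code works, but it does not return every available review: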

from imdb import IMDb

ia = IMDb()

ia.get_movie_reviews('13433812')

Output:

{'data': {'reviews': [{'content': 'Just finished watching the episode 4. Wow, it was so good. Well made mixture of thriller and comedy.I saw a few negative reviews here written after eps 1 or 2. I recommend watching at least up to eps 3 and 4. The real story starts from eps 3. Eps 4 is like a complete well made movie. You will surely enjoy it.',
'helpful': 0,
'title': '',
'author': 'ur129930427',
'date': '28 February 2021',
'rating': None,
'not_helpful': 0},
{'content': 'You can see the cast had a lot of fun making this Italian/Korean would-be mafia thriller, the sort of fun NOT experienced in Hollywood since the days of Burt Reynolds. Vincenzo contains a very absorbing plot, a cast star-struck by designer clothes, interspersed with Italian (and other) Classical music excerpts to set in relief some well written suspense and intrigue. The plot centers on, if we really are to believe it, the endemically CORRUPT upper echelons of S. Korean society. Is it a coincidence that many of the systemic abuses of power and institutional vice that constitute Vincenzo\'s Main Plot are now also going on, this very moment in the USA? It is certainly food for thought. A clear advantage this Korean drama has over mediocre US shows, however is a much softer-handed use of violence, resorting more often to satire to keep the plot moving as opposed to gratuitous savagery now so common in so-called "hit" US shows. So far, so good, Binjenzo!'
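For reference, the structure shown in that output can be walked with plain dictionary access. The following is a minimal sketch, assuming the result keeps the `data` / `reviews` shape shown above; `summarize_reviews` and `sample` are hypothetical names used only for illustration, and the sample content is shortened.

```python
def summarize_reviews(result):
    """Return (author, date, rating) tuples from a dict shaped like the output above."""
    # Fall back to empty containers so a missing key yields an empty list.
    reviews = result.get("data", {}).get("reviews", [])
    return [(r.get("author"), r.get("date"), r.get("rating")) for r in reviews]


# Hypothetical sample mirroring the first review in the output (content shortened).
sample = {
    "data": {
        "reviews": [
            {
                "content": "Just finished watching the episode 4.",
                "helpful": 0,
                "title": "",
                "author": "ur129930427",
                "date": "28 February 2021",
                "rating": None,
                "not_helpful": 0,
            }
        ]
    }
}

if __name__ == "__main__":
    # Counting the list is a quick way to see how many reviews came back.
    print(len(sample["data"]["reviews"]))
    print(summarize_reviews(sample))
```

Counting the entries this way makes it easy to compare how many reviews the call returned against the number listed on the site.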

I also tried the following Scrapy-based code, but it returned no reviews at all:

import requests  # missing in the original snippet; needed for requests.get()
from scrapy.http import TextResponse
import urllib.parse
from urllib.parse import urljoin

base_url = "https://www.imdb.com/title/tt13433812/reviews?ref_=tt_urv"
r = requests.get(base_url)
response = TextResponse(r.url, body=r.text, encoding='utf-8')
reviews = response.xpath('//*[contains(@id,"1")]/p/text()').extract()
len(reviews)

Output: 0

Best answer

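This should give you every reviewer name for that title by following the "Load more" pagination until it is exhausted. Feel free to define additional fields and extract them according to your requirements.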

import requests
from bs4 import BeautifulSoup

start_url = 'https://www.imdb.com/title/tt13433812/reviews?ref_=tt_urv'
link = 'https://www.imdb.com/title/tt13433812/reviews/_ajax'

# Query parameters for the "load more" endpoint; paginationKey is filled in per page.
params = {
    'ref_': 'undefined',
    'paginationKey': ''
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    # The first page comes from the regular reviews URL.
    res = s.get(start_url)

    while True:
        soup = BeautifulSoup(res.text, "lxml")
        for item in soup.select(".review-container"):
            reviewer_name = item.select_one("span.display-name-link > a").get_text(strip=True)
            print(reviewer_name)

        # Each page carries the key for the next one. When it is missing,
        # select_one() returns None and .get() raises AttributeError.
        try:
            pagination_key = soup.select_one(".load-more-data[data-key]").get("data-key")
        except AttributeError:
            break
        params['paginationKey'] = pagination_key
        res = s.get(link, params=params)
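
To collect more than the reviewer name, the per-review parsing can be pulled into a small function. This is only a sketch: the `.review-container`, `span.display-name-link > a`, and `.load-more-data[data-key]` selectors come from the answer above, while `a.title`, `span.review-date`, and `div.text` are assumptions about the page markup that should be checked against the live HTML (which may have changed). `SAMPLE_HTML` is a hypothetical fragment, not copied from IMDb, and the built-in `html.parser` is used so the sketch does not require lxml.

```python
from bs4 import BeautifulSoup


def _text(node):
    """Return stripped text for a tag, or None when the tag is missing."""
    return node.get_text(strip=True) if node is not None else None


def parse_reviews(html):
    """Extract one dict per `.review-container` block in an HTML fragment."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for item in soup.select(".review-container"):
        reviews.append({
            # Selector taken from the answer above.
            "reviewer": _text(item.select_one("span.display-name-link > a")),
            # The three selectors below are assumptions about the page markup.
            "title": _text(item.select_one("a.title")),
            "date": _text(item.select_one("span.review-date")),
            "content": _text(item.select_one("div.text")),
        })
    return reviews


def next_pagination_key(html):
    """Return the `data-key` of the load-more element, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(".load-more-data[data-key]")
    return node.get("data-key") if node is not None else None


# Hypothetical fragment for illustration only; not copied from IMDb.
SAMPLE_HTML = """
<div class="review-container">
  <a class="title">Great show</a>
  <span class="display-name-link"><a href="/user/ur1/">alice</a></span>
  <span class="review-date">28 February 2021</span>
  <div class="text">Well made mixture of thriller and comedy.</div>
</div>
<div class="load-more-data" data-key="abc123"></div>
"""

if __name__ == "__main__":
    print(parse_reviews(SAMPLE_HTML))
    print(next_pagination_key(SAMPLE_HTML))
```

Inside the `while True` loop, `parse_reviews(res.text)` would replace the inner `for` loop and `next_pagination_key(res.text)` would replace the `try`/`except`, with the loop ending when the key is `None`. Returning `None` for a missing field avoids an `AttributeError` when a review lacks one of the elements.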

Regarding "python - Collecting all movie reviews from IMDb for certain TV series", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68243944/
