
python - DuckDuckGo returns 418 when requested from Python

Reposted. Author: 行者123. Updated: 2023-12-04 16:39:50

I'm writing a script that opens Firefox with the first DuckDuckGo result it finds for a given search term.
I know. It's very useful.

But when I copy a URL from my browser and request it with Python:

import requests as r

url = "https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software"
req = r.get(url)

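DuckDuckGo returns 418.

What is going on?
Does DuckDuckGo recognize that I am making an automated request and decide to turn into a teapot?
If so, how can I avoid that?

I also know there is a DuckDuckGo API for Python, but I'm doing this project to get started with requests and BeautifulSoup.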

Best answer

You need to add a 'user-agent' header, even one as simple as this:

req = r.get(url, headers={'user-agent': 'my-app/0.0.1'})
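
If the script makes more than one request, the header can be set once on a requests.Session instead of being passed to every call. The following is a minimal sketch under that assumption; make_session, headers_for and fetch are hypothetical helper names, not part of requests, and the user-agent string is simply the example value from above.

```python
import requests

# Hypothetical identifier for this script; the example value above is reused.
USER_AGENT = "my-app/0.0.1"


def make_session(user_agent=USER_AGENT):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def headers_for(session, url):
    """Return the headers the session would send for a GET, without network I/O."""
    return session.prepare_request(requests.Request("GET", url)).headers


def fetch(session, url):
    """Fetch url; return the body text, or None on a 4xx/5xx status."""
    response = session.get(url, timeout=10)
    if not response.ok:  # ok is False for 4xx and 5xx status codes
        print(f"request failed with status {response.status_code}")
        return None
    return response.text
```

Calling headers_for(make_session(), url) is a quick way to confirm which user-agent will be sent before making the real request with fetch.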

Update: complete code with sensibly named variables

import requests

url = "https://duckduckgo.com/?t=ffab&q=python+request+duckduckgo&ia=software"
response = requests.get(url, headers={'user-agent': 'my-app/0.0.1'})
response.raise_for_status() # raise an exception on a 4xx or 5xx status code
# or test response.status_code if you do not want to raise an exception
data = response.text # this is the HTML, assuming that is what the URL returns
print(data)

Prints:

<!DOCTYPE html><html lang="en_US" class="no-js has-zcm  no-theme "><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><title>python request duckduckgo at DuckDuckGo</title><link rel="stylesheet" href="/s1909.css" type="text/css"><link rel="stylesheet" href="/r1909.css" type="text/css"><meta name="robots" content="noindex,nofollow"><meta name="referrer" content="origin"><meta name="apple-mobile-web-app-title" content="python request duckduckgo"><link rel="preconnect" href="https://links.duckduckgo.com"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link id="icon60" rel="apple-touch-icon" href="/assets/icons/meta/DDG-iOS-icon_60x60.png?v=2"/><link id="icon76" rel="apple-touch-icon" sizes="76x76" href="/assets/icons/meta/DDG-iOS-icon_76x76.png?v=2"/><link id="icon120" rel="apple-touch-icon" sizes="120x120" href="/assets/icons/meta/DDG-iOS-icon_120x120.png?v=2"/><link id="icon152" rel="apple-touch-icon" sizes="152x152" href="/assets/icons/meta/DDG-iOS-icon_152x152.png?v=2"/><link rel="image_src" href="/assets/icons/meta/DDG-icon_256x256.png"/><script type="text/javascript">var ct,fd,fq,it,iqa,iqm,iqs,iqp,iqq,qw,dl,ra,rv,rad,r1hc,r1c,r2c,r3c,rfq,rq,rds,rs,rt,rl,y,y1,ti,tig,iqd,locale,settings_js_version='s2475.js',is_twitter='',rpl=1;fq=0;fd=1;it=0;iqa=0;iqbi=0;iqm=0;iqs=0;iqp=0;iqq=0;qw=3;dl='en';ct='US';iqd=0;r1hc=0;r1c=0;r3c=0;rq='python%20request%20duckduckgo';rqd="python request duckduckgo";rfq=0;rt='';ra='ffab';rv='';rad='';rds=30;rs=0;spice_version='2000';spice_paths='{}';locale='en_US';settings_url_params={};rl='us-en';rlo=0;df='';ds='';sfq='';iar='';vqd='3-149609696422854606330346289888770817762-151254838983446808561626137548835915940';safe_ddg=0;show_covid=0;</script><meta name="viewport" content="width=device-width, initial-scale=1" /><meta name="HandheldFriendly" content="true" /><meta name="apple-mobile-web-app-capable" content="no" /></head><body class="body--serp"><input id="state_hidden" name="state_hidden" 
type="text" size="1"><span class="hide">Ignore this box please.</span><div id="spacing_hidden_wrapper"><div id="spacing_hidden"></div></div><script type="text/javascript" src="/lib/l118.js"></script><script type="text/javascript" src="/locale/en_US/duckduckgo14.js"></script><script type="text/javascript" src="/util/u469.js"></script><script type="text/javascript" src="/d2827.js"></script><div class="site-wrapper  js-site-wrapper"><div class="welcome-wrap js-welcome-wrap"></div><div id="header_wrapper" class="header-wrap js-header-wrap"><div id="header" class="header  cw"><div class="header__search-wrap"><a tabindex="-1" href="/?t=ffab" class="header__logo-wrap js-header-logo"><span class="header__logo js-logo-ddg">DuckDuckGo</span></a><div class="header__content  header__search"><form id="search_form" class="search--adv  search--header  js-search-form" name="x" action="/"><input type="text" name="q" tabindex="1" autocomplete="off" id="search_form_input" class="search__input search__input--adv js-search-input" value="python request duckduckgo"><input id="search_form_input_clear" class="search__clear  js-search-clear" type="button" tabindex="3" value="X"/><input id="search_button" class="search__button  js-search-button" type="submit" tabindex="2" value="S" /><a id="search_dropdown" class="search__dropdown" href="javascript:;" tabindex="4"></a><div id="search_elements_hidden" class="search__hidden  js-search-hidden"></div></form></div></div><div id="duckbar" class="zcm-wrap  zcm-wrap--header  is-noscript-hidden"></div></div><div class="header--aside js-header-aside"></div></div><div id="zero_click_wrapper" class="zci-wrap"></div><div id="vertical_wrapper" class="verticals"></div><div id="web_content_wrapper" class="content-wrap "><div class="serp__top-right  js-serp-top-right"></div><div class="serp__bottom-right  js-serp-bottom-right"><div class="js-feedback-btn-wrap"></div></div><div class="cw"><div id="links_wrapper" class="serp__results js-serp-results"><div 
class="results--main"><div class="search-filters-wrap"><div class="js-search-filters search-filters"></div></div><noscript><meta http-equiv="refresh" content="0;URL=/html?q=python%20request%20duckduckgo"><link href="/css/noscript.css" rel="stylesheet" type="text/css"><div class="msg msg--noscript"><p class="msg-title--noscript">You are being redirected to the non-JavaScript site.</p>Click <a href="/html/?q=python%20request%20duckduckgo">here</a> if it doesn't happen automatically.</div></noscript><div id="message" class="results--message"></div><div class="ia-modules js-ia-modules"></div><div id="ads" class="results--ads results--ads--main is-invisible js-results-ads"></div><div id="links" class="results is-invisible js-results"></div></div><div class="results--sidebar js-results-sidebar"><div class="sidebar-modules js-sidebar-modules"></div><div class="is-invisible js-sidebar-ads"></div></div></div></div></div><div id="bottom_spacing2"> </div></div><script type="text/javascript"></script><script type="text/JavaScript">function nrji() {nrj('/t.js?q=python%20request%20duckduckgo&l=us-en&s=0&dl=en&ct=US&ss_mkt=us&p_ent=&ex=-1');nrj('/d.js?q=python%20request%20duckduckgo&l=us-en&s=0&a=ffab&dl=en&ct=US&ss_mkt=us&vqd=3-149609696422854606330346289888770817762-151254838983446808561626137548835915940&p_ent=&ex=-1&sp=1');;};DDG.ready(nrji, 1);</script><script src="/g2379.js"></script><script type="text/javascript">DDG.page = new DDG.Pages.SERP({ showSafeSearch: 0, instantAnswerAds: false });</script><div id="z2"> </div><div id="z"></div></body></html>

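Keep in mind that the HTML may contain JavaScript that runs after the page loads and modifies the page content. As a result, what you see in your browser may not match the HTML you get back from requests. If that is the case, you may need a different tool, such as selenium, to drive an actual web browser.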
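
The output printed above also contains a <noscript> block whose meta refresh points at /html?q=..., the non-JavaScript version of the page. Before reaching for a browser driver, it may be worth checking whether that page is enough. The sketch below only extracts the fallback URL from HTML that has already been downloaded, using the standard library; NoscriptRefreshParser and noscript_url are hypothetical names, and whether the fallback page carries the results as static HTML (and in what markup) is an assumption to verify yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NoscriptRefreshParser(HTMLParser):
    """Record the target of a <meta http-equiv="refresh"> found inside <noscript>."""

    def __init__(self):
        super().__init__()
        self._in_noscript = False
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._in_noscript = True
        elif tag == "meta" and self._in_noscript and self.target is None:
            attributes = dict(attrs)
            if (attributes.get("http-equiv") or "").lower() == "refresh":
                # The content attribute looks like "0;URL=/html?q=..."
                _, _, rest = (attributes.get("content") or "").partition(";")
                key, _, value = rest.strip().partition("=")
                if key.strip().lower() == "url" and value:
                    self.target = value.strip()

    def handle_endtag(self, tag):
        if tag == "noscript":
            self._in_noscript = False


def noscript_url(html, base_url):
    """Return the absolute noscript fallback URL, or None if there is none."""
    parser = NoscriptRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    return urljoin(base_url, parser.target)
```

Passing response.text and the original URL to noscript_url yields an address that can be requested with the same user-agent header as before.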

Regarding "python - DuckDuckGo returns 418 when requested from Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63058873/
