
python - Referer missing from the HTTP headers of Selenium requests

Reposted, Author: 太空狗, Updated: 2023-10-29 19:26:28

I'm writing some tests with Selenium and noticed that the Referer is missing from the headers. I wrote the following minimal example to test this against https://httpbin.org/headers:

import selenium.webdriver
from selenium.webdriver.support.ui import WebDriverWait

options = selenium.webdriver.FirefoxOptions()
options.add_argument('--headless')
# Disable Firefox's built-in JSON viewer so page_source contains the raw response
options.set_preference('devtools.jsonview.enabled', False)

driver = selenium.webdriver.Firefox(options=options)
wait = WebDriverWait(driver, 10)

# Load a first page so that there is a page to refer from
driver.get('http://www.python.org')
assert 'Python' in driver.title

# Navigate via JavaScript; httpbin echoes back the request headers it received
url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
wait.until(lambda driver: driver.current_url == url)
print(driver.page_source)

driver.quit()

This prints:

<html><head><link rel="alternate stylesheet" type="text/css" href="resource://content-accessible/plaintext.css" title="Wrap Long Lines"></head><body><pre>{
"headers": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "close",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
}
}
</pre></body></html>
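The check for a missing Referer can also be automated. Assuming the page source looks like the output above (Firefox wraps httpbin's JSON in a <pre> element once its JSON viewer is disabled), a minimal standard-library sketch could extract the headers and test for Referer. `extract_headers` and `has_referer` are hypothetical helper names, not Selenium APIs.

```python
import html
import json
import re


def extract_headers(page_source):
    """Return the "headers" dict from an httpbin /headers page as rendered by the browser.

    Assumes the JSON body sits inside a single <pre>...</pre> element.
    """
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL)
    if match is None:
        raise ValueError("no <pre> element found in page source")
    # page_source is HTML, so characters such as & may be escaped; undo that first
    body = html.unescape(match.group(1))
    return json.loads(body)["headers"]


def has_referer(headers):
    """HTTP header names are case-insensitive, so compare them in lower case."""
    return any(name.lower() == "referer" for name in headers)


sample = """<html><head></head><body><pre>{
"headers": {
"Accept-Language": "en-US,en;q=0.5",
"Host": "httpbin.org"
}
}
</pre></body></html>"""

print(has_referer(extract_headers(sample)))  # False
```

In a test, `has_referer(extract_headers(driver.page_source))` would then turn the manual inspection into an assertion.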

So there is no Referer. However, if I browse to any page and manually execute

window.location.href = "https://httpbin.org/headers"

in the Firefox console, the Referer does show up as expected.


As pointed out in the comments below, when using

driver.get("javascript: window.location.href = '{}'".format(url))

instead of

driver.execute_script("window.location.href = '{}';".format(url))

the request does include the Referer. In addition, when using Chrome instead of Firefox, both approaches include the Referer.
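The `javascript:` workaround builds its script with string formatting, which breaks if the target URL ever contains a quote character. A slightly more defensive sketch, assuming you keep using `driver.get` with a `javascript:` URL, lets `json.dumps` produce the JavaScript string literal and percent-encodes the whole script, because browsers percent-decode a `javascript:` URL before running it. `javascript_nav_url` is a hypothetical helper, not part of Selenium.

```python
import json
from urllib.parse import quote


def javascript_nav_url(target_url):
    """Return a javascript: URL that sets window.location.href to target_url."""
    # json.dumps yields a double-quoted literal that is also valid JavaScript,
    # with any quotes or backslashes in the URL escaped
    script = "window.location.href = {};".format(json.dumps(target_url))
    # Encode the whole script so that, e.g., "%20" inside target_url survives
    # the browser's percent-decoding instead of turning into a space
    return "javascript:" + quote(script, safe="")


nav = javascript_nav_url("https://httpbin.org/headers")
print(nav)
```

With a running driver this would be used as `driver.get(javascript_nav_url(url))`, in place of the `.format()` call.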

So the main question remains: why is the Referer missing from the request when it is sent with Firefox as described above?

Best answer

Referer

According to the MDN documentation:

The Referer request header contains the address of the previous web page from which a link to the currently requested page was followed. The Referer header allows servers to identify where people are visiting them from and may use that data for analytics, logging, or optimized caching, for example.

Important: Although this header has many innocent uses it can have undesirable consequences for user security and privacy.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer


However:

A Referer header is not sent by browsers if:

  • The referring resource is a local "file" or "data" URI.
  • An unsecured HTTP request is used and the referring page was received with a secure protocol (HTTPS).

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
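As a rough illustration, the two rules above can be written as a predicate over the referring and target URLs. This is a simplified sketch (`referer_would_be_sent` is a hypothetical name) that deliberately ignores Referrer-Policy and every other browser rule. Applied to the navigation in the question, from https://www.python.org/ to https://httpbin.org/headers, neither rule applies, so these two rules alone do not explain why Firefox omits the header.

```python
from urllib.parse import urlsplit


def referer_would_be_sent(referring_url, target_url):
    """Model only the two MDN cases in which browsers omit the Referer header."""
    referring_scheme = urlsplit(referring_url).scheme.lower()
    target_scheme = urlsplit(target_url).scheme.lower()
    # Case 1: the referring resource is a local "file" or "data" URI
    if referring_scheme in ("file", "data"):
        return False
    # Case 2: an HTTPS page makes an unsecured HTTP request (a downgrade)
    if referring_scheme == "https" and target_scheme == "http":
        return False
    return True


print(referer_would_be_sent("https://www.python.org/", "https://httpbin.org/headers"))  # True
```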


Privacy and security concerns

There are some privacy and security risks associated with the Referer HTTP header:

The Referer header contains the address of the previous web page from which a link to the currently requested page was followed, which can be further used for analytics, logging, or optimized caching.

Source: https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#The_referrer_problem
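To make the privacy risk concrete: the Referer can carry the full path and query string of the previous page. The W3C Referrer Policy specification has browsers strip a URL before using it as a referrer: credentials and the fragment are always removed, and origin-based policies keep only the origin. The sketch below is a simplified model of that stripping step for ordinary http(s) URLs. `strip_for_referrer` is a hypothetical name, and details such as default-port normalization are ignored.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_for_referrer(url, origin_only=False):
    """Drop credentials and fragment; with origin_only, keep only scheme://host[:port]/."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: urlsplit removed the brackets, so restore them
        host = "[{}]".format(host)
    netloc = host if parts.port is None else "{}:{}".format(host, parts.port)
    if origin_only:
        return urlunsplit((parts.scheme, netloc, "/", "", ""))
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))


page = "https://user:secret@example.com/user/foobar?token=abc#top"
print(strip_for_referrer(page))                    # https://example.com/user/foobar?token=abc
print(strip_for_referrer(page, origin_only=True))  # https://example.com/
```

The full-URL form still exposes `/user/foobar?token=abc` to the next site, which is exactly the leak the quote describes; the origin-only form does not.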


Dealing with the security concerns

From the Referer header's perspective, most of the security risks can be mitigated with the following measures:

  • Referrer-Policy: Using the Referrer-Policy header on your server to control what information is sent through the Referer header. Again, a directive of no-referrer would omit the Referer header entirely.
  • The referrerpolicy attribute on HTML elements that are in danger of leaking such information (such as <img> and <a>). This can for example be set to no-referrer to stop the Referer header being sent altogether.
  • The rel attribute set to noreferrer on HTML elements that are in danger of leaking such information (such as <img> and <a>).
  • The Exit Page Redirect technique: This is the only method that should work at the moment without flaw is to have an exit page that you don’t mind having inside of the referer header. Many websites implement this method, including Google and Facebook. Instead of having the referrer data show private information, it only shows the website that the user came from, if implemented correctly. Instead of the referrer data appearing as http://example.com/user/foobar the new referrer data will appear as http://example.com/exit?url=http%3A%2F%2Fexample.com. The way the method works is by having all external links on your website go to a intermediary page that then redirects to the final page. Below we have a link to the website example.com and we URL encode the full URL and add it to the url parameter of our exit page.

Source:
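The first three mitigations can be sketched with Python's built-in WSGI interface: the response sets a `Referrer-Policy: no-referrer` header, and the link carries both `referrerpolicy="no-referrer"` and `rel="noreferrer"`. The `exit_link` helper shows only the URL-encoding step of the exit-page technique, reproducing the `http://example.com/exit?url=http%3A%2F%2Fexample.com` example from the quote; the redirecting exit page itself is not implemented. `referrer_safe_app`, `exit_link` and the `http://example.com/exit` path are hypothetical.

```python
from urllib.parse import quote


def referrer_safe_app(environ, start_response):
    """Minimal WSGI app whose pages ask the browser not to send a Referer."""
    # Per-element mitigations: referrerpolicy attribute and rel="noreferrer"
    body = (b'<a href="https://example.com/" referrerpolicy="no-referrer" '
            b'rel="noreferrer">example.com</a>')
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Server-side mitigation: a no-referrer policy omits the Referer entirely
        ("Referrer-Policy", "no-referrer"),
    ]
    start_response("200 OK", headers)
    return [body]


def exit_link(target_url, exit_page="http://example.com/exit"):
    """URL-encode the full target URL into the exit page's url parameter."""
    return "{}?url={}".format(exit_page, quote(target_url, safe=""))


print(exit_link("http://example.com"))  # http://example.com/exit?url=http%3A%2F%2Fexample.com
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, referrer_safe_app).serve_forever()` would serve the page on port 8000.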


This use case

I executed your code with both the GeckoDriver/Firefox and the ChromeDriver/Chrome combinations:

Code block:

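from selenium.webdriver.support.ui import WebDriverWait

# `driver` is an already-started GeckoDriver/Firefox or ChromeDriver/Chrome session
driver.get('http://www.python.org')
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
WebDriverWait(driver, 10).until(lambda driver: driver.current_url == url)
print(driver.page_source)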

Observations:

  • With GeckoDriver/Firefox, the Referer: "https://www.python.org/" header is missing, as shown below:

        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.5",
            "Host": "httpbin.org",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
          }
        }
  • With ChromeDriver/Chrome, the Referer: "https://www.python.org/" header is present, as shown below:

        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9",
            "Host": "httpbin.org",
            "Referer": "https://www.python.org/",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
          }
        }

Conclusion:

This appears to be an issue with how GeckoDriver/Firefox handles the Referer header.


Outro

Referrer Policy

Regarding "python - Referer missing from the HTTP headers of Selenium requests", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54119674/
