
python - I'm trying to scrape HTML from a website that requires a login, but I'm not getting any data

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 16:36:19

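I am following this tutorial, but when I run the Python script I don't seem to get any data. I get HTTP status code 200, and `result.ok` returns `True`. Any help would be great. This is the response I get in the terminal: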

```
[]
200
True
```
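A 200 status with an empty list can mean two different things: the page loaded but the selector matched nothing, or the login silently failed and the site served its sign-in page again (which is also returned with status 200). A minimal sketch for telling these apart, assuming the sign-in page can be recognized by a password input; `looks_like_login_page` is a hypothetical helper, not part of `requests` or `lxml`:

```python
from lxml import html


def looks_like_login_page(page_html: str) -> bool:
    """Heuristic: True if the page still contains a password field.

    Assumption: the site's sign-in page has <input type="password">
    and the logged-in dashboard does not. Adjust for the real site.
    """
    tree = html.fromstring(page_html)
    return bool(tree.xpath("//input[@type='password']"))


# Example with two tiny hand-written pages
print(looks_like_login_page("<html><body><form><input type='password'></form></body></html>"))
print(looks_like_login_page("<html><body><span>my-repo</span></body></html>"))
```

After the dashboard request, calling `looks_like_login_page(result.text)` shows whether the session is actually logged in before any time is spent debugging selectors.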

```python
import requests
from lxml import html

USERNAME = "username@email.com"
PASSWORD = "legitpassword"

LOGIN_URL = "https://bitbucket.org/account/signin/?next=/"
URL = "https://bitbucket.org/dashboard/overview"


def main():
    session_requests = requests.Session()

    # Get the login CSRF token from the sign-in form
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]

    # Create the login payload
    payload = {
        "username": USERNAME,
        "password": PASSWORD,
        "csrfmiddlewaretoken": authenticity_token,
    }

    # Perform login; the session stores the resulting cookies
    result = session_requests.post(LOGIN_URL, data=payload, headers={"Referer": LOGIN_URL})

    # Scrape the dashboard with the logged-in session
    result = session_requests.get(URL, headers={"Referer": URL})
    tree = html.fromstring(result.content)
    bucket_elems = tree.findall(".//span[@class='repo-name']")
    bucket_names = [bucket_elem.text_content().replace("\n", "").strip() for bucket_elem in bucket_elems]

    print(bucket_names)
    print(result.status_code)
    print(result.ok)


if __name__ == "__main__":
    main()
```
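The token lookup above, `list(set(...))[0]`, raises a bare `IndexError` if the hidden field is missing, which hides the real cause (for example, the site renaming the field). A minimal sketch of a more explicit lookup, assuming the field is still called `csrfmiddlewaretoken` as in the question; `extract_csrf_token` is a hypothetical helper:

```python
from lxml import html


def extract_csrf_token(page_html: str, field: str = "csrfmiddlewaretoken") -> str:
    """Return the value of the hidden CSRF input, or raise a descriptive error."""
    tree = html.fromstring(page_html)
    # Collect distinct values; the same hidden input may appear more than once
    values = set(tree.xpath(f"//input[@name='{field}']/@value"))
    if not values:
        raise ValueError(f"no <input name={field!r}> found on the login page")
    if len(values) > 1:
        raise ValueError(f"found several different {field} values: {sorted(values)}")
    return str(values.pop())


page = (
    "<html><body><form>"
    "<input type='hidden' name='csrfmiddlewaretoken' value='abc123'>"
    "</form></body></html>"
)
print(extract_csrf_token(page))
```

In `main()`, `authenticity_token = extract_csrf_token(result.text)` could replace the original one-liner.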

Best answer

Your XPath is wrong: there is no `span` with the class `repo-name`. You can get the repository names from the anchor tags instead:

```python
# The repository names are in <a> tags, not in <span class="repo-name">
bucket_elems = tree.xpath("//a[@class='execute repo-list--repo-name']")
bucket_names = [bucket_elem.text_content().strip() for bucket_elem in bucket_elems]
```
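To see why the question's selector returns `[]`, both selectors can be run against the same markup. The HTML below is hypothetical, built only around the class name quoted in the answer; the real dashboard page may look different:

```python
from __future__ import annotations

from lxml import html

# Hypothetical markup modeled on the class name from the answer
SAMPLE = """
<html><body>
  <ul>
    <li><a class="execute repo-list--repo-name" href="/me/alpha">
          alpha
        </a></li>
    <li><a class="execute repo-list--repo-name" href="/me/beta">beta</a></li>
  </ul>
</body></html>
"""


def repo_names(page_html: str) -> tuple[list[str], list[str]]:
    tree = html.fromstring(page_html)
    # Selector from the question: looks for <span class="repo-name">
    old = [e.text_content().strip() for e in tree.findall(".//span[@class='repo-name']")]
    # Selector from the answer: reads the names from the <a> tags
    new = [e.text_content().strip()
           for e in tree.xpath("//a[@class='execute repo-list--repo-name']")]
    return old, new


old, new = repo_names(SAMPLE)
print(old)  # []
print(new)  # ['alpha', 'beta']
```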

The HTML has evidently changed since the tutorial was written.
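Because the markup can keep changing, an exact `@class='...'` comparison is brittle: it stops matching as soon as the site adds a class or reorders them. A common XPath 1.0 idiom matches a single whitespace-separated class token instead. This is a general technique rather than part of the original answer, and `by_class_token` is a hypothetical helper:

```python
from lxml import html


def by_class_token(tree, tag: str, token: str):
    """Select <tag> elements whose class list contains `token` as a whole word.

    Assumes `tag` and `token` contain no quote characters.
    """
    expr = f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {token} ')]"
    return tree.xpath(expr)


page = html.fromstring(
    "<div>"
    '<a class="repo-list--repo-name execute" href="/me/alpha">alpha</a>'
    '<a class="execute repo-list--repo-name extra" href="/me/beta">beta</a>'
    '<a class="repo-list--repo-name-old" href="/me/gamma">gamma</a>'
    "</div>"
)

names = [a.text_content().strip() for a in by_class_token(page, "a", "repo-list--repo-name")]
print(names)  # ['alpha', 'beta']
```

The padding with spaces on both sides is what prevents a partial match such as `repo-list--repo-name-old`.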

This Q&A ("python - I'm trying to scrape HTML from a website that requires a login, but I'm not getting any data") is from Stack Overflow: https://stackoverflow.com/questions/37173666/
