
python - BeautifulSoup.findAll prints nothing

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:24:26

I checked that the login response is 200, but nothing gets printed. Here is the code:

import requests
from bs4 import BeautifulSoup

file_in = r'D:\OneDrive\Documents\GPIP\Files\scraping\idlinkedin.csv'
dataset = open(file_in, "r")

def login(iemail, ipassword):
    client = requests.Session()

    HOMEPAGE_URL = 'https://www.linkedin.com'
    LOGIN_URL = 'https://www.linkedin.com/uas/login-submit'

    html = client.get(HOMEPAGE_URL).content
    soup = BeautifulSoup(html, "html.parser")
    csrf = soup.find(id="loginCsrfParam-login")['value']

    login_information = {
        'session_key': iemail,
        'session_password': ipassword,
        'loginCsrfParam': csrf,
    }

    client.post(LOGIN_URL, data=login_information)

    for username in dataset:
        item_url = 'https://www.linkedin.com/in/' + username.strip()
        source_code = client.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features='html.parser')
        for item_name in soup.findAll('h1', {'class': 'pv-top-card-section__name inline t-24 t-black t-normal'}):
            print(item_name)

# MAIN
login('theusername', 'thepassword')

This part of the code is supposed to print the account's name, but unfortunately it prints nothing.

for item_name in soup.findAll('h1', {'class': 'pv-top-card-section__name inline t-24 t-black t-normal'}):
    print(item_name)

Best answer

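The problem is that you wrote your code against what you see in the browser's element inspector (opened with F12 in Chrome), not against the response that requests.get actually returns. I ran into the same problem when scraping instagram.com (see this link on GitHub: https://github.com/simplyshravan/python_learning/blob/master/Using_beautifulsoup.py).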
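One quick way to confirm this diagnosis is to inspect the raw response text before writing any selector. The sketch below is only an illustration: diagnose_response and sample_response are hypothetical names, not part of the original code, and the sample string merely stands in for whatever the server really sends. It uses only the standard library, so it runs without network access.

```python
import re


def diagnose_response(html_text, class_name):
    """Summarize whether the raw markup contains what a selector expects."""
    return {
        # True only if the class name appears literally in the raw response.
        "class_in_response": class_name in html_text,
        # Number of opening <code> tags, where embedded data may be stored.
        "code_tag_count": len(re.findall(r"<code\b", html_text, flags=re.IGNORECASE)),
    }


# Hypothetical stand-in for a server response (assumption, not real LinkedIn output).
sample_response = '<html><body><code id="data">{"included": []}</code></body></html>'

report = diagnose_response(sample_response, "pv-top-card-section__name")
print(report)
```

In the question's loop, the same check could be run on source_code.text. If the class name is absent from the response, no BeautifulSoup selector built on it can match, whatever the browser inspector shows.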

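Always work from what you actually receive, not from how the page looks. After spending a few hours on it, here is code that extracts user information from LinkedIn.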

import requests
from bs4 import BeautifulSoup
import json

file_in = r'D:\OneDrive\Documents\GPIP\Files\scraping\idlinkedin.csv'
dataset = open(file_in, "r")

def login(iemail, ipassword):
    client = requests.Session()

    HOMEPAGE_URL = 'https://www.linkedin.com'
    LOGIN_URL = 'https://www.linkedin.com/uas/login-submit'

    # Fetch the home page to read the CSRF token required by the login form.
    html = client.get(HOMEPAGE_URL).content
    soup = BeautifulSoup(html, "html.parser")
    csrf = soup.find(id="loginCsrfParam-login")['value']

    login_information = {
        'session_key': iemail,
        'session_password': ipassword,
        'loginCsrfParam': csrf,
    }

    client.post(LOGIN_URL, data=login_information)

    for username in dataset:
        item_url = 'https://www.linkedin.com/in/' + username.strip()
        print(item_url)
        source_code = client.get(item_url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        # The profile data arrives as JSON embedded in <code> tags, not as an <h1>.
        for item_name in soup.find_all('code'):
            if str(item_name).find('firstName') > -1:
                for i in json.loads(item_name.text)['included']:
                    #print(i)
                    if len(i['$deletedFields']) > 0:
                        if i['$type']=='com.linkedin.voyager.identity.shared.MiniProfile':
                            if i["publicIdentifier"]==username.strip():
                                print(i['firstName']+' '+i['lastName'])
                                print(i['lastName'])
                                print(i['occupation'])
                                # Stop scanning this payload once the profile is found.
                                break

# MAIN
login('username', 'password')
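The lookup in the nested loop can also be pulled into a small function that works on plain strings, which makes it testable without logging in. This is a sketch under assumptions, not the answer's method: find_mini_profile and sample_payloads are hypothetical names, the sample data is invented, and the function drops the answer's $deletedFields check on the assumption that it is not needed to identify the profile. It also skips payloads that are not valid JSON instead of raising.

```python
import json

MINI_PROFILE_TYPE = 'com.linkedin.voyager.identity.shared.MiniProfile'


def find_mini_profile(payload_texts, public_id):
    """Return the first MiniProfile dict matching public_id, or None."""
    for text in payload_texts:
        if 'firstName' not in text:
            continue  # cheap pre-filter, same idea as the answer's str.find check
        try:
            payload = json.loads(text)
        except ValueError:
            continue  # this payload is not JSON; skip it
        if not isinstance(payload, dict):
            continue
        for item in payload.get('included', []):
            if not isinstance(item, dict):
                continue
            if (item.get('$type') == MINI_PROFILE_TYPE
                    and item.get('publicIdentifier') == public_id):
                return item
    return None


# Invented sample data standing in for the text of each <code> tag.
sample_payloads = [
    'not json, but mentions firstName',
    json.dumps({'included': [{
        '$type': MINI_PROFILE_TYPE,
        'publicIdentifier': 'jane-doe',
        'firstName': 'Jane',
        'lastName': 'Doe',
        'occupation': 'Engineer',
    }]}),
]

profile = find_mini_profile(sample_payloads, 'jane-doe')
print(profile['firstName'], profile['lastName'])
```

Inside the answer's loop, the list of strings could be built with [tag.text for tag in soup.find_all('code')] and the id passed as username.strip().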

Regarding "python - BeautifulSoup.findAll prints nothing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53956650/
