For some reason this clutch.co scraper isn't clicking the "next" button and navigating to the next page. So when I run this code it'll only get information from the first page and then close itself.
I added waits to allow the page to load, but it hasn't helped. Watching the browser, you can see it scroll to the bottom of the page and then close itself.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

website = "https://clutch.co/us/web-developers"

options = webdriver.ChromeOptions()
options.add_experimental_option("detach", False)
driver = webdriver.Chrome(options=options)
driver.get(website)

wait = WebDriverWait(driver, 10)
company_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'provider-info')))

# pagination
pagination = driver.find_element(By.XPATH, '//ul[@class="pagination justify-content-center"]')
pages = pagination.find_elements(By.TAG_NAME, 'li')
last_page = 250

company_names = []
taglines = []
locations = []
costs = []
ratings = []

current_page = 1
while current_page <= last_page:
    company_elements = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'provider-info')))
    for company_element in company_elements:
        company_name = company_element.find_element(By.CLASS_NAME, "company_info").text
        company_names.append(company_name)
        tagline = company_element.find_element(By.XPATH, './/p[@class="company_info__wrap tagline"]').text
        taglines.append(tagline)
        rating = company_element.find_element(By.XPATH, './/span[@class="rating sg-rating__number"]').text
        ratings.append(rating)
        location = company_element.find_element(By.XPATH, './/span[@class="locality"]').text
        locations.append(location)
        cost = company_element.find_element(By.XPATH, './/div[@class="list-item block_tag custom_popover"]').text
        costs.append(cost)
    current_page = current_page + 1
    try:
        next_page = driver.find_element(By.XPATH, '//li[@class="page-item next"]/a[@class="page-link"]")')
        next_page.click()
        time.sleep(10)
    except:
        break

driver.close()

data = {'Company_Name': company_names, 'Tagline': taglines, 'location': locations, 'Ticket_Price': costs, 'Rating': ratings}
df = pd.DataFrame(data)
df.to_csv('companies_test1.csv', index=False)
print(df)
Answer:
Your XPath is wrong (note the extra ") at the end of the string); use:
next_page = driver.find_element(By.XPATH,'//li[@class="page-item next"]/a[@class="page-link"]')
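Incidentally, those stray characters also explain why the script closes silently: the bad XPath raises InvalidSelectorException, which the bare except swallows before breaking out of the loop. A minimal sketch of a more talkative version (not part of the original answer):

try:
    next_page = driver.find_element(By.XPATH, '//li[@class="page-item next"]/a[@class="page-link"]')
    next_page.click()
    time.sleep(10)
except Exception as e:
    # print why pagination stopped instead of failing silently
    print(f"Pagination stopped on page {current_page}: {e}")
    break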
But even with that fixed, the website blocks the click. If you remove the try/except, you can read the error:
selenium.common.exceptions.ElementClickInterceptedException:
Message: element click intercepted: Element
<a class="page-link" data-page="1" href="/us/web-developers?pag e=1" data-link="?page=1">...</a>
is not clickable at point (622, 888).
Other element would receive the click:
<div id="CybotCookiebotDialogBodyButtons" style="padding-left: 0px;">...</div>
Better code, but my IP/settings trigger the Cloudflare captcha:
next_page = driver.find_element(By.XPATH, '//li[@class="page-item next"]/a[@class="page-link"]')
np = next_page.get_attribute('href')  # navigate to the link's href instead of clicking it
driver.get(np)
time.sleep(6)
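Folded back into the question's loop, with the fixed sleep replaced by an explicit wait on the listings (a sketch that reuses only the selectors already in the question):

try:
    next_page = driver.find_element(By.XPATH, '//li[@class="page-item next"]/a[@class="page-link"]')
    driver.get(next_page.get_attribute('href'))
    # wait for the next page's listings instead of sleeping a fixed 6 seconds
    wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'provider-info')))
except Exception:
    # no "next" link on the last page, or the request was blocked
    break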
Yeah, it works, thanks! I also get the Cloudflare captcha; I'll find a workaround.
The way to say thanks here is to upvote/accept the answer. Can you tell me what the workaround is? If you don't want it to be public, you can find my email on the page linked in my profile.