
python - How to periodically fetch records from a website using Selenium?

Reposted — Author: 行者123 · Updated: 2023-12-04 09:00:02

I have a small script that collects company data from a website. The site is periodically updated with new company information. How can I keep my CSV updated with the new records on a schedule? Also, as you can see in the code, I use an explicit page range — what other solutions are there?
Here is the code —

from selenium.webdriver import Firefox
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep
import csv


#navigate to the ystory companies page

#start collecting data from ystory

START_URL = 'https://yourstory.com/companies/search?page=1&hitsPerPage=30'

#when the collection populates 30 elements then click on next page


class CompDeetz():

    def __init__(self):
        self.browser = Firefox()
        self.browser.get(START_URL)
        sleep(20)
        self.browser.find_element_by_xpath('/html/body/div[12]/div/div/button').click()
        sleep(5)
        self.browser.find_element_by_xpath('/html/body/div[1]/div[4]').click()
        self.database = []

    def write_row(self, record):
        with open('test.csv', 'a') as t:
            writer = csv.writer(t)
            writer.writerows(record)

    def get_everything(self):
        all_list = [a.text for a in self.browser.find_elements_by_xpath('//tr[@class="hit"]')]
        all_records = []
        for company in all_list:
            record = company.split('\n')
            all_records.append(record)
        self.write_row(all_records)

    def next_page(self):
        self.browser.find_element_by_xpath('//ul[@class="ais-Pagination-list"]/li[7]/a').click()
        sleep(20)


def main():
    t = CompDeetz()
    t.get_everything()
    for i in range(33):
        t.next_page()
        t.get_everything()


if __name__ == "__main__":
    main()
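To keep the CSV current across repeated runs (the "update my CSV periodically with new records" part of the question), one option is to deduplicate against the rows already on disk before appending. The sketch below is an assumption-laden helper, not part of the original script; the function name `append_new_records` and the tuple-based comparison are choices made here for illustration. It could replace the `self.write_row(all_records)` call.

```python
import csv
import os


def append_new_records(path, records):
    """Append only records not already present in the CSV at `path`.

    Existing rows are loaded into a set of tuples, so repeated scraper
    runs do not duplicate companies collected earlier. Returns the
    number of rows actually written.
    """
    seen = set()
    if os.path.exists(path):
        with open(path, newline='') as f:
            seen = {tuple(row) for row in csv.reader(f)}

    fresh = [r for r in records if tuple(r) not in seen]
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerows(fresh)
    return len(fresh)
```

Note that this re-reads the whole file on every call, which is fine for a few thousand rows; a key column (e.g. company name) could be compared instead of the full tuple if fields like "last updated" change between runs.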

Best answer

Instead of having two different methods, get_everything and next_page, and calling them multiple times, you can use a single get_everything method and call it once.
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException


    def get_everything(self):
        all_records = []
        nextPage = True
        while nextPage:
            all_list = [a.text for a in self.browser.find_elements_by_xpath('//tr[@class="hit"]')]
            for company in all_list:
                record = company.split('\n')
                all_records.append(record)

            try:
                nextPagelink = WebDriverWait(self.browser, 10).until(
                    EC.element_to_be_clickable((By.XPATH, "//a[@aria-label='Next page']")))
                self.browser.execute_script("arguments[0].scrollIntoView();", nextPagelink)
                self.browser.execute_script("arguments[0].click();", nextPagelink)
                sleep(5)  # give the next page time to load
            # On the last page there is no "Next page" link, so the wait times out
            except TimeoutException:
                nextPage = False

        self.write_row(all_records)
Note: watch out for popups on the page. I hope you already have a mechanism to handle them.
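For the "periodically" part of the question, a minimal approach is a driver loop that sleeps between scrape runs. The sketch below is an assumption: `job` stands in for whatever callable runs the scraper (e.g. `main` above), and the interval is up to you. An exception in one run is logged rather than allowed to kill the schedule.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None):
    """Call `job()` every `interval_seconds` seconds.

    Stops after `max_runs` invocations if given, otherwise loops
    forever. A failed run is reported and skipped so one bad page
    load does not stop future runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the schedule alive on scrape errors
            print(f"scrape failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
```

For example, `run_periodically(main, 6 * 60 * 60)` would re-scrape every six hours. For anything production-grade, a system scheduler such as cron is usually a better fit than a long-running Python process.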

Regarding "python - How to periodically fetch records from a website using Selenium?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63599731/
