Python code works (Selenium), but the data is duplicated


I'm learning Selenium. I wrote some Python code to scrape data from a website. It runs, but the CSV it generates only contains the first link, repeated over and over. Is it failing because the code keeps duplicating the information from the first link?

# -*- coding: utf-8 -*- 
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import csv
# Specifying incognito mode as you launch your browser[OPTIONAL]
option = webdriver.ChromeOptions()
option.add_argument("--incognito")
lista_datos = []
resultado = []
# Create new Instance of Chrome in incognito mode
browser = webdriver.Chrome(executable_path=r'C:\Users\inspiron3420\Downloads\chromedriver.exe', chrome_options=option)

# Go to desired website
browser.get("https://www.biobiochile.cl")

# Wait 20 seconds for page to load
timeout = 20
try:
    variable = 'incendio'
    boton = browser.find_element_by_xpath("//*[@id='search-anchor']")
    boton.click()
    buscar = browser.find_element_by_xpath("//*[@id='buscador-bbcl']/div/input")
    buscar.send_keys(variable)
    accion = browser.find_element_by_xpath("//*[@id='buscador-bbcl']/div/span[2]/button")
    accion.click()

except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()

try:
    WebDriverWait(browser, 5).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "resultados-texto")))
except:
    print("Elementos no encontrados")

# Collect the elements found into a list
resultados = browser.find_elements_by_class_name("resultados-texto")
for resultado in resultados:
    titulopre = resultado.find_element_by_xpath("//*[@id='menu-buscador2']/div/div[3]/div/div[3]/div[1]/a/div[2]/div[1]")
    titulo = titulopre.text
    fechapre = resultado.find_element_by_xpath("//*[@id='menu-buscador2']/div/div[3]/div/div[3]/div[1]/a/div[2]/div[2]")
    fecha = fechapre.text
    # Finally, store the scraped data in a list of lists
    lista_datos.append([titulo, fecha])

csvsalida = open('scrappingbiobio.csv', 'w', newline='')
salida = csv.writer(csvsalida)
salida.writerow(['titulo', 'fecha'])
salida.writerows(lista_datos)
csvsalida.close()

Best answer

csvsalida = open('scrappingbiobio.csv', 'w', newline='')

Opening the file with mode 'w' truncates (overwrites) it every time.

You should use append mode instead:

csvsalida = open('scrappingbiobio.csv', 'a', newline='')

The remaining problem is that the header line salida.writerow(['titulo', 'fecha']) still gets written every time. To handle that, you can check whether the file already exists:

import os

# Check for the file *before* opening it: open(..., 'a') creates the file if it
# is missing, so checking afterwards would always find it and the header row
# would never be written.
escribir_cabecera = not os.path.isfile('scrappingbiobio.csv')

csvsalida = open('scrappingbiobio.csv', 'a', newline='')
salida = csv.writer(csvsalida)
if escribir_cabecera:
    salida.writerow(['titulo', 'fecha'])
salida.writerows(lista_datos)
csvsalida.close()
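
For reference, here is a minimal sketch of the same idea (append mode plus a header check) wrapped in a with block so the file is always closed. The helper name guardar_csv is made up for illustration; the filename and column names are taken from the question:

import csv
import os

# Hypothetical helper: append one batch of rows, writing the header row
# only when the CSV file does not exist yet.
def guardar_csv(filas, ruta='scrappingbiobio.csv'):
    nuevo = not os.path.isfile(ruta)           # check before opening
    with open(ruta, 'a', newline='') as f:     # 'a' appends instead of truncating
        salida = csv.writer(f)
        if nuevo:
            salida.writerow(['titulo', 'fecha'])
        salida.writerows(filas)

# usage: guardar_csv(lista_datos)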

Regarding "Python code works (Selenium), but the data is duplicated", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51843237/
