python - How do I write a Selenium loop in Python?

Reposted. Author: 行者123. Updated: 2023-11-30 22:32:53

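I want to scrape data from many different websites that contain JavaScript code (which is why I use the selenium method to get the information). Everything works fine, but when I try to load the next URL, I get a very long error message: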

Traceback (most recent call last):
  File "C:/Python27/air17.py", line 46, in <module>
    scrape(urls)
  File "C:/Python27/air17.py", line 28, in scrape
    browser.get(url)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "C:\Python27\lib\httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 882, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 844, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 821, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 575, in create_connection
    raise err
error: [Errno 10061]

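The data from the first website ends up in the csv file, but when the code tries to open the next website it gets stuck and I get this error message. What am I doing wrong?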

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','IHAVETODELETETHIS','STATUS'])


def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            datatable.append(temp_data)

        output.writerows(datatable)

        resultcsv.close()
        time.sleep(10)
        browser.quit()

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"]
scrape(urls)

Best answer

I'm not sure that calling browser.quit() at the end of the method is a good idea. According to the Selenium doc:

quit()

Quits the driver and close every associated window.

I think calling browser.close() inside the loop (as documented here) is enough. Keep browser.quit() outside the loop.
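
As a rough illustration of keeping quit() outside the loop, here is a minimal sketch. The names scrape_all, make_driver, handle_page and FakeDriver are hypothetical and not from the question; FakeDriver only stands in for a real Selenium driver so the sketch runs without a browser. The sketch leaves out close() inside the loop, on the assumption that with a single window, closing it may end the session as well.

```python
def scrape_all(urls, make_driver, handle_page):
    """Visit every URL with one driver and quit it exactly once."""
    driver = make_driver()  # assumption: the caller passes e.g. webdriver.Firefox
    try:
        for url in urls:
            driver.get(url)  # the same browser session is reused for each URL
            handle_page(url, driver.page_source)
    finally:
        driver.quit()  # runs once, after the loop, even if a page fails


class FakeDriver(object):
    """Hypothetical stand-in that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []
        self.page_source = ""

    def get(self, url):
        self.calls.append(("get", url))
        self.page_source = "<html>%s</html>" % url

    def quit(self):
        self.calls.append(("quit",))


fake = FakeDriver()
pages = []
scrape_all(
    ["https://example.com/a", "https://example.com/b"],
    make_driver=lambda: fake,
    handle_page=lambda url, html: pages.append((url, html)),
)
print(fake.calls)
```

With a real browser, the call would presumably look like scrape_all(urls, webdriver.Firefox, handle_page), where handle_page does the BeautifulSoup parsing and CSV writing. The output file would likewise need to stay open until the loop has finished.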

About python - How do I write a Selenium loop in Python?: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45323400/