
python - Web scraping Python error (NameError: name 'reload' is not defined)

Reposted. Author: 行者123. Updated: 2023-12-01 00:36:17

I'm trying to do some web scraping with Python and I'm getting an error.

I'm not sure what this error means. I'm running the script under Python 3. Can anyone help?

Traceback (most recent call last):
  File "/home/l/gDrive/AudioBookReviews/WebScraping/GoodreadsScraper.py", line 3, in <module>
    reload(sys)
NameError: name 'reload' is not defined

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf8')
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
#from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common import keys
import csv
import time
import json

class Book:
    def __init__(self, title, url):
        self.title = title
        self.url = url
    def __iter__(self):
        return iter([self.title, self.url])

url = 'https://www.goodreads.com/'

def create_csv_file():
    header = ['Title', 'URL']
    with open('/home/l/Downloads/WebScraping/GoodReadsBooksNew.csv', 'w+') as csv_file:
        wr = csv.writer(csv_file, delimiter=',')
        wr.writerow(header)

def read_from_txt_file():
    lines = [line.rstrip('\n') for line in open('/home/l/Downloads/WebScraping/BookTitles.txt')]
    return lines

def init_selenium():
    options = Options()
    options.add_argument('--headless')
    global driver
    driver = webdriver.Chrome("/home/l/Downloads/WebScraping/chromedriver")
    driver.get(url)
    time.sleep(30)
    driver.get('https://www.goodreads.com/search?q=')

def search_for_title(title):
    search_field = driver.find_element_by_xpath('//*[@id="search_query_main"]')
    search_field.clear()
    search_field.send_keys(title)
    search_button = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/form/div[1]/input[3]')
    search_button.click()

def scrape_url():
    try:
        url = driver.find_element_by_css_selector('a.bookTitle').get_attribute('href')
    except:
        url = "N/A"

    return url

def write_into_csv_file(vendor):
    with open('/home/l/Downloads/WebScraping/GoodReadsBooksNew.csv', 'a') as csv_file:
        wr = csv.writer(csv_file, delimiter=',')
        wr.writerow(list(vendor))

create_csv_file()
titles = read_from_txt_file()
init_selenium()

for title in titles:
    search_for_title(title)
    url = scrape_url()
    book = Book(title, url)
    write_into_csv_file(book)

Best answer

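Python 3 no longer has a built-in reload() function (it moved to importlib.reload), and sys.setdefaultencoding() does not exist in Python 3 either.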
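As a quick check, the following minimal sketch (not part of the original answer) inspects a Python 3 interpreter to show why the two lines fail. The variable names are illustrative only.

```python
import builtins
import importlib
import sys

# Python 3 has no built-in reload(); a bare reload(sys) raises NameError.
has_builtin_reload = hasattr(builtins, "reload")

# The function still exists, but it lives in importlib.
reload_is_available = callable(importlib.reload)

# sys.setdefaultencoding() is not present in Python 3 ...
has_setdefaultencoding = hasattr(sys, "setdefaultencoding")

# ... and the interpreter's default string encoding is already UTF-8.
default_encoding = sys.getdefaultencoding()

print(has_builtin_reload, reload_is_available)
print(has_setdefaultencoding, default_encoding)
```

Since the default string encoding is already UTF-8, the Python 2 idiom has nothing left to do, which is why deleting it is safe.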

You should remove these lines:

reload(sys)
sys.setdefaultencoding('utf8')

Instead, in Python 3.x, pass encoding='utf-8' as an argument whenever you open a file.

Line 29 (in create_csv_file):

with open('/home/l/Downloads/WebScraping/GoodReadsBooksNew.csv', 'w+') as csv_file:

change to

with open('/home/l/Downloads/WebScraping/GoodReadsBooksNew.csv', 'w+', encoding='utf-8') as csv_file:

Line 34 (in read_from_txt_file):

lines = [line.rstrip('\n') for line in open('/home/l/Downloads/WebScraping/BookTitles.txt')]

change to

lines = [line.rstrip('\n') for line in open('/home/l/Downloads/WebScraping/BookTitles.txt', encoding='utf-8')]

Line 62 (in write_into_csv_file):

with open('/home/l/Downloads/WebScraping/GoodReadsBooksNew.csv', 'a') as csv_file:

change to

with open('/home/l/Downloads/WebScraping/GoodReadsBooksNew.csv', 'a', encoding='utf-8') as csv_file:
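The three changes above can be sketched as one small, self-contained round trip. This is an illustration under assumptions, not the asker's script: the helper names append_row and read_rows, the temporary directory, and the sample title are hypothetical. The newline='' argument is an addition that the csv module documentation recommends for files passed to csv.writer and csv.reader.

```python
import csv
import os
import tempfile


def create_csv_file(path):
    """Create the CSV file and write the header row, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        csv.writer(csv_file, delimiter=",").writerow(["Title", "URL"])


def append_row(path, row):
    """Append one row; encoding is passed explicitly on every open()."""
    with open(path, "a", encoding="utf-8", newline="") as csv_file:
        csv.writer(csv_file, delimiter=",").writerow(row)


def read_rows(path):
    """Read all rows back using the same encoding they were written with."""
    with open(path, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))


# Hypothetical demo data written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "books.csv")
    create_csv_file(csv_path)
    append_row(csv_path, ["Les Misérables", "N/A"])
    rows = read_rows(csv_path)

print(rows)
```

Passing the encoding explicitly keeps the output independent of the platform's locale, which may not default to UTF-8 on every system.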

Regarding "python - Web scraping Python error (NameError: name 'reload' is not defined)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57742724/
