python - How to automatically download books from Gutenberg

Reposted by: 行者123 Updated: 2023-11-28 22:18:46

I am trying to download books from "http://www.gutenberg.org/". I would like to know why my code gets nothing.

import requests
import re
import os
import urllib.request  # urlretrieve lives in the urllib.request submodule

def get_response(url):
    response = requests.get(url).text 
    return response

def get_content(html):
    reg = re.compile(r'(<span class="mw-headline".*?</span></h2><ul><li>.*</a></li></ul>)',re.S) 
    return re.findall(reg,html)


def get_book_url(response):
    reg = r'a href="(.*?)"'
    return re.findall(reg,response)

def get_book_name(response):
    reg = re.compile('>.*</a>')
    return re.findall(reg,response)


def download_book(book_url,path):
    path = ''.join(path.split())
    path = 'F:\\books\\{}.html'.format(path) #my local file path

    if not os.path.exists(path):
        urllib.request.urlretrieve(book_url,path)
        print('ok!!!')
    else:
        print('no!!!')

def get_url_name(start_url):
    content = get_content(get_response(start_url))
    for i in content:
        book_url = get_book_url(i)
        if book_url:
            book_name = get_book_name(i)
            try:
                download_book(book_url[0],book_name[0])
            except:
                continue

def main():
    get_url_name(start_url)

if __name__ == '__main__':
    start_url = 'http://www.gutenberg.org/wiki/Category:Classics_Bookshelf'
    main()

I have run the code but got nothing, and there is no traceback. How can I automatically download books from the website?

Best answer

I have run the code and get nothing, no tracebacks.

Well, if an exception is raised inside download_book(), you cannot possibly get a traceback, because you explicitly silence all exceptions:

        try:
            download_book(book_url[0],book_name[0])
        except:
            continue
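
To see the effect in isolation, here is a minimal sketch. The names flaky_download and run_silently are hypothetical and not part of the question's code; flaky_download simply raises on every call, standing in for a download_book() that fails.

```python
def flaky_download(url):
    # Hypothetical stand-in for download_book(); it fails on every call.
    raise OSError("cannot write file for {}".format(url))


def run_silently(urls):
    """Mimic the loop above: every failure is swallowed by the bare except."""
    completed = []
    for url in urls:
        try:
            flaky_download(url)
        except:  # bare except: hides every error, including NameError or TypeError
            continue
        completed.append(url)
    return completed


# Nothing is downloaded and nothing is reported: no message, no traceback.
result = run_silently(["book-1", "book-2"])
print(result)
```

Every call fails, yet the only visible output is an empty list, which matches the "nothing, no tracebacks" symptom described in the question.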

So the first thing to do is to at least print the error:

        try:
            download_book(book_url[0],book_name[0])
        except Exception as e:
            print("while downloading book {} : got error {}".format(book_url[0], e))
            continue

Or do not catch the exception at all (at least until you know what can happen and how to handle it).
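
As a middle ground between those two options, one possible sketch is to catch only the error type you expect and let everything else propagate. The helper download_all, its download parameter, and fake_download are hypothetical names introduced for illustration. The sketch assumes that network and file-system failures surface as OSError, which holds for urllib.error.URLError in Python 3.

```python
import traceback


def download_all(items, download):
    """Call download(url, name) for each item and report failures instead of hiding them."""
    failures = []
    for url, name in items:
        try:
            download(url, name)
        except OSError as e:
            # Expected failure modes (network errors, bad file paths):
            # report the error with its full traceback, then move on.
            print("while downloading book {} : got error {!r}".format(url, e))
            traceback.print_exc()
            failures.append((url, e))
        # Any other exception (TypeError, NameError, ...) is not caught,
        # so programming mistakes still stop the script with a traceback.
    return failures


def fake_download(url, name):
    # Hypothetical downloader used only to exercise the loop.
    if "bad" in url:
        raise OSError("simulated failure for {}".format(name))


failures = download_all(
    [("http://example.invalid/good", "a"), ("http://example.invalid/bad", "b")],
    fake_download,
)
print(len(failures))
```

Returning the list of failures lets the caller decide whether to retry or abort, rather than discovering later that no file was written.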