
python - Parse, find chapters, and write them out as separate files

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:11:17

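I'm having trouble getting the code right to parse the chapters out of this ebook and then write each of the 27 chapters to its own text file. The furthest I've gotten is printing "CHAPTER-1.txt". I don't want to hardcode anything, and I'm not sure where exactly I'm missing the mark.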

infile = open('dracula.txt', 'r')

readlines = infile.readlines()

toc_list = readlines[74:185]

toc_text_lines = []
for line in toc_list:
    if len(line) > 1:
        stripped_line = line.strip()
        toc_text_lines.append(stripped_line)

#print(len(toc_text_lines))

chaptitles = []
for text_lines in toc_text_lines:
    split_text_line = text_lines.split()
    if split_text_line[-1].isdigit():
        chaptitles.append(text_lines)

#print(len(chaptitles))
print(chaptitles)

infile.close()

import re

with open('dracula.txt') as f:
    book = f.readlines()



while book:
    line = book.pop(0)
    if "CHAPTER" in line and book.pop(0) == '\n':
        for title in chapters_names_list: ['CHAPTER I.', 'CHAPTER II.',
                                           'CHAPTER III.']
            with open("{}.txt".format(chapters_names_list), 'w') :

Best answer

I think you could benefit from generators: if one of the ebooks is too large to fit in memory, reading it all at once will cause problems.
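To make the memory point concrete, here is a minimal sketch. A Python file object is already a lazy iterator over its lines, so looping over it keeps only one line in memory at a time, whereas readlines() builds the whole list first. The helper name count_matching_lines is hypothetical and not part of the question's code.

```python
def count_matching_lines(path, needle):
    """Count the lines containing needle, reading one line at a time."""
    count = 0
    with open(path, encoding='utf-8') as f:
        for line in f:  # the file object yields lines lazily
            if needle in line:
                count += 1
    return count
```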

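What you can do is build a data-processing pipeline: first find the file (ebook.txt) on the filesystem; once we have the filename, open it and yield one line at a time; and finally scan each line for "CHAPTER I.", "CHAPTER II.", and so on.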

import os
import re
import fnmatch

def find_files(pattern, path):
    """
    Yield every filename under path that matches a shell wildcard
    pattern, so the filename (e.g. 'dracula.txt') need not be hardcoded.
    """
    for root, dirs, files in os.walk(path):
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)

def file_opener(filenames):
    """
    Open a sequence of filenames one at a time
    and make sure to close each file once we are done
    scanning its content.
    """
    for filename in filenames:
        if filename.endswith('.txt'):
            # with closes the file once the consumer moves on
            # or the generator is closed
            with open(filename, 'rt') as f:
                yield f

def chain_generators(iterators):
    """
    Chain a sequence of iterators together
    """
    for it in iterators:
        # Look up yield from if you're unsure what it does
        yield from it

def grep(pattern, lines):
    """
    Yield each line that matches the regex pattern, e.g. 'CHAPTER I'
    """
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

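# A simple way to use these functions together

filenames = find_files('dracula*', 'Path/to/files')
files = file_opener(filenames)
lines = chain_generators(files)
# Escape the period: an unescaped '.' matches any character,
# so 'CHAPTER I.' would also match 'CHAPTER II', 'CHAPTER IV', ...
each_line = grep(r'CHAPTER I\.', lines)
for match in each_line:
    print(match)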

You can build on these implementations to accomplish what you want.
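As one possible way to build on this, the sketch below consumes any iterable of lines and writes each chapter to its own file, without a hardcoded chapter list. It rests on assumptions that are not in the original answer: a body heading is a line containing only "CHAPTER" and a Roman numeral (table-of-contents entries, which also carry a title and page number, do not match), and the output name CHAPTER-<numeral>.txt and the function name split_chapters are hypothetical.

```python
import os
import re

# Assumption: a chapter heading in the body is a line holding only
# "CHAPTER" plus a Roman numeral, with an optional trailing period.
HEADING = re.compile(r'^CHAPTER\s+([IVXLCDM]+)\.?\s*$')

def split_chapters(lines, out_dir):
    """Write each chapter to out_dir/CHAPTER-<numeral>.txt; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None  # file for the chapter currently being written
    try:
        for line in lines:
            match = HEADING.match(line)
            if match:
                # A new heading ends the previous chapter's file
                if out is not None:
                    out.close()
                name = 'CHAPTER-{}.txt'.format(match.group(1))
                path = os.path.join(out_dir, name)
                out = open(path, 'w', encoding='utf-8')
                written.append(path)
            # Lines before the first heading are skipped
            if out is not None:
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Because it accepts any iterable, it can be fed an open file object directly or the output of chain_generators(files) from the pipeline above, for example split_chapters(lines, 'chapters'). If your edition formats its headings differently, the HEADING pattern is the part to adjust.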

Please let me know if this helps.

Regarding "python - Parse, find chapters, and write them out as separate files", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58703465/
