python - Writing srt data to Excel with the Python xlsxwriter module

Reposted. Author: 行者123. Updated: 2023-11-30 23:07:17

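This time I am trying to use Python's xlsxwriter module to write the data from a .srt file into Excel.

The subtitle file looks like this in Sublime Text:

But I want to write the data into Excel so that it looks like this:

This is my first time writing Python code for this, so I am still at the trial-and-error stage... I tried writing some code like the following

But I don't think it makes sense...

I will keep trying, but if you know how to do it, please let me know. I will read your code and try to understand it! Thank you! :)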

Best answer

The following breaks the problem down into a few parts:

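  • Parsing the input file. The parse_subtitles generator takes a source of lines and yields a sequence of records of the form {'index':'N', 'timestamp':'NN:NN:NN,NNN --> NN:NN:NN,NNN', 'subtitles':['TEXT', ...]}. The approach I took was to track which of three distinct states we are in:
    1. seeking to next entry, when we are looking for the next index number, which should match the regular expression ^\d*$ (nothing but a bunch of digits),
    2. looking for timestamp, when an index has been found and we expect a timestamp on the next line, which should match the regular expression ^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$ (HH:MM:SS,mmm --> HH:MM:SS,mmm), and
    3. reading subtitles, while consuming the actual subtitle text, with blank lines and EOF interpreted as subtitle termination points.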
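  • Writing the above records to rows in a worksheet. write_dict_to_worksheet accepts a row and a worksheet, along with a record and a dictionary defining the 0-indexed Excel column number for each of the record's keys, and then writes the data appropriately.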
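  • Organising the overall conversion. convert takes an input filename (e.g. 'Wildlife.srt'), which is opened and passed to the parse_subtitles function, and an output filename (e.g. 'Subtitle.xlsx'), which is created using xlsxwriter. It then writes a header and, for each record parsed from the input file, writes that record to the XLSX file.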
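
To make the three line types concrete, here is a minimal sketch that labels each line of a tiny made-up SRT snippet using the same regular expressions. The classify helper and the sample text are assumptions for illustration only and are not part of the answer's code.

```python
import re

# Same patterns as described in the list above.
INDEX = re.compile(r'^\d*$')
TIMESTAMP = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')


def classify(line):
    """Return a rough label for one SRT line (hypothetical helper)."""
    if TIMESTAMP.match(line):
        return 'timestamp'
    if line.strip() == '':
        # Checked before INDEX because ^\d*$ also matches an empty string.
        return 'blank'
    if INDEX.match(line):
        return 'index'
    return 'text'


# Made-up sample data, not taken from the question's file.
sample = [
    '1',
    '00:00:01,000 --> 00:00:04,000',
    'Hello there.',
    '',
    '2',
    '00:00:05,000 --> 00:00:07,500',
    'Second entry,',
    'two lines long.',
]
labels = [classify(line) for line in sample]
print(labels)
```

Labelling lines one at a time is not enough on its own: a subtitle line consisting only of digits would be labelled as an index. Tracking the current state, as parse_subtitles does, avoids that ambiguity.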

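Logging statements are left in for the purposes of self-annotation, and because, when reproducing your input file, I mistyped a : as a ; in a timestamp, which made it unrecognised, and having the error pop up was handy for debugging!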
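
As a rough sketch of how those messages surface, the snippet below configures the standard logging module so that debug messages are shown as well as errors, then checks one well-formed timestamp and one with the ; typo. The check_timestamp helper and the sample strings are assumptions for illustration.

```python
import logging
import re

TIMESTAMP = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')


def check_timestamp(line):
    """Log and return whether `line` looks like an SRT timestamp (hypothetical helper)."""
    if TIMESTAMP.match(line):
        logging.debug('Found timestamp: %s', line)
        return True
    logging.error('Expected to find a timestamp, but instead found: [%s]', line)
    return False


# Show DEBUG and above; by default only WARNING and above are printed.
logging.basicConfig(level=logging.DEBUG, format='%(levelname)s: %(message)s')

good = check_timestamp('00:00:01,000 --> 00:00:04,000')
bad = check_timestamp('00;00:01,000 --> 00:00:04,000')  # ';' typed instead of ':'
```

With the default configuration only the error for the mistyped line would appear, which is usually all that is needed when tracking down a malformed entry.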

I have put a text version of your source file, along with the following code, in this Gist

import xlsxwriter
import re
import logging

def parse_subtitles(lines):
    # Raw strings keep the regex backslashes intact.
    # Note: ^\d*$ also matches an empty line; ^\d+$ would be stricter.
    line_index = re.compile(r'^\d*$')
    line_timestamp = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')
    line_separator = re.compile(r'^\s*$')

    current_record = {'index':None, 'timestamp':None, 'subtitles':[]}
    state = 'seeking to next entry'

    for line in lines:
        line = line.strip('\n')
        if state == 'seeking to next entry':
            if line_index.match(line):
                logging.debug('Found index: {i}'.format(i=line))
                current_record['index'] = line
                state = 'looking for timestamp'
            else:
                logging.error('HUH: Expected to find an index, but instead found: [{d}]'.format(d=line))

        elif state == 'looking for timestamp':
            if line_timestamp.match(line):
                logging.debug('Found timestamp: {t}'.format(t=line))
                current_record['timestamp'] = line
                state = 'reading subtitles'
            else:
                logging.error('HUH: Expected to find a timestamp, but instead found: [{d}]'.format(d=line))

        elif state == 'reading subtitles':
            if line_separator.match(line):
                logging.info('Blank line reached, yielding record: {r}'.format(r=current_record))
                yield current_record
                state = 'seeking to next entry'
                current_record = {'index':None, 'timestamp':None, 'subtitles':[]}
            else:
                logging.debug('Appending to subtitle: {s}'.format(s=line))
                current_record['subtitles'].append(line)

        else:
            logging.error('HUH: Fell into an unknown state: `{s}`'.format(s=state))
    if state == 'reading subtitles':
        # We must have finished the file without encountering a blank line. Dump the last record
        yield current_record

def write_dict_to_worksheet(columns_for_keys, keyed_data, worksheet, row):
    """
    Write a subtitle-record to a worksheet.
    Return the row number after those that were written (since this may write multiple rows)
    """
    current_row = row
    # First, horizontally write the entry and timecode
    for (colname, colindex) in columns_for_keys.items():
        if colname != 'subtitles':
            worksheet.write(current_row, colindex, keyed_data[colname])

    # Next, vertically write the subtitle data
    subtitle_column = columns_for_keys['subtitles']
    for morelines in keyed_data['subtitles']:
        worksheet.write(current_row, subtitle_column, morelines)
        current_row += 1

    return current_row

def convert(input_filename, output_filename):
    workbook = xlsxwriter.Workbook(output_filename)
    worksheet = workbook.add_worksheet('subtitles')
    columns = {'index':0, 'timestamp':1, 'subtitles':2}

    next_available_row = 0
    records_processed = 0
    headings = {'index':"Entries", 'timestamp':"Timecodes", 'subtitles':["Subtitles"]}
    next_available_row = write_dict_to_worksheet(columns, headings, worksheet, next_available_row)

    with open(input_filename) as textfile:
        for record in parse_subtitles(textfile):
            next_available_row = write_dict_to_worksheet(columns, record, worksheet, next_available_row)
            records_processed += 1

    print('Done converting {inp} to {outp}. {n} subtitle entries found. {m} rows written'.format(inp=input_filename, outp=output_filename, n=records_processed, m=next_available_row))
    workbook.close()

convert(input_filename='Wildlife.srt', output_filename='Subtitle.xlsx')

Edit: updated to split multi-line subtitles across multiple rows in the output
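
To illustrate what splitting across multiple rows means for the cell layout, here is a small sketch that computes the (row, column, value) cells for one record without touching xlsxwriter. The layout_cells helper and the sample record are assumptions for illustration; the helper mirrors the logic of write_dict_to_worksheet but only returns the cells instead of writing them.

```python
def layout_cells(columns_for_keys, record, row):
    """Return ([(row, col, value), ...], next_free_row) for one record.

    Hypothetical helper: same layout rules as write_dict_to_worksheet,
    but it collects the cells in a list instead of writing to a worksheet.
    """
    cells = []
    # Index and timestamp go on the record's first row only.
    for key, col in columns_for_keys.items():
        if key != 'subtitles':
            cells.append((row, col, record[key]))
    # Each subtitle line gets its own row in the subtitles column.
    current_row = row
    for text in record['subtitles']:
        cells.append((current_row, columns_for_keys['subtitles'], text))
        current_row += 1
    return cells, current_row


columns = {'index': 0, 'timestamp': 1, 'subtitles': 2}
# Made-up record with a two-line subtitle.
record = {
    'index': '2',
    'timestamp': '00:00:05,000 --> 00:00:07,500',
    'subtitles': ['Second entry,', 'two lines long.'],
}
cells, next_row = layout_cells(columns, record, row=1)
for cell in cells:
    print(cell)
```

A two-line subtitle starting at row 1 therefore occupies rows 1 and 2, and the next record starts at row 3. Note that with this layout a record with no subtitle lines does not advance the row counter.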

Regarding python - Writing srt data to Excel with the Python xlsxwriter module, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32293013/
