ios - Display subtitles while playing mp3 audio

Reposted. Author: 塔克拉玛干. Last updated: 2023-11-02 08:39:43

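I need to build an app that highlights the words of a string while audio is playing, in sync with the sound. The audio is not continuous. For example, in "Hello, how are you?" there may be a pause between "Hello" and "how".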

I looked at the MIDI file format, but it can only store notes, and my mp3 contains speech.

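The last option I can see is to maintain a file that holds the start time of each word in the audio, and to highlight each word when playback reaches its start time.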

Can anyone suggest a better option?
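
The timing-file option described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity rather than in an iOS language: the `CUES` table, its start times, and the helper names `active_word_index` and `render` are all hypothetical. In an app, the playback position would come from the audio player's current time.

```python
from bisect import bisect_right

# Hypothetical timing table: (start time in seconds, word), sorted by start time.
# The values are invented for illustration only.
CUES = [
    (0.00, "Hello,"),
    (1.20, "how"),
    (1.45, "are"),
    (1.70, "you?"),
]
START_TIMES = [start for start, _ in CUES]


def active_word_index(position_s):
    """Return the index of the word to highlight at playback position
    position_s, or None if playback has not reached the first word."""
    i = bisect_right(START_TIMES, position_s) - 1
    return i if i >= 0 else None


def render(position_s):
    """Return the sentence with the active word wrapped in [brackets]."""
    active = active_word_index(position_s)
    return " ".join(
        f"[{word}]" if i == active else word
        for i, (_, word) in enumerate(CUES)
    )
```

Because only start times are stored, a word stays highlighted through any pause that follows it ("Hello," remains active until "how" starts). If the highlight should clear during silence, the table would also need an end time per word.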

Best answer

ID3 tags support the SYLT (synchronised lyrics/text) frame. From http://id3.org/id3v2.4.0-frames :

4.9. Synchronised lyrics/text

This is another way of incorporating the words, said or sung lyrics, in the audio file as text, this time, however, in sync with the audio. It might also be used to describing events e.g. occurring on a stage or on the screen in sync with the audio. The header includes a content descriptor, represented with as terminated text string. If no descriptor is entered, 'Content descriptor' is $00 (00) only.

<Header for 'Synchronised lyrics/text', ID: "SYLT">
Text encoding        $xx
Language             $xx xx xx
Time stamp format    $xx
Content type         $xx
Content descriptor   <text string according to encoding> $00 (00)

Content type:   $00 is other
                $01 is lyrics
                $02 is text transcription
                $03 is movement/part name (e.g. "Adagio")
                $04 is events (e.g. "Don Quijote enters the stage")
                $05 is chord (e.g. "Bb F Fsus")
                $06 is trivia/'pop up' information
                $07 is URLs to webpages
                $08 is URLs to images

Time stamp format:

$01  Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
$02  Absolute time, 32 bit sized, using milliseconds as unit

Absolute time means that every stamp contains the time from the
beginning of the file.

The text that follows the frame header differs from that of the
unsynchronised lyrics/text transcription in one major way. Each
syllable (or whatever size of text is considered to be convenient by
the encoder) is a null terminated string followed by a time stamp
denoting where in the sound file it belongs. Each sync thus has the
following structure:

Terminated text to be synced (typically a syllable)
Sync identifier (terminator to above string)   $00 (00)
Time stamp                                     $xx (xx ...)

The 'time stamp' is set to zero or the whole sync is omitted if
located directly at the beginning of the sound. All time stamps
should be sorted in chronological order. The sync can be considered
as a validator of the subsequent string.

Newline characters are allowed in all "SYLT" frames and MUST be used after every entry (name, event etc.) in a frame with the content type $03 - $04.

A few considerations regarding whitespace characters: Whitespace
separating words should mark the beginning of a new word, thus
occurring in front of the first syllable of a new word. This is also
valid for new line characters. A syllable followed by a comma should
not be broken apart with a sync (both the syllable and the comma
should be before the sync).

An example: The "USLT" passage

 "Strangers in the night" $0A "Exchanging glances"

would be "SYLT" encoded as:

 "Strang" $00 xx xx "ers" $00 xx xx " in" $00 xx xx " the" $00 xx xx
" night" $00 xx xx 0A "Ex" $00 xx xx "chang" $00 xx xx "ing" $00 xx
xx "glan" $00 xx xx "ces" $00 xx xx

There may be more than one "SYLT" frame in each tag, but only one
with the same language and content descriptor.
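
To make the layout quoted above concrete, here is a minimal sketch of a parser for the body of a SYLT frame (the bytes that follow the 10-byte ID3v2 frame header). It is written in Python for brevity and is not how taglib or any other library does it. The names `parse_sylt`, `words_from_syncs` and `build_example` are hypothetical, the millisecond values are invented, and the sketch assumes a text encoding with a single-byte terminator ($00 ISO-8859-1 or $03 UTF-8); the UTF-16 encodings, which use a two-byte terminator, are not handled.

```python
import struct

# Text encodings with a single-byte $00 terminator (ID3v2.4 encoding byte).
# UTF-16 variants ($01, $02) use a two-byte terminator; not handled here.
ENCODINGS = {0x00: "latin-1", 0x03: "utf-8"}


def parse_sylt(body):
    """Parse a SYLT frame body.

    Returns (language, timestamp_format, content_type, descriptor, syncs),
    where syncs is a list of (text, timestamp) tuples.
    """
    codec = ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError("text encoding not supported by this sketch")
    language = body[1:4].decode("latin-1")
    timestamp_format = body[4]   # $01 = MPEG frames, $02 = milliseconds
    content_type = body[5]       # $01 = lyrics, $02 = text transcription, ...

    end = body.index(b"\x00", 6)           # content descriptor terminator
    descriptor = body[6:end].decode(codec)
    pos = end + 1

    syncs = []
    while pos < len(body):
        end = body.index(b"\x00", pos)     # sync identifier ends the text
        text = body[pos:end].decode(codec)
        (timestamp,) = struct.unpack(">I", body[end + 1:end + 5])  # 32-bit big-endian
        syncs.append((text, timestamp))
        pos = end + 5
    return language, timestamp_format, content_type, descriptor, syncs


def words_from_syncs(syncs):
    """Merge syllables into words, using the convention that a sync whose
    text starts with whitespace begins a new word.
    Returns (word, start_timestamp) tuples."""
    words = []
    for text, timestamp in syncs:
        if not words or text[:1].isspace():
            words.append([text.strip(), timestamp])
        else:
            words[-1][0] += text
    return [tuple(word) for word in words]


def build_example():
    """Build a small SYLT body by hand; the timestamps are invented."""
    # Encoding $00, language "eng", time stamp format $02 (ms),
    # content type $01 (lyrics), empty content descriptor.
    header = bytes([0x00]) + b"eng" + bytes([0x02, 0x01]) + b"\x00"
    entries = [("Strang", 0), ("ers", 400), (" in", 900),
               (" the", 1200), (" night", 1500)]
    data = b"".join(
        text.encode("latin-1") + b"\x00" + struct.pack(">I", ms)
        for text, ms in entries
    )
    return header + data
```

Calling `parse_sylt(build_example())` yields the five syllable syncs, and `words_from_syncs` collapses them into four words with their start times, which is the form a word-highlighting display would need.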

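As of a few days before this answer was written, the master branch of taglib (https://github.com/taglib/taglib) supports the SYLT frame. You can use taglib to extract the synchronised lyrics for your display.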

Regarding "ios - Display subtitles while playing mp3 audio", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22934529/
