
java - YouTube auto-generated subtitle file has non-sequential timing

Reposted. Author: 行者123. Updated: 2023-11-30 07:03:50

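I upload a video with YouTube API v3 and then request the subtitle file produced by automatic captioning. The file I get back has non-sequential timing: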


1
00:00:00,000 --> 00:00:06,629
Good weekend, how was my weekend

2
00:00:05,549 --> 00:00:14,960
we don't do that

3
00:00:06,629 --> 00:00:14,960
yes, this is good, Rome, yes, I have to


Sample video: https://youtu.be/F2TVsMD_bDQ

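So why doesn't each subtitle slot end where the next one begins? For example, slot 2 starts at 00:00:05,549, before slot 1 has ended at 00:00:06,629.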
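To make the overlap concrete, here is a small sketch that is not part of the question or the answer. It parses each timing line, converts both timestamps to milliseconds, and lists the cues that start before the previous cue has ended. The class name `SrtOverlapCheck` and the caption text in `main` are hypothetical. Only the three timing lines are copied from the file above, and the sketch assumes well-formed `HH:MM:SS,mmm --> HH:MM:SS,mmm` lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports SRT cues that start before the previous cue ends. */
public class SrtOverlapCheck {

    // Groups 1-4: start HH, MM, SS, mmm; groups 5-8: end HH, MM, SS, mmm
    private static final Pattern TIMING = Pattern.compile(
            "(\\d{2}):(\\d{2}):(\\d{2}),(\\d{3})\\s*-->\\s*(\\d{2}):(\\d{2}):(\\d{2}),(\\d{3})");

    /** Converts the four groups beginning at {@code first} into milliseconds. */
    static long toMillis(Matcher m, int first) {
        long h = Long.parseLong(m.group(first));
        long min = Long.parseLong(m.group(first + 1));
        long s = Long.parseLong(m.group(first + 2));
        long ms = Long.parseLong(m.group(first + 3));
        return ((h * 60 + min) * 60 + s) * 1000 + ms;
    }

    /** Returns the 1-based numbers of cues whose start precedes the previous cue's end. */
    public static List<Integer> overlappingCues(String srt) {
        List<Integer> overlaps = new ArrayList<>();
        Matcher m = TIMING.matcher(srt);
        long previousEnd = -1; // no previous cue yet
        int cue = 0;
        while (m.find()) {
            cue++;
            long start = toMillis(m, 1);
            if (previousEnd >= 0 && start < previousEnd) {
                overlaps.add(cue);
            }
            previousEnd = toMillis(m, 5);
        }
        return overlaps;
    }

    public static void main(String[] args) {
        String sample = "1\n00:00:00,000 --> 00:00:06,629\nfirst\n\n"
                + "2\n00:00:05,549 --> 00:00:14,960\nsecond\n\n"
                + "3\n00:00:06,629 --> 00:00:14,960\nthird\n";
        System.out.println(overlappingCues(sample)); // prints [2, 3]
    }
}
```

On these timings it reports cues 2 and 3. Cue 3 starts exactly when cue 1 ends, but cue 2 is still running until 00:00:14,960.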

Best answer

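After several days of searching and digging through the YouTube documentation, I found nothing that solves this problem, so I handled it myself. I wrote code that uses regular expressions to fix the subtitle timing order: it collects every distinct timestamp in the file, sorts them, and gives subtitle n the interval from the n-th to the (n+1)-th timestamp. I have tested it on 5 videos and it works perfectly: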

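import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Rewrites the timings of a YouTube auto-caption SRT file so that cues no
 * longer overlap: every distinct timestamp is collected and sorted, and
 * cue i is given the interval from the i-th to the (i+1)-th timestamp.
 *
 * @author youans
 */
public class SubtitleCorrector {

    // A single SRT timestamp, e.g. 00:00:06,629
    private static final String TIMESTAMP_REGEX = "\\d{2}:\\d{2}:\\d{2},\\d{3}";

    public static void main(String[] args) {
        File inFile = new File("/IN_DIRECTORY/Test Video Bad Format.srt");
        try {
            // Read the whole file and normalize Windows line endings to "\n"
            String fileContent = new String(Files.readAllBytes(inFile.toPath()), StandardCharsets.UTF_8)
                    .replace("\r\n", "\n");

            // Distinct timestamps in chronological order; zero-padded
            // HH:MM:SS,mmm strings sort correctly as plain text
            List<String> slotsTiming = new ArrayList<>(new TreeSet<>(getAllMatches(fileContent, TIMESTAMP_REGEX)));
            System.out.println("Distinct timestamps: " + slotsTiming.size());

            // Cue blocks are separated by blank lines: index line, timing line, then text lines.
            // Any characters (digits, punctuation, non-ASCII letters) are kept in the text.
            List<String> textOnlySlots = new ArrayList<>();
            for (String block : fileContent.trim().split("\n\\s*\n")) {
                String[] lines = block.split("\n");
                if (lines.length < 2 || !lines[1].trim().matches(TIMESTAMP_REGEX + ".*")) {
                    continue; // skip anything that is not an "index / timing / text" cue
                }
                // Join multi-line caption text with single spaces so words are not glued together
                StringBuilder text = new StringBuilder();
                for (int i = 2; i < lines.length; i++) {
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(lines[i].trim());
                }
                textOnlySlots.add(text.toString());
            }

            // Cue i spans two consecutive distinct timestamps; stop if the timestamps run out
            int cueCount = Math.min(textOnlySlots.size(), slotsTiming.size() - 1);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < cueCount; i++) {
                sb.append(i + 1).append('\n')
                  .append(slotsTiming.get(i)).append(" --> ").append(slotsTiming.get(i + 1)).append('\n')
                  .append(textOnlySlots.get(i)).append("\n\n");
            }

            // "Test Video Bad Format.srt" -> "Test Video": drop the extension and "bad format" (any case)
            String baseName = inFile.getName().replaceAll("(?i)[.][^.]+$|\\s*bad format", "");
            File outFile = new File("/OUT_DIRECTORY/" + baseName + "_edited.SRT");
            Files.write(outFile.toPath(), sb.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    /**
     * Returns every (possibly overlapping) match of {@code regex} in {@code text}.
     * Wrapping the pattern in a lookahead lets a match start at any position.
     */
    public static List<String> getAllMatches(String text, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(text);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }
}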
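A different way to remove the overlap is sketched below as an alternative. It is not the answer author's method. The sketch keeps each cue's own start time and, whenever a cue ends after the next one starts, moves its end back to that next start. The class and method names (`SrtEndClamp`, `clampEnds`) are hypothetical. The sketch assumes well-formed `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines whose start times are already in increasing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical alternative: cut each cue's end at the next cue's start when they overlap. */
public class SrtEndClamp {

    // Group 1: start timestamp, group 2: the arrow, group 3: end timestamp
    private static final Pattern TIMING = Pattern.compile(
            "(\\d{2}:\\d{2}:\\d{2},\\d{3})(\\s*-->\\s*)(\\d{2}:\\d{2}:\\d{2},\\d{3})");

    public static String clampEnds(String srt) {
        // First pass: collect every cue's start and end timestamp
        List<String> starts = new ArrayList<>();
        List<String> ends = new ArrayList<>();
        Matcher m = TIMING.matcher(srt);
        while (m.find()) {
            starts.add(m.group(1));
            ends.add(m.group(3));
        }

        // Second pass: rewrite each timing line, shortening ends that run past the next start
        StringBuffer out = new StringBuffer();
        m.reset();
        int i = 0;
        while (m.find()) {
            String end = ends.get(i);
            // Zero-padded timestamps compare correctly as strings
            if (i + 1 < starts.size() && starts.get(i + 1).compareTo(end) < 0) {
                end = starts.get(i + 1);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(starts.get(i) + " --> " + end));
            i++;
        }
        m.appendTail(out); // keep everything after the last timing line
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "1\n00:00:00,000 --> 00:00:06,629\nfirst\n\n"
                + "2\n00:00:05,549 --> 00:00:14,960\nsecond\n\n"
                + "3\n00:00:06,629 --> 00:00:14,960\nthird\n";
        System.out.print(clampEnds(sample));
    }
}
```

On the sample timings this produces 00:00:00,000 --> 00:00:05,549, 00:00:05,549 --> 00:00:06,629 and 00:00:06,629 --> 00:00:14,960. These are the same intervals the answer's code produces. The difference is that this version never moves a start time. The answer's version instead rebuilds both ends of every cue from the sorted list of distinct timestamps.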

Regarding java - YouTube auto-generated subtitle file has non-sequential timing, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40455661/
