audio - FFmpeg video-to-audio conversion results in a different duration

Reposted. Author: 行者123. Updated: 2023-12-02 23:01:44

I am trying to convert an MP4 file to a mono WAV file sampled at 16,000 Hz.

When I run the following command, the duration changes from 00:09:59.99 (MP4) to 00:09:57.64 (WAV). For the original, longer version of the file, it goes from 00:48:37.46 (MP4) to 00:48:23.38 (WAV).

ffmpeg -i <FILE_NAME>.mp4 -ac 1 -ar 16000 <FILE_NAME>.wav

I also tried the command below. The result is worse: the duration goes from 00:09:59.99 (MP4) to 00:12:56.29 (AAC).
ffmpeg -i <FILE_NAME>.mp4 -vn -acodec copy <FILE_NAME>.aac

Log attached:
Report written to "ffmpeg-20200610-093115.log"
Command line:
ffmpeg -i short.mp4 -ac 1 -ar 16000 short.wav -report
ffmpeg version 4.1.1 Copyright (c) 2000-2019 the FFmpeg developers
built with Apple LLVM version 10.0.0 (clang-1000.11.45.5)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'short.mp4'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '1'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '16000'.
Reading option 'short.wav' ... matched as output url.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url short.mp4.
Successfully parsed a group of options.
Opening an input file: short.mp4.
[NULL @ 0x7f98a3008200] Opening 'short.mp4' for reading
[file @ 0x7f98a2904440] Setting default whitelist 'file,crypto'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] ISO: File Type Major Brand: mp42
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Processing st: 0, edit list 0 - media time: 0, duration: 7679872
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Processing st: 1, edit list 0 - media time: 1024, duration: 26459559
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] drop a frame at curr_cts: 0 @ 0
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Before avformat_find_stream_info() pos: 11213917 bytes read:318782 seeks:1 nb_streams:2
[h264 @ 0x7f98a3808800] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 0x7f98a3808800] nal_unit_type: 8(PPS), nal_ref_idc: 3
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] demuxer injecting skip 1024 / discard 0
[aac @ 0x7f98a1008c00] skip 1024 / discard 0 samples due to side data
[h264 @ 0x7f98a3808800] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 0x7f98a3808800] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 0x7f98a3808800] Format yuv420p chosen by get_format().
[h264 @ 0x7f98a3808800] Reinit context to 640x368, pix_fmt: yuv420p
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] After avformat_find_stream_info() pos: 21961 bytes read:351550 seeks:2 frames:46
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'short.mp4':
Metadata:
major_brand : mp42
minor_version : 1
compatible_brands: isommp41mp42
creation_time : 2020-06-10T16:12:17.000000Z
Duration: 00:09:59.99, start: 0.000000, bitrate: 149 kb/s
Stream #0:0(eng), 1, 1/12800: Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 47 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
creation_time : 2020-06-10T16:12:17.000000Z
handler_name : Core Media Video
Stream #0:1(eng), 45, 1/44100: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 98 kb/s (default)
Metadata:
creation_time : 2020-06-10T16:12:17.000000Z
handler_name : Core Media Audio
Successfully opened the file.
Parsing a group of options: output url short.wav.
Applying option ac (set number of audio channels) with argument 1.
Applying option ar (set audio sampling rate (in Hz)) with argument 16000.
Successfully parsed a group of options.
Opening an output file: short.wav.
[file @ 0x7f98a0c1db40] Setting default whitelist 'file,crypto'
Successfully opened the file.
Stream mapping:
Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[aac @ 0x7f98a100de00] skip 1024 / discard 0 samples due to side data
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
detected 12 logical cores
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'time_base' to value '1/44100'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'sample_rate' to value '44100'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'sample_fmt' to value 'fltp'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'channel_layout' to value '0x4'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] tb:1/44100 samplefmt:fltp samplerate:44100 chlayout:0x4
[format_out_0_0 @ 0x7f98a0e2cb80] Setting 'sample_fmts' to value 's16'
[format_out_0_0 @ 0x7f98a0e2cb80] Setting 'sample_rates' to value '16000'
[format_out_0_0 @ 0x7f98a0e2cb80] Setting 'channel_layouts' to value '0x4'
[format_out_0_0 @ 0x7f98a0e2cb80] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 0x7f98a0c16ac0] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 0x7f98a0e2d540] [SWR @ 0x7f98a28e1000] Using fltp internally between filters
[auto_resampler_0 @ 0x7f98a0e2d540] ch:1 chl:mono fmt:fltp r:44100Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
Output #0, wav, to 'short.wav':
Metadata:
major_brand : mp42
minor_version : 1
compatible_brands: isommp41mp42
ISFT : Lavf58.20.100
Stream #0:0(eng), 0, 1/16000: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s (default)
Metadata:
creation_time : 2020-06-10T16:12:17.000000Z
handler_name : Core Media Audio
encoder : Lavc58.35.100 pcm_s16le
size= 17152kB time=00:09:16.63 bitrate= 252.4kbits/s speed=1.11e+03x
[out_0_0 @ 0x7f98a0e2c700] EOF on sink link out_0_0:default.
No more output streams to write to, finishing.
size= 18676kB time=00:09:59.99 bitrate= 255.0kbits/s speed=1.11e+03x
video:0kB audio:18676kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000408%
Input file #0 (short.mp4):
Input stream #0:0 (video): 1 packets read (3689 bytes);
Input stream #0:1 (audio): 25739 packets read (7375414 bytes); 25738 frames decoded (26355712 samples);
Total: 25740 packets (7379103 bytes) demuxed
Output file #0 (short.wav):
Output stream #0:0 (audio): 25739 frames encoded (9562163 samples); 25739 packets muxed (19124326 bytes);
Total: 25739 packets (19124326 bytes) muxed
25738 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x7f98a0c1dc40] Statistics: 4 seeks, 76 writeouts
[AVIOContext @ 0x7f98a29045c0] Statistics: 10902846 bytes read, 29 seeks

Best answer

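Containers such as MP4 and MKV store packets with timestamps. One byproduct is that silence in an audio track can be represented simply by adjusting the timestamps of the packets between which the silence is intended. Formats such as WAV or a raw AAC bitstream have no timestamps, so any "silence" encoded this way is lost.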
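
As a rough illustration of that mechanism, here is a minimal Python sketch. The packet list is made up for the example and is not taken from the log above; each packet is assumed to be a (pts, nb_samples) pair measured in audio samples. A jump in timestamps larger than the previous packet's sample count is an implicit gap, which a format without timestamps cannot carry.

```python
from typing import List, Tuple

def implicit_gap_samples(packets: List[Tuple[int, int]]) -> int:
    """Sum the timestamp gaps between consecutive packets.

    Each packet is (pts, nb_samples), both in units of audio samples.
    A gap exists when the next packet starts later than pts + nb_samples.
    """
    total_gap = 0
    for (pts, nb_samples), (next_pts, _) in zip(packets, packets[1:]):
        gap = next_pts - (pts + nb_samples)
        if gap > 0:
            total_gap += gap
    return total_gap

# Hypothetical packets: 1024 samples each, with a timestamp jump before the third.
packets = [(0, 1024), (1024, 1024), (4096, 1024), (5120, 1024)]

gap = implicit_gap_samples(packets)                # silence implied by timestamps only
decoded = sum(n for _, n in packets)               # samples that actually exist
container_span = packets[-1][0] + packets[-1][1]   # length the container reports

print(gap, decoded, container_span)  # 2048 4096 6144
```

In this toy case the container spans 6144 samples, but only 4096 samples are actually stored; writing those samples back to back drops the 2048 samples of implied silence.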

Your input audio is at 44100 Hz. In this line near the end of the log,

Input stream #0:1 (audio): 25739 packets read (7375414 bytes); 25738 frames decoded (26355712 samples); 

you can see that the input stream has 26355712 samples. At 44100 Hz, that is ~597.6351 seconds, or about 00:09:57.64. That is what you get in the WAV output.
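
The arithmetic can be checked with a few lines of Python. The sample counts come from the log above (26355712 decoded samples at 44100 Hz, 9562163 encoded samples at 16000 Hz); the helper names are just for illustration.

```python
def duration_seconds(samples: int, sample_rate: int) -> float:
    """Duration of a gapless audio stream: sample count divided by sample rate."""
    return samples / sample_rate

def as_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.cc, rounded to the nearest centisecond."""
    centis = round(seconds * 100)
    hours, rem = divmod(centis, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, cs = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{cs:02d}"

decoded = duration_seconds(26355712, 44100)  # input side, from the log
encoded = duration_seconds(9562163, 16000)   # output side, from the log

print(as_timestamp(decoded))  # 00:09:57.64
print(as_timestamp(encoded))  # 00:09:57.64
```

Both sides agree with the 00:09:57.64 reported for the WAV, so the resampling itself is not losing audio; the difference from 00:09:59.99 comes from the timestamps.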

To insert silence so that the source duration is preserved, use
ffmpeg -i <FILE_NAME>.mp4 -af aresample=async=1 -ac 1 -ar 16000 <FILE_NAME>.wav
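
One way to check the result afterwards is to read the WAV header and divide the frame count by the sample rate. This is a small sketch using Python's standard `wave` module, which handles plain PCM WAV files such as the pcm_s16le output here; the function name is an assumption made for the example.

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file as frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `wav_duration_seconds("short.wav")` on the new output should then give a value close to the MP4's duration (about 599.99 seconds for the short file), rather than about 597.64.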

For more on "audio - FFmpeg video-to-audio conversion results in a different duration", see the similar question on Stack Overflow: https://stackoverflow.com/questions/62308695/
