
audio - How can I shorten silences in an audio file with ffmpeg?


I'm trying to use ffmpeg to shorten the excessive silences in a recording (shorten them, not remove them entirely). The command I'm currently using:

ffmpeg -hide_banner -i file_name.m4a  -af silenceremove=0:0:0:-1:0.7:-30dB file_name_short.m4a

does not do what I want. It detects silences longer than 0.7 seconds and removes them completely. Does anyone know how to truncate silences instead, e.g. shorten any silence longer than 1 second down to 0.5 seconds?

Best Answer

The parameters of ffmpeg's silenceremove filter only seem to let you remove all silence beyond a certain length. That means if you pass stop_duration=0.5 and there is a 2.2-second block of silence, you end up with 0.2 seconds of silence left (2.2 - 0.5 - 0.5 - 0.5 - 0.5 = 0.2).
If you don't mind converting to and from the .wav format, you can use this Python script I wrote. It has a lot of options, and even though it is written in Python it uses NumPy, so it handles short files in under a second and a 2-hour .wav in about 5.7 seconds, which is decent. For more speed it could be rewritten in C++, and for video the approach could be extended with OpenCV.
Pros:

  • It can determine the silence threshold automatically, with adjustable aggressiveness
  • You can specify the maximum silence duration to keep
  • You can specify a minimum non-silence duration, so momentary blips of sound between silences are ignored
  • It can be used just to detect periods of silence (much faster: 2 hours processed in 1.7 seconds)
  • It refuses to overwrite an existing file unless told to do so

Cons:

It's limited by the modules it uses. The catches are:

  • It reads the entire file into memory
  • It only works on wave files and does not keep metadata (see the workaround below)
  • It handles the common WAVE formats via SciPy; if SciPy is not installed, it falls back to Python's wave module, which only supports 16-bit PCM

Usage in your case:

  • Convert the m4a to wav: ffmpeg -i myfile.m4a myfile.wav
  • Run the silence trimmer: python3 trim_silence.py --input=myfile.wav
  • Convert back, copying over the metadata: ffmpeg -i result.wav -i myfile.m4a -map_metadata 1 myfile_trimmed.m4a
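For convenience, the three steps above can be chained from Python. This is a minimal sketch only (it assumes ffmpeg is on your PATH, trim_silence.py is in the working directory, and the script's default output name result.wav is used; the file names and the shorten_silences helper are placeholders):

    import subprocess

    def shorten_silences(infile, outfile, tmp_wav="myfile.wav"):
        # 1. Decode the m4a to wav
        subprocess.run(["ffmpeg", "-y", "-i", infile, tmp_wav], check=True)
        # 2. Shorten the silences (trim_silence.py writes result.wav by default)
        subprocess.run(["python3", "trim_silence.py", f"--input={tmp_wav}", "--overwrite"], check=True)
        # 3. Re-encode to m4a, copying the metadata from the original file
        subprocess.run(["ffmpeg", "-y", "-i", "result.wav", "-i", infile,
                        "-map_metadata", "1", outfile], check=True)

    shorten_silences("myfile.m4a", "myfile_trimmed.m4a")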

Full usage instructions:
    usage: trim_silence.py [-h] --input INPUT [--output OUTPUT] [--threshold THRESHOLD] [--silence-dur SILENCE_DUR] [--non-silence-dur NON_SILENCE_DUR]
                           [--mode MODE] [--auto-threshold] [--auto-aggressiveness AUTO_AGGRESSIVENESS] [--detect-only] [--verbose] [--show-silence]
                           [--time-it] [--overwrite]

    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         (REQUIRED) name of input wav file (default: None)
      --output OUTPUT       name of output wave file (default: result.wav)
      --threshold THRESHOLD
                            silence threshold - can be expressed in dB, e.g. --threshold=-25.5dB (default: -25dB)
      --silence-dur SILENCE_DUR
                            maximum silence duration desired in output (default: 0.5)
      --non-silence-dur NON_SILENCE_DUR
                            minimum non-silence duration between periods of silence of at least --silence-dur length (default: 0.1)
      --mode MODE           silence detection mode - can be 'any' or 'all' (default: all)
      --auto-threshold      automatically determine silence threshold (default: False)
      --auto-aggressiveness AUTO_AGGRESSIVENESS
                            aggressiveness of the auto-threshold algorithm. Integer between [-20,20] (default: 3)
      --detect-only         don't trim, just detect periods of silence (default: False)
      --verbose             print general information to the screen (default: False)
      --show-silence        print locations of silence (always true if --detect-only is used) (default: False)
      --time-it             show steps and time to complete them (default: False)
      --overwrite           overwrite existing output file, if applicable (default: False)
Contents of trim_silence.py:
    import numpy as np
    import argparse
    import time
    import sys
    import os

    def testmode(mode):
        mode = mode.lower()
        valid_modes = ["all","any"]
        if mode not in valid_modes:
            raise Exception(f"mode '{mode}' is not valid - must be one of {valid_modes}")
        return mode

    def testaggr(aggr):
        try:
            aggr = min(20,max(-20,int(aggr)))
            return aggr
        except:
            raise Exception(f"auto-aggressiveness '{aggr}' is not valid - see usage")


    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--input", type=str, help="(REQUIRED) name of input wav file", required=True)
    parser.add_argument("--output", default="result.wav", type=str, help="name of output wave file")
    parser.add_argument("--threshold", default="-25dB", type=str, help="silence threshold - can be expressed in dB, e.g. --threshold=-25.5dB")
    parser.add_argument("--silence-dur", default=0.5, type=float, help="maximum silence duration desired in output")
    parser.add_argument("--non-silence-dur", default=0.1, type=float, help="minimum non-silence duration between periods of silence of at least --silence-dur length")
    parser.add_argument("--mode", default="all", type=testmode, help="silence detection mode - can be 'any' or 'all'")
    parser.add_argument("--auto-threshold", action="store_true", help="automatically determine silence threshold")
    parser.add_argument("--auto-aggressiveness", default=3, type=testaggr, help="aggressiveness of the auto-threshold algorithm. Integer between [-20,20]")
    parser.add_argument("--detect-only", action="store_true", help="don't trim, just detect periods of silence")
    parser.add_argument("--verbose", action="store_true", help="print general information to the screen")
    parser.add_argument("--show-silence", action="store_true", help="print locations of silence (always true if --detect-only is used)")
    parser.add_argument("--time-it", action="store_true", help="show steps and time to complete them")
    parser.add_argument("--overwrite", action="store_true", help="overwrite existing output file, if applicable")

    args = parser.parse_args()
    args.show_silence = args.show_silence or args.detect_only
    if not args.detect_only and not args.overwrite:
        if os.path.isfile(args.output):
            print(f"Output file ({args.output}) already exists. Use --overwrite to overwrite the existing file.")
            sys.exit(1)

    if (args.silence_dur < 0): raise Exception("Maximum silence duration must be >= 0.0")
    if (args.non_silence_dur < 0): raise Exception("Minimum non-silence duration must be >= 0.0")

    try:
        from scipy.io import wavfile
        using_scipy = True
    except:
        if args.verbose: print("Failure using 'import scipy.io.wavfile'. Using 'import wave' instead.")
        import wave
        using_scipy = False

    if args.verbose: print(f"Inputs:\n Input File: {args.input}\n Output File: {args.output}\n Max. Silence Duration: {args.silence_dur}\n Min. Non-silence Duration: {args.non_silence_dur}")

    from matplotlib import pyplot as plt
    def plot(x): # debugging helper; not used in the main flow
        plt.figure()
        plt.plot(x,'o')
        plt.show()

    def threshold_for_channel(ch):
        # Estimate a silence threshold for one channel from the histogram of
        # sample magnitudes: walk the histogram until its slope flattens out.
        global data
        nbins = 100
        max_len = min(1024*1024*100,data.shape[0]) # limit to first 100 MiB
        if len(data.shape) > 1:
            x = np.abs(data[:max_len,ch]*1.0)
        else:
            x = np.abs(data[:max_len]*1.0)
        if data.dtype==np.uint8: x -= 127 # center 8-bit unsigned PCM around zero
        hist,edges = np.histogram(x,bins=nbins,density=True)
        slope = np.abs(hist[1:] - hist[:-1])
        argmax = np.argmax(slope < 0.00002)
        argmax = max(0,min(argmax + args.auto_aggressiveness, len(edges)-1))
        thresh = edges[argmax] + (127 if data.dtype==np.uint8 else 0)
        return thresh

    def auto_threshold():
        # Use the largest per-channel estimate as the overall threshold
        global data
        max_thresh = 0
        channel_count = 1 if len(data.shape)==1 else data.shape[1]
        for ch in range(channel_count):
            max_thresh = max(max_thresh,threshold_for_channel(ch))
        return max_thresh


    silence_threshold = str(args.threshold).lower().strip()
    if args.auto_threshold:
        if args.verbose: print(f" Silence Threshold: AUTO (aggressiveness={args.auto_aggressiveness})")
    else:
        if "db" in silence_threshold:
            # convert a dB value to a linear amplitude ratio in [0,1)
            silence_threshold_db = float(silence_threshold.replace("db",""))
            silence_threshold = np.round(10**(silence_threshold_db/20.),6)
        else:
            silence_threshold = float(silence_threshold)
            silence_threshold_db = 20*np.log10(silence_threshold)

        if args.verbose: print(f" Silence Threshold: {silence_threshold} ({np.round(silence_threshold_db,2)} dB)")
    if args.verbose: print(f" Silence Mode: {args.mode.upper()}")
    if args.verbose: print("")
    if args.time_it: print(f"Reading in data from {args.input}... ",end="",flush=True)
    start = time.time()
    if using_scipy:
    sample_rate, data = wavfile.read(args.input)
    input_dtype = data.dtype
    Ts = 1./sample_rate

    if args.auto_threshold:
    silence_threshold = auto_threshold()
    else:
    if data.dtype != np.float32:
    sampwidth = data.dtype.itemsize
    if (data.dtype==np.uint8): silence_threshold += 0.5 # 8-bit unsigned PCM
    scale_factor = (256**sampwidth)/2.
    silence_threshold *= scale_factor
    else:
    handled_sampwidths = [2]
    with wave.open(args.input,"rb") as wavin:
    params = wavin.getparams()
    if params.sampwidth in handled_sampwidths:
    raw_data = wavin.readframes(params.nframes)
    if params.sampwidth not in handled_sampwidths:
    print(f"Unable to handle a sample width of {params.sampwidth}")
    sys.exit(1)
    end = time.time()
    if args.time_it: print(f"complete (took {np.round(end-start,6)} seconds)")

    if not using_scipy:
    if args.time_it: print(f"Unpacking data... ",end="",flush=True)
    start = time.time()
    Ts = 1.0/params.framerate
    if params.sampwidth==2: # 16-bit PCM
    format_ = 'h'
    data = np.frombuffer(raw_data,dtype=np.int16)
    elif params.sampwidth==3: # 24-bit PCM
    format_ = 'i'
    print(len(raw_data))
    data = np.frombuffer(raw_data,dtype=np.int32)

    data = data.reshape(-1,params.nchannels) # reshape into channels
    if args.auto_threshold:
    silence_threshold = auto_threshold()
    else:
    scale_factor = (256**params.sampwidth)/2. # scale to [-1:1)
    silence_threshold *= scale_factor
    data = 1.0*data # convert to np.float64
    end = time.time()
    if args.time_it: print(f"complete (took {np.round(end-start,6)} seconds)")

    silence_duration_samples = args.silence_dur / Ts

    if args.verbose: print(f"Input File Duration = {np.round(data.shape[0]*Ts,6)}\n")

    combined_channel_silences = None
    def detect_silence_in_channels():
        # Build a boolean mask that is True wherever the file counts as silent
        global combined_channel_silences
        if len(data.shape) > 1:
            if args.mode=="any":
                # 'any' mode: a sample is silent if ANY channel is below the threshold
                combined_channel_silences = np.min(np.abs(data),axis=1) <= silence_threshold
            else:
                # 'all' mode: a sample is silent only if ALL channels are below the threshold
                combined_channel_silences = np.max(np.abs(data),axis=1) <= silence_threshold
        else:
            combined_channel_silences = np.abs(data) <= silence_threshold

        # pad with False at both ends so runs touching the edges are still detected
        combined_channel_silences = np.pad(combined_channel_silences, pad_width=1,mode='constant',constant_values=0)


    def get_silence_locations():
        global combined_channel_silences

        # rising/falling edges of the mask mark the starts/ends of silent runs
        starts = combined_channel_silences[1:] & ~combined_channel_silences[0:-1]
        ends = ~combined_channel_silences[1:] & combined_channel_silences[0:-1]
        start_locs = np.nonzero(starts)[0]
        end_locs = np.nonzero(ends)[0]
        durations = end_locs - start_locs
        long_durations = (durations > silence_duration_samples)
        long_duration_indexes = np.nonzero(long_durations)[0]

        # merge silent runs separated by less than --non-silence-dur of sound
        if len(long_duration_indexes) > 1:
            non_silence_gaps = start_locs[long_duration_indexes[1:]] - end_locs[long_duration_indexes[:-1]]
            short_non_silence_gap_locs = np.nonzero(non_silence_gaps <= (args.non_silence_dur/Ts))[0]
            for loc in short_non_silence_gap_locs:
                if args.verbose and args.show_silence:
                    ns_gap_start = end_locs[long_duration_indexes[loc]] * Ts
                    ns_gap_end = start_locs[long_duration_indexes[loc+1]] * Ts
                    ns_gap_dur = ns_gap_end - ns_gap_start
                    print(f"Removing non-silence gap at {np.round(ns_gap_start,6)} seconds with duration {np.round(ns_gap_dur,6)} seconds")
                end_locs[long_duration_indexes[loc]] = end_locs[long_duration_indexes[loc+1]]

            long_duration_indexes = np.delete(long_duration_indexes, short_non_silence_gap_locs + 1)

        if args.show_silence:
            if len(long_duration_indexes)==0:
                if args.verbose: print("No periods of silence found")
            else:
                if args.verbose: print("Periods of silence shown below")
                fmt_str = "%-12s %-12s %-12s"
                print(fmt_str % ("start","end","duration"))
                for idx in long_duration_indexes:
                    start = start_locs[idx]
                    end = end_locs[idx]
                    duration = end - start
                    print(fmt_str % (np.round(start*Ts,6),np.round(end*Ts,6),np.round(duration*Ts,6)))
                if args.verbose: print("")

        return start_locs[long_duration_indexes], end_locs[long_duration_indexes]

    def trim_data(start_locs,end_locs):
        global data
        if len(start_locs)==0: return
        # keep half the allowed silence at each side of a run, delete the middle
        keep_at_start = int(silence_duration_samples / 2)
        keep_at_end = int(silence_duration_samples - keep_at_start)
        start_locs = start_locs + keep_at_start
        end_locs = end_locs - keep_at_end
        delete_locs = np.concatenate([np.arange(start_locs[idx],end_locs[idx]) for idx in range(len(start_locs))])
        data = np.delete(data, delete_locs, axis=0)

    def output_data(start_locs,end_locs):
        global data
        if args.verbose: print(f"Output File Duration = {np.round(data.shape[0]*Ts,6)}\n")
        if args.time_it: print(f"Writing out data to {args.output}... ",end="",flush=True)
        if using_scipy:
            wavfile.write(args.output, sample_rate, data)
        else:
            packed_buf = data.astype(format_).tobytes()
            with wave.open(args.output,"wb") as wavout:
                wavout.setparams(params) # same params as input
                wavout.writeframes(packed_buf)

    start = time.time()
    if not args.verbose and args.time_it: print("Detecting silence... ",end="",flush=True)
    detect_silence_in_channels()
    (start_locs,end_locs) = get_silence_locations()
    end = time.time()
    if not args.verbose and args.time_it: print(f"complete (took {np.round(end-start,6)} seconds)")

    if args.detect_only:
        if args.verbose: print("Not trimming, because 'detect only' flag was set")
    else:
        if args.time_it: print("Trimming data... ",end="",flush=True)
        start = time.time()
        trim_data(start_locs,end_locs)
        end = time.time()
        if args.time_it: print(f"complete (took {np.round(end-start,6)} seconds)")
        start = time.time()
        output_data(start_locs, end_locs)
        end = time.time()
        if args.time_it: print(f"complete (took {np.round(end-start,6)} seconds)")
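As a side note, the dB-to-linear conversion and bit-depth scaling the script applies can be checked by hand. A minimal sketch, using the default -25 dB threshold and the 16-bit scale factor from above:

    import numpy as np

    threshold_db = -25.0
    linear = 10 ** (threshold_db / 20.0)  # dB to linear amplitude ratio: ~0.056234
    scaled = linear * (256**2) / 2.       # scaled to the 16-bit PCM range: ~1842.7
    print(np.round(linear, 6), np.round(scaled, 1))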
If you want a version that assumes 16-bit PCM input and drops all the extra print statements:
    import numpy as np
    from scipy.io import wavfile

    # Params
    (infile,outfile,threshold_db,silence_dur,non_silence_dur,mode) = ("test_stereo.wav","result.wav",-25,0.5,0.1,"all")
    silence_threshold = np.round(10**(threshold_db/20.),6) * 32768 # convert from dB to linear units and scale, assuming 16-bit PCM input

    # Read data
    Fs, data = wavfile.read(infile)
    silence_duration_samples = silence_dur * Fs
    if len(data.shape)==1: data = np.expand_dims(data,axis=1)

    # Find silence
    find_func = np.min if mode=="any" else np.max
    combined_channel_silences = find_func(np.abs(data),axis=1) <= silence_threshold
    combined_channel_silences = np.pad(combined_channel_silences, pad_width=1,mode='constant',constant_values=0)

    # Get start and stop locations
    starts = combined_channel_silences[1:] & ~combined_channel_silences[0:-1]
    ends = ~combined_channel_silences[1:] & combined_channel_silences[0:-1]
    start_locs = np.nonzero(starts)[0]
    end_locs = np.nonzero(ends)[0]
    durations = end_locs - start_locs
    long_durations = (durations > silence_duration_samples)
    long_duration_indexes = np.nonzero(long_durations)[0]

    # Cut out short non-silence between silence
    if len(long_duration_indexes) > 1:
        non_silence_gaps = start_locs[long_duration_indexes[1:]] - end_locs[long_duration_indexes[:-1]]
        short_non_silence_gap_locs = np.nonzero(non_silence_gaps <= (non_silence_dur * Fs))[0]
        for loc in short_non_silence_gap_locs:
            end_locs[long_duration_indexes[loc]] = end_locs[long_duration_indexes[loc+1]]
        long_duration_indexes = np.delete(long_duration_indexes, short_non_silence_gap_locs + 1)

    (start_locs,end_locs) = (start_locs[long_duration_indexes], end_locs[long_duration_indexes])

    # Trim data (keep half the allowed silence at each side of every long run)
    if len(start_locs) > 0:
        keep_at_start = int(silence_duration_samples / 2)
        keep_at_end = int(silence_duration_samples - keep_at_start)
        start_locs = start_locs + keep_at_start
        end_locs = end_locs - keep_at_end
        delete_locs = np.concatenate([np.arange(start_locs[idx],end_locs[idx]) for idx in range(len(start_locs))])
        data = np.delete(data, delete_locs, axis=0)

    # Output data
    wavfile.write(outfile, Fs, data)
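To verify the result, a quick duration check before and after (a minimal sketch; the file names match the ones used above):

    from scipy.io import wavfile

    for name in ("test_stereo.wav", "result.wav"):
        fs, x = wavfile.read(name)
        print(f"{name}: {x.shape[0]/fs:.3f} seconds")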

The original question and answer, "audio - How can I shorten silences in an audio file with ffmpeg?", can be found on Stack Overflow: https://stackoverflow.com/questions/47910301/
