
python - Processing a large CSV file

Reposted. Author: 行者123. Updated: 2023-12-03 02:37:16

This may be a duplicate question. I searched a lot but did not find an answer.

I am doing audio analysis in which I need to split a main audio file into several audio files, each containing one sentence.

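For about 3 seconds of audio, my process takes about 5 minutes, but as the audio gets longer the processing time grows dramatically. For example, 5 minutes of audio takes about 14 hours.
First, I created a CSV file of time (in seconds) versus amplitude. Then I applied a threshold i: amplitudes less than i become 0 and amplitudes greater than i become 1. Next, I check whether the number of consecutive 0s exceeds another threshold j, and if it does, I take the time at that position. That gives me the time at which a sentence ends.
This process takes far too long, so any alternative approach would help.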
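The question does not include the code that takes hours, so the cause of the slowdown is not known. As a point of comparison, here is a minimal Python sketch of the steps described above done in a single streaming pass, which should scale linearly with the number of rows. It assumes each row is `[index, time_in_seconds, amplitude]` as in the sample data below; `sentence_end_times` and `main` are hypothetical names, and reporting one boundary per silent run (instead of every row of the run) is a design choice.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator


def sentence_end_times(rows: Iterable[list[str]], i: float, j: int) -> Iterator[float]:
    """Yield one boundary time per long silence.

    Each row is assumed to be [index, time_in_seconds, amplitude]. A sample
    counts as silent (0) when its amplitude is below i and as sound (1)
    otherwise. When a run of consecutive silent samples grows longer than j,
    the time of that sample is yielded once for the run.
    """
    run = 0           # length of the current run of consecutive silent samples
    reported = False  # True once the current run has produced a boundary
    for row in rows:
        time, amplitude = float(row[1]), float(row[2])
        if amplitude < i:
            run += 1
            if run > j and not reported:
                reported = True
                yield time
        else:
            run = 0           # a sample at or above i ends the silent run
            reported = False


def main(path: str, i: float, j: int) -> None:
    # csv.reader streams the file one row at a time, so memory use stays
    # constant and every row is visited exactly once.
    with open(path, newline="") as f:
        for t in sentence_end_times(csv.reader(f), i, j):
            print(t)
```

For example, `main("data.csv", 1.0, 3)` would print the boundary times for a hypothetical file `data.csv`. Because the generator only holds the current row, the same code should also handle files much larger than the 415,449-row example.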

My dataset looks like this (columns: sample number, time in seconds, amplitude):

1.000000000000000000e+00,0.000000000000000000e+00,6.103515625000000000e+01
2.000000000000000000e+00,2.267999999999999969e-05,3.051757811999999959e+01
3.000000000000000000e+00,4.534999999999999779e-05,0.000000000000000000e+00
4.000000000000000000e+00,6.802999999999999748e-05,3.051757811999999959e+01
5.000000000000000000e+00,9.069999999999999558e-05,3.051757811999999959e+01
6.000000000000000000e+00,1.133800000000000020e-04,0.000000000000000000e+00
7.000000000000000000e+00,1.360500000000000001e-04,0.000000000000000000e+00
8.000000000000000000e+00,1.587299999999999931e-04,0.000000000000000000e+00
9.000000000000000000e+00,1.814100000000000131e-04,0.000000000000000000e+00
1.000000000000000000e+01,2.040800000000000112e-04,0.000000000000000000e+00
1.100000000000000000e+01,2.267600000000000041e-04,0.000000000000000000e+00
1.200000000000000000e+01,2.494299999999999751e-04,3.051757811999999959e+01
1.300000000000000000e+01,2.721099999999999951e-04,0.000000000000000000e+00
1.400000000000000000e+01,2.947800000000000203e-04,0.000000000000000000e+00
1.500000000000000000e+01,3.174599999999999861e-04,0.000000000000000000e+00
1.600000000000000000e+01,3.401400000000000061e-04,3.051757811999999959e+01
1.700000000000000000e+01,3.628099999999999771e-04,0.000000000000000000e+00
1.800000000000000000e+01,3.854899999999999972e-04,3.051757811999999959e+01
1.900000000000000000e+01,4.081600000000000224e-04,0.000000000000000000e+00
2.000000000000000000e+01,4.308399999999999882e-04,0.000000000000000000e+00
2.100000000000000000e+01,4.535100000000000134e-04,3.051757811999999959e+01

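This is copied from the CSV file. The file has 415,449 rows; I have shown only the first 21 here. I need to check whether column 1 is in the series (1, 2, 3, ..., n). When the series breaks, I need the value of column 2 at the last number of the series. I hope this explains my problem.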
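One possible reading of this requirement, sketched in Python: walk the rows in order, expect column 1 to be 1, 2, 3, ..., and stop at the first row that breaks the sequence (or after n, if a limit is given), returning column 2 of the last row that still fit. The function name `value_at_series_break` and the optional `n` parameter are illustrative assumptions; column 2 is returned as the original string so its formatting is preserved.

```python
from __future__ import annotations

from typing import Iterable


def value_at_series_break(rows: Iterable[list[str]], n: int | None = None) -> str | None:
    """Return column 2 of the last row whose column 1 continues 1, 2, 3, ...

    Scanning stops at the first row whose column 1 is not the next integer,
    or once the series has reached n (when n is given). Returns None if the
    very first row does not start the series.
    """
    expected = 1        # next value column 1 must have to continue the series
    last_value = None   # column 2 of the last row that continued the series
    for row in rows:
        if float(row[0]) != expected or (n is not None and expected > n):
            break       # the series is broken (or the limit n was passed)
        last_value = row[1]
        expected += 1
    return last_value
```

With a file open as `f`, the call would be `value_at_series_break(csv.reader(f), n=6)`. On the sample rows above, `n=6` gives `1.133800000000000020e-04`, the same value the awk answer below prints.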

Please note: I need to hard-code this in shell, Python, C, or C++.

Best answer

Now that I understand you better, here it is using awk:

awk -v n=6 '            # n as parameter
BEGIN {
FS="," # comma as the field separator
}
int($1)==$1 && $1<=n { # if $1 is an integer less than or equal to n
val=$2 # value of column 2 at the last number of series
}
END {
print val # output the value
}' file
1.133800000000000020e-04

Update:
$ awk -v i=1 -v j=0 -v k=3 '
BEGIN {
FS=","
}
$3<i { # if the value of the 3rd column is less than "i"
j++ # then "j" is incremented by 1
}
$3>=i { # otherwise the run of low values is broken
j=0 # so reset "j": only consecutive rows are counted
}
j>k { # when the value of "j" is greater than "k"
print $1 # print the column 1 value
# exit # uncomment this if only one value is needed
}' file
9.000000000000000000e+00
1.000000000000000000e+01
1.100000000000000000e+01
...
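For comparison in Python (the language in the question title), here is a minimal sketch that mirrors the updated awk script as a generator. It counts only consecutive rows whose column 3 is below `i`, resetting the counter on any other row, and `first_only` plays the role of the commented-out `exit`. The function name `rows_in_long_silence` and the `first_only` parameter are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def rows_in_long_silence(rows: Iterable[list[str]], i: float, k: int,
                         first_only: bool = False) -> Iterator[str]:
    """Yield column 1 of every row at which more than k consecutive rows
    have had a column-3 value below i."""
    j = 0  # number of consecutive rows with column 3 below i
    for row in rows:
        if float(row[2]) < i:
            j += 1
        else:
            j = 0  # a row at or above i breaks the run
        if j > k:
            yield row[0]
            if first_only:  # like uncommenting `exit` in the awk script
                return
```

Pass it `csv.reader(f)` for an open file `f`. On the 21 sample rows in the question, with `i=1` and `k=3`, it yields the column 1 values of rows 9, 10 and 11.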

Regarding "python - Processing a large CSV file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52646106/
