
regex - Skipping the header row in a high-performance PowerShell regex script block

Reposted. Author: 行者123. Updated: 2023-12-03 01:30:19

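I have already gotten some amazing help from Stack Overflow, but I need a little more to get to the finish line. Twice a month I parse several huge 4 GB files. I need to be able to skip the header row and count the total lines, the matched lines, and the unmatched lines. I'm sure this is simple for a PowerShell superstar, but at my novice level my skills aren't strong yet. A little help from you could save me a week. :)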

Sample data:

ID         FIRST_NAME              LAST_NAME          COLUMN_NM_TOO_LON5THCOLUMN
10000000001MINNIE MOUSE COLUMN VALUE LONGSTARTS
10000000002MICKLE ROONEY MOUSE COLUMN VALUE LONGSTARTS

Code block (based on this answer):
#$match_regex matches each fixed length field by length; the () specifies that each matched field be stored in a capture group:
[regex]$match_regex = '^(.{10})(.{50})(.{50})(.{50})(.{50})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{4})(.{25})(.{2})(.{10})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{10})(.{10})(.{10})(.{2})(.{10})(.{50})(.{50})(.{50})(.{50})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{4})(.{2})(.{4})(.{10})(.{38})(.{38})(.{15})(.{1})(.{10})(.{2})(.{10})(.{10})(.{10})(.{10})(.{38})(.{38})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$'

Measure-Command {
    & {
        switch -File $infile -Regex {
            $match_regex {
                # Join what all the capture groups matched with a tab char.
                $Matches[1..($Matches.Count-1)].Trim() -join "`t"
            }
        }
    } | Out-File $outFile
}
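A pattern with this many fixed-width groups is easy to mistype by hand. As a minimal sketch, it could be generated from a list of field widths instead. The three widths and the sample line below are hypothetical and chosen only for illustration; the real layout would supply its own widths.

```powershell
# Hypothetical field widths (illustration only); substitute the real layout.
$widths = 11, 24, 19

# Turn each width into a capture group such as (.{11}) and anchor the pattern.
$pattern = '^' + (($widths | ForEach-Object { "(.{$_})" }) -join '') + '$'
[regex]$match_regex = $pattern

# Build a sample line that is exactly 11 + 24 + 19 characters wide.
$sample = '10000000001' + 'MINNIE'.PadRight(24) + 'MOUSE'.PadRight(19)

# Same join as in the script above: trim each captured field and tab-separate.
$joined = $null
if ($sample -match $match_regex) {
    $joined = $Matches[1..($Matches.Count-1)].Trim() -join "`t"
}
$joined
```

Keeping the widths in an array also makes it straightforward to compare them against the file's layout specification.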

Best answer

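You only need to keep track of two counts, matched lines and unmatched lines, plus a Boolean that indicates whether the first line has already been skipped.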

# $first becomes $true once the first line (the header) has been seen.
$first = $false
$matched = 0
$unmatched = 0
# Dot-source the block so the counters are updated in the current scope.
. {
    switch -File $infile -Regex {
        $match_regex {
            if ($first) {
                # Join what all the capture groups matched with a tab char.
                $Matches[1..($Matches.Count-1)].Trim() -join "`t"
                $matched++
            }
            $first = $true
        }
        default {
            $unmatched++
            # you can remove this, if the pattern always matches the header
            $first = $true
        }
    }
} | Out-File $outFile

$total = $matched + $unmatched
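To check the counting logic before running it on a 4 GB file, the same approach can be tried on a tiny temporary file. This is only a sketch under stated assumptions: the three-field layout (11, 6, and 5 characters) and the sample rows are hypothetical, and the output is collected in `$rows` rather than written with `Out-File`.

```powershell
# Hypothetical three-field layout (11 + 6 + 5 = 22 characters), illustration only.
[regex]$match_regex = '^(.{11})(.{6})(.{5})$'

# Sample file: a header, two well-formed rows, and one malformed row.
$infile = [System.IO.Path]::GetTempFileName()
@(
    'ID'.PadRight(11) + 'FIRST'.PadRight(6) + 'LAST'.PadRight(5)
    '10000000001MINNIEMOUSE'
    '10000000002MICKEYMOUSE'
    'short row'
) | Set-Content -Path $infile

$first = $false      # becomes $true once the first line has been seen
$matched = 0
$unmatched = 0

# Dot-source the block so the counters are updated in the current scope.
$rows = . {
    switch -File $infile -Regex {
        $match_regex {
            if ($first) {
                $Matches[1..($Matches.Count-1)].Trim() -join "`t"
                $matched++
            }
            $first = $true
        }
        default {
            $unmatched++
            $first = $true
        }
    }
}

$total = $matched + $unmatched
Remove-Item -Path $infile

"matched=$matched unmatched=$unmatched total=$total"
```

In this sample the header has the same width as the data rows, so it matches the pattern and is skipped without being counted; `$total` therefore covers the data lines only. If the header does not match the pattern, it falls through to `default` and is counted as unmatched.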

For more on regex - Skipping the header row in a high-performance PowerShell regex script block, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/58845899/
