gpt4 book ai didi

performance - 读取CSV文件时Powershell拦截并修复特定值

转载 作者:行者123 更新时间:2023-12-03 00:24:50 26 4
gpt4 key购买 nike

在PowerShell脚本中,我读取了一个CSV文件。

我必须“修复”一些值。具体来说,CSV可能包含一个空值,即字面NULL或有时为-。所有这些值都应视为$null

有没有办法拦截CSV解析来处理呢?

实际上,我有一个可行的解决方案,但解决方案非常慢。迭代2500多个项目需要20分钟,而读取CSV文件只需几秒钟。

这个想法是迭代每个属性:

$private:result = @{}
foreach($private:prop in $private:line.PSObject.Properties){
$private:value = $null
$private:result.Add($private:prop.Name, ($private:value | Filter-Value))
}
$private:result
...

function Filter-Value{
param(
[Parameter(Position=0, ValueFromPipeline=$true)]
[object]$In
)

if(-not $In){
$null
}
elseif(($In -is [string]) -and ($In.Length -eq 0)) {
$null
}
elseif(($In -eq "NULL") -or ($In -eq "-")) {
$null
}
else{
$In
}
}


完整代码:

function Import-CsvEx{
param(
[Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true)]
[ValidateScript({Test-Path $_ -PathType Leaf})]
[string]$Path,
[Parameter()]
[string]$Delimiter
)
begin{
Write-Verbose "Begin read of file $Path"
}
process{
# We use file stream and stream reader to automatically detect encoding
$private:fileStream = [System.IO.File]::OpenRead($Path)

$private:streamReader = New-Object System.IO.StreamReader($private:fileStream, [System.Text.Encoding]::Default, $true)

$private:fileContent = $private:streamReader.ReadToEnd()

$private:streamReader.Dispose()
$private:fileStream.Dispose()

$private:csv = ConvertFrom-Csv $private:fileContent -Delimiter $Delimiter

for($private:i=0; $private:i -lt $private:csv.Count ; $private:i++){
Write-Progress -Id 1003 -Activity "Reading CSV" -PercentComplete ($private:i*100/$private:csv.count)
$private:line = $private:csv[$private:i]
$private:result = @{}
foreach($private:prop in $private:line.PSObject.Properties){
$private:value = $null
$private:result.Add($private:prop.Name, ($private:value | Filter-Value))
}

# actually outputs the object to the pipeline
New-Object psobject -Property $private:result

}
Write-Progress -Id 1003 -Activity "Reading CSV" -Completed

}
end{
Write-Verbose "End read of file $Path"
}
}

function Filter-Value{
param(
[Parameter(Position=0, ValueFromPipeline=$true)]
[object]$In
)

if(-not $In){
$null
}
elseif(($In -is [string]) -and ($In.Length -eq 0)) {
$null
}
elseif(($In -eq "NULL") -or ($In -eq "-")) {
$null
}
else{
$In
}
}

最佳答案

鉴于性能是的关注点:

  • 避免使用管道(尽管以将所有数据放入内存为代价)。
  • 避免使用Write-Progress
  • 避免通过.psobject.Properties重复反射。

  • 顺便说一句:很少需要使用 $private:范围,这会使您的代码难以阅读。请注意,仅在函数内部通过名称分配变量会隐式创建局部变量(例如 $var = 42);仅在需要显式防止后代作用域看到这些变量时,才需要 $private:-有关更多信息,请参见 this answer
    # Import the CSV data into a collection in memory.
    # NOTE: In Windows PowerShell, Import-Csv defaults to ASCII(!) encoding.
    # Use -Encoding Default to use the system's ANSI code page, for instance.
    # PowerShell [Core] 6+ consistently defaults to (BOM-less) UTF-8.
    $objects = Import-Csv $Path -Delimiter $Delimiter

    # Extract the property (column) names from the 1st imported object.
    $propNames = $objects[0].psobject.Properties.Name

    # Loop over all objects...
    foreach ($object in $objects) {

    # ... and make the quasi-null properties $null.
    foreach ($propName in $propNames) {
    if ($object.$propName -in '', '-', 'NULL') {
    $object.$propName = $null
    }
    }

    # Output the modified object right away, if desired.
    # Alternatively, operate on the $objects collection later.
    $object

    }

    如果无法将所有数据放入内存中,请使用 Import-Csv ... | ForEach-Object { ... },同时仍仅在脚本块的第一次调用中提取属性名称( { ... })。

    关于performance - 读取CSV文件时Powershell拦截并修复特定值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60622577/

    26 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com