
powershell - ConvertFrom-String template has trouble with irregular data sets

Reposted. Author: 行者123. Updated: 2023-11-30 09:08:20

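I have a data set that I am trying to normalize into PSCustomObjects. I have been trying to use the example-driven (machine learning) template feature of ConvertFrom-String, with partial success. One problem is that every example I can find uses a data set in which all records have the same structure. Mine do not.

I'm sure someone more experienced could do this directly from the raw data, but I have preprocessed it a little to get to where I am now.

Original sample data: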

IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen

Using the following script:

$lines = $testText.Split("`n")  # $testText is the data above wrapped in a here-string
$newLines = @()
[regex]$regex = '-'             # used to count the hyphens in each line
foreach ($line in $lines)
{
    $hyphenCount = $regex.Matches($line).Count
    switch ($hyphenCount)
    {
        1 {
            # ide-description
            $newLines += $line -replace '-', ','
        }
        2 {
            # ide-description, where the description itself contains a hyphen
            $split = $line.Split('-', 2)
            $newLines += $split -join ','
        }
        3 {
            if ($line.Contains('mode-'))
            {
                # ide-code-mode-description
                $split = $line.Split('-', 4)
            }
            else
            {
                # ide-code-description, where the description contains a hyphen
                $split = $line.Split('-', 3)
            }
            $newLines += $split -join ','
        }
        4 {
            # this assumes the extra hyphens are part of the description
            $split = $line.Split('-', 3)
            $newLines += $split -join ','
        }
        5 {
            $split = $line.Split('-', 4)
            $newLines += $split -join ','
        }
    }
}
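
The script above relies on the count argument of Split: it caps the number of pieces, so any hyphens after the last allowed split stay inside the final field. A minimal sketch of that behavior on one line from the sample data (the variable names are only illustrative):

```powershell
# One sample line whose description contains a hyphen of its own
$line = 'IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking'

# Count the hyphens, as the script does
$hyphenCount = [regex]::Matches($line, '-').Count

# Split into at most 3 pieces; the third piece keeps its internal hyphen
$parts  = $line.Split('-', 3)
$joined = $parts -join ','

$hyphenCount
$joined
```

Here the line has three hyphens but yields only three fields, because the last hyphen is treated as part of the description.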

Processed data set:

This gives me the following data:

IDE00001,ENG99061,Production mode,Access control
IDE00001,ENG115730,Production mode,Aussenbeleuchtung
IDE00001,ENG112304,Production mode,Heckwischer
IDE00001,ENG98647,Production mode,Interior lighting
IDE00001,ENG115729,Production mode,Scheinwerferreinigung
IDE00001,ENG115731,Production mode,Virtuel_pedal
IDE00002,Transport mode
IDE00820,Activating and deactivating all development messages
IDE01550,Service position
IDE02152,Characteristics in production mode
IDE02269,MAS04382,Acknowledgement signals-Optical feedback during locking
IDE02332,Deactivate production mode
IDE02488,DWA Interior monitoring
IDE02711,ENG116690,Rear Window Wiper-Automatisches Heckwischen
IDE99999,Test-two hyphens
IDE99999,ENG123456,Test-four-Hyphens
IDE99999,ENG123456,Production mode,test-five-hyphens

Passing the above data through the following template gets me close to what I need, but it still has some problems:

$template = @'
{object*:{ide:IDE00001},{code?:ENG99061},{mode?:Production mode},{description?:Access control}}
{object*:{ide:IDE00001},{code?:ENG115730},{mode?:Dev mode},{description?:Aussenbeleuchtung}}
{object*:{ide:IDE00001},{code?:ENG115731},{mode?:Production mode},{description?:Virtuel_pedal}}
{object*:{ide:IDE02711},{code?:ENG116690},{description?:Rear Window Wiper-Automatisches Heckwischen}}
{object*:{ide:IDE00820},{description?:{!mode?:{!code?:Activating and deactivating all development messages}}}}
{object*:{ide:IDE01550},{description?:{!mode?:{!code?:Service position}}}}
{object*:{ide:IDE02488},{description?:{!mode?:{!code?:DWA Interior monitoring}}}}
{object*:{ide:IDE00002},{mode?:Transport mode}}
'@

$testText | ConvertFrom-String -TemplateContent $template -OutVariable out | Out-Null
$out.object

Results so far:

The results are as follows:

ide      code      mode            description
---      ----      ----            -----------
IDE00001 ENG99061  Production mode Access control
IDE00001 ENG115730 Production mode Aussenbeleuchtung
IDE00001 ENG112304 Production mode Heckwischer
IDE00001 ENG98647  Production mode Interior lighting
IDE00001 ENG115729 Production mode Scheinwerferreinigung
IDE00001 ENG115731 Production mode Virtuel_pedal
IDE00002           Transport mode  Transport mode
IDE00820                           Activating and deactivating all development messages
IDE01550                           Service position
IDE02152           production mode Characteristics in production mode
IDE02269 MAS04382                  Acknowledgement signals-Optical feedback during locking
IDE02332           production mode Deactivate production mode
IDE02488                           DWA Interior monitoring
IDE02711 ENG116690                 Rear Window Wiper-Automatisches Heckwischen
IDE99999                           Test-two hyphens
IDE99999 ENG123456                 Test-four-Hyphens

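Problem areas:

IDE00002           Transport mode  Transport mode
IDE02152           production mode Characteristics in production mode
IDE02332           production mode Deactivate production mode

  1. "Transport mode" should not appear in the description column.
  2. "production mode" should not appear in the mode column. It has somehow been picked up from the description.

I just can't figure this out. So if anyone has any ideas...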

Best answer

As an alternative, if your input data is systematic enough, you can parse it with a regular expression:

$inputText = @"
IDE00001-ENG99061-Production mode-Access control
IDE00001-ENG115730-Production mode-Aussenbeleuchtung
IDE00001-ENG112304-Production mode-Heckwischer
IDE00001-ENG98647-Production mode-Interior lighting
IDE00001-ENG115729-Production mode-Scheinwerferreinigung
IDE00001-ENG115731-Production mode-Virtuel_pedal
IDE00002-Transport mode
IDE00820-Activating and deactivating all development messages
IDE01550-Service position
IDE02152-Characteristics in production mode
IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking
IDE02332-Deactivate production mode
IDE02488-DWA Interior monitoring
IDE02711-ENG116690-Rear Window Wiper-Automatisches Heckwischen
"@ -split '\r?\n'  # accept LF or CRLF so no stray carriage return ends up in the description

$pattern = '^((?<ide>[IDE0-9]+)-)((?<code>[A-Z0-9]+)-)?((?<mode>Production mode|Transport mode)-?)?(?<description>.*?)$'

foreach ($line in $inputText)
{
    $isMatch = $line -match $pattern
    if (-not $isMatch)
    {
        Write-Warning "Cannot parse expression: $line"
        continue
    }

    New-Object psobject -Property ([ordered]@{
        'Ide'         = $Matches.ide
        'Code'        = $Matches.code
        'Mode'        = $Matches.mode
        'Description' = $Matches.description
    })
}
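
One caveat, shown here as a minimal sketch: -match is case-insensitive by default, so [A-Z0-9]+ in the pattern above also accepts lowercase letters. For a line such as IDE99999-Test-two hyphens (its processed form appears in the question's data), the word Test is captured as the code. Switching to -cmatch is one possible way around this, on the assumption that real codes are always uppercase; the variable names below are only illustrative.

```powershell
# Same pattern as in the answer above
$pattern = '^((?<ide>[IDE0-9]+)-)((?<code>[A-Z0-9]+)-)?((?<mode>Production mode|Transport mode)-?)?(?<description>.*?)$'
$line    = 'IDE99999-Test-two hyphens'

# -match ignores case, so "Test" satisfies [A-Z0-9]+ and is taken as the code
$null = $line -match $pattern
$insensitive = [pscustomobject]@{
    Code        = $Matches['code']
    Description = $Matches['description']
}

# -cmatch is case-sensitive: "Test" is no longer a valid code,
# so the optional code group is skipped and the whole remainder is the description
$null = $line -cmatch $pattern
$sensitive = [pscustomobject]@{
    Code        = $Matches['code']
    Description = $Matches['description']
}

$insensitive
$sensitive
```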

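You said your data is not uniformly structured, so your regular expression may need to be more complex than the one above. Alternatively, if you can identify all the different structures that can occur, you can run the parse several times, each time with a different regular expression.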
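
The multi-pass idea can be sketched as an ordered list of patterns, tried from most to least specific, where the first match wins. The function name ConvertFrom-IdeLine and the patterns themselves are assumptions made for illustration and inferred only from the sample data: a code is taken to be three capital letters followed by digits, and a mode a single capitalized word followed by " mode". Real data may well need different rules.

```powershell
# Hypothetical patterns, most specific first (assumptions based on the sample data)
$patterns = @(
    # ide-code-mode-description
    '^(?<ide>IDE\d+)-(?<code>[A-Z]{3}\d+)-(?<mode>[A-Z][a-z]+ mode)-(?<description>.+)$'
    # ide-code-description
    '^(?<ide>IDE\d+)-(?<code>[A-Z]{3}\d+)-(?<description>.+)$'
    # ide-mode
    '^(?<ide>IDE\d+)-(?<mode>[A-Z][a-z]+ mode)$'
    # ide-description
    '^(?<ide>IDE\d+)-(?<description>.+)$'
)

# Hypothetical helper: returns an object for the first pattern that matches
function ConvertFrom-IdeLine {
    param([string]$Line)

    foreach ($p in $patterns) {
        # -cmatch keeps the match case-sensitive, so "Characteristics"
        # or "DWA Interior" cannot be mistaken for a code or a mode
        if ($Line -cmatch $p) {
            return [pscustomobject]@{
                Ide         = $Matches['ide']
                Code        = $Matches['code']         # $null when the group is absent
                Mode        = $Matches['mode']
                Description = $Matches['description']
            }
        }
    }
    Write-Warning "Cannot parse expression: $Line"
}

ConvertFrom-IdeLine 'IDE00002-Transport mode'
ConvertFrom-IdeLine 'IDE02152-Characteristics in production mode'
ConvertFrom-IdeLine 'IDE02269-MAS04382-Acknowledgement signals-Optical feedback during locking'
```

Keeping one pattern per structure makes each rule easy to read and reorder, at the cost of a few extra match attempts per line.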

Regarding "powershell - ConvertFrom-String template has trouble with irregular data sets", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46721012/
