
powershell - Custom PowerShell sort function

Reposted. Author: 行者123. Updated: 2023-12-03 07:55:33

I have a huge array of 1M+ names. Some are alphanumeric and some contain letters only.

CSV:
id,firstname,lastname,email,email2,profession
100,Andeee,Michella,[email protected],[email protected],police officer
101,Tybie,1Grobe,[email protected],[email protected],worker
102,Fernande,Azeria,[email protected],[email protected],developer
103,Lenna,Schenck,[email protected],[email protected],police officer
104,4Marti,Brittani,[email protected],[email protected],worker
105,Riannon,Aldric,[email protected],[email protected],doctor
106,Corry,Nikaniki,[email protected],[email protected],worker
107,Correy,Shama,[email protected],[email protected],police officer
108,Marcy,Drus,[email protected],[email protected],worker
109,Bill,Valerio,[email protected],[email protected],worker

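I don't want to use Sort-Object or Sort on the whole array because it takes too long. Because of limitations in my environment, this has to be done in PowerShell.

The array comes from a CSV exported by another PowerShell job. (I don't have access to the job's code, only its results.)

Here is the sample I created based on a Java method I found. It fails with the following error: The script failed due to call depth overflow.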

$array = @("Ryan", "Kelly", "Alex", "Kyle", "Riley")
mergeSort $array

write-host $array

function mergeSort
{

param([string[]] $arr)

if($arr.length -ge 2){

#find mid-point
$left_len = [math]::floor([int32]$arr.length/2)

#declare array size of left of mid-point
$left = [string[]]::new($left_len);

#find mid-point
$right_len = [math]::ceiling([int32]($arr.Length - $arr.Length /2))

#declare array size right of mid-point
$right = [string[]]::new($right_len);

for ($i = 0; $i -lt $left_len.Length; $i++){
$left= $arr[$i]
}#for loop

for ($i = 0; $i -lt $right_len; $i++){
$right = $arr[$i +$arr.Length/2]
}
}#if($arr.length -ge 2)

mergeSort $left

mergeSort $right

merge ($arr, $left, $right)
}

function merge
{
param ([string[]] $result,[string[]] $left, [string[]] $right)

$i1 = 0

$12 = 0

for ($i = 0; $i -le $result.Length; $i++) {
if($i2 -gt $right.Length -or ($i1 -lt $left.Length -and $left[$i1].CompareTo($right[$i2]) -lt 0)) {
$result[$i] = $left[$i1]
$i1++
}
else {
$result[$i] = $right[$i2]
$i2++
}

}

$result.legnth

}

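Here is the latest solution I came up with based on everyone's suggestions. I would like to make this work in parallel, but it throws errors: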

$array = @('Ryan', 'Kelly', 'Alex', 'Kyle', 'Riley', '4test', 'test4', 'why', 'you', 'me', 'where', 'hello', 'jose', 'test', 
'Jelly', 'Plex', 'Cyle', 'Miley', '5test', '3test4', 'who', 'Bou', 'We', 'There', 'Yellow', 'Pose', 'West')

$type = ("System.Collections.Generic.SortedSet"+'`'+"2") -as "Type"
$type = $type.MakeGenericType( @( ("System.string" -as "Type"), ("system.string" -as "Type") ) )
$sortedArray = [Activator]::CreateInstance($type, 10000)

$a, $b = ($array | Split-Collection -Count ([int]$array.length/2) | %{ $_ -join ',' })

$firstCollection = $a.Split(",")
$secondCollection = $b.Split(",")

$counter = 0
$counterHalf = $array.Length/2

1..$counterHalf| ForEach {
try {
$col1 = $firstCollection[$counter]
$sortedArray.Add($col1, $counter)
}
catch { "Out of bound col 1" }

try {
$col2 = $secondCollection[$counter]
$sortedArray.Add($col2, $counter)
}
catch { "Out of bound col 2" }

$counter++
}

$sortedArray


function Split-Collection {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline=$true)] $Collection,
[Parameter(Mandatory=$true)][ValidateRange(1, 247483647)][int] $Count)
begin {
$Ctr = 0
$Arrays = @()
$TempArray = @()
}
process {
if (++$Ctr -eq $Count) {
$Ctr = 0
$Arrays += , @($TempArray + $_)
$TempArray = @()
return
}
$TempArray += $_
}
end {
if ($TempArray) { $Arrays += , $TempArray }
$Arrays
}
}

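Best answer

FWIW, here is an answer to the original question of getting the Merge Sort code to work. Unfortunately it doesn't perform very well, so I don't know whether it will really help with your wider problem of sorting 1M+ rows...

The Good News

I made some tweaks to your original mergeSort that appear to fix it, at least for the sample array at the top of the question.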
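
One tweak is worth spelling out, judging from a comparison of the two listings: in the original, the recursive mergeSort calls sit after the closing brace of the length check, so they run even for empty or single-element input and the recursion never ends, which is consistent with the call depth overflow error. Below is a minimal sketch of the guard, using a hypothetical Get-SplitDepth function that is not part of the question or the answer:

```powershell
# Hypothetical helper (not from the original post): counts how many times an
# array can be halved, recursing only while there is something left to split.
function Get-SplitDepth {
    param([string[]] $arr)

    # Base case: nothing to split, so stop recursing here.
    if ($arr.Length -le 1) { return 0 }

    # Recursive case: the recursive call lives inside the guarded path.
    $half = [int][Math]::Floor($arr.Length / 2)
    $left = [string[]] $arr[0..($half - 1)]
    return 1 + (Get-SplitDepth -arr $left)
}

Get-SplitDepth -arr @('Ryan', 'Kelly', 'Alex', 'Kyle', 'Riley')
```

The same shape applies to mergeSort: both recursive calls and the merge belong inside the block that checks the array length.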

Most of the fixes were typos. The original answer showed the changes in a BeyondCompare screenshot.
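
One change that is easy to miss among the typos: the original calls merge ($arr, $left, $right). In PowerShell that passes a single array argument rather than three separate arguments, which would explain why the updated code calls merge -result $arr -left $left -right $right instead. A small sketch of the difference, using a hypothetical Get-BoundCount function that exists only for illustration:

```powershell
# Hypothetical function (for illustration only): reports how many of its
# parameters were actually bound by the caller.
function Get-BoundCount {
    param($a, $b, $c)
    $PSBoundParameters.Count
}

# Method-style call: the parentheses build one array, bound to $a alone.
$methodStyle = Get-BoundCount (1, 2, 3)

# Command-style call: three separate arguments, bound to $a, $b and $c.
$commandStyle = Get-BoundCount 1 2 3

"method-style bound $methodStyle parameter(s), command-style bound $commandStyle"
```

Naming the parameters, as the updated code does, makes the intended binding explicit.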

The Bad News

It's really slow.

PS> $array = [string[]] (1..10000 | % { $_.ToString() })
PS> measure-command {
    mergeSort $array
}

...
TotalMilliseconds : 11511.74

compared with

PS> $array = [string[]] (1..10000 | % { $_.ToString() })
PS> measure-command {
    $array = $array | sort-object
}

...
TotalMilliseconds : 36.8607

Maybe it performs better at the data scale you're talking about, but I didn't have the patience to test that :-)
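
If the goal is simply something faster than a hand-written PowerShell loop, one option that stays within PowerShell is the .NET [Array]::Sort method, which sorts in place in compiled code. The sketch below is an alternative to the merge sort rather than a fix for it, and it assumes that ordinal (code-unit, case-sensitive) ordering is acceptable, which differs from Sort-Object's default culture-aware, case-insensitive ordering. No timings are claimed here, so measure it on your own data.

```powershell
# Sample data taken from the question; a real run would load the 1M+ names.
$names = [string[]] @('Ryan', 'Kelly', 'Alex', 'Kyle', 'Riley', '4test')

# Sorts $names in place using ordinal comparison (an assumption; pick the
# StringComparer that matches the ordering you actually need).
[Array]::Sort($names, [System.StringComparer]::Ordinal)

$names -join ','
```

With ordinal comparison, digits sort before uppercase letters and uppercase before lowercase, so names such as 4test end up ahead of the purely alphabetic ones.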

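The Ugly

I also tweaked the code slightly so that it finishes sorting the left side before doing anything with the right side, which means it doesn't need to use as much memory.

Here is the updated sample, for posterity.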

$ErrorActionPreference = "Stop";
Set-StrictMode -Version "Latest";

function mergeSort
{

    param([string[]] $arr)

    if( $arr.length -gt 1 )
    {

        # sort the left side
        $left_len = [Math]::Floor([int32]$arr.length / 2)
        $left = [string[]]::new($left_len);
        for( $i = 0; $i -lt $left_len; $i++ )
        {
            $left[$i] = $arr[$i]
        }
        mergeSort -arr $left

        # sort the right side
        $right_len = $arr.Length - $left_len
        $right = [string[]]::new($right_len);
        for( $i = 0; $i -lt $right_len; $i++ )
        {
            $right[$i] = $arr[$left_len + $i]
        }
        mergeSort -arr $right

        # merge the two sides
        merge -result $arr -left $left -right $right

    }

}

function merge
{
    param ([string[]] $result,[string[]] $left, [string[]] $right)

    $i1 = 0
    $i2 = 0

    for ($i = 0; $i -lt $result.Length; $i++)
    {
        if( ($i1 -lt $left.Length) -and (($i2 -ge $right.Length) -or $left[$i1].CompareTo($right[$i2]) -lt 0) )
        {
            $result[$i] = $left[$i1]
            $i1++
        }
        else
        {
            $result[$i] = $right[$i2]
            $i2++
        }
    }

}

$array = [string[]] @("Ryan", "Kelly", "Alex", "Kyle", "Riley")
mergeSort $array

write-host $array

One thing to call out specifically is the cast of the input array to a string array:

$array = [string[]] @("Ryan", "Kelly", "Alex", "Kyle", "Riley")

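Without the cast, $array is of type [System.Object[]], so PowerShell creates a new temporary [string[]] array internally, copies the values into it, and sorts that internal array, but it never assigns the internal array back to $array.

Try it without the cast to see the effect.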
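
A minimal sketch of that experiment, using a hypothetical Set-FirstElement function in place of mergeSort so that the effect is easy to see:

```powershell
# Hypothetical stand-in for mergeSort: it only overwrites the first element,
# which is enough to show whether the caller's array is the one being modified.
function Set-FirstElement {
    param([string[]] $arr)
    $arr[0] = 'changed'
}

# Without the cast: @(...) is System.Object[], so the function receives a
# converted [string[]] copy and the caller's array is left untouched.
$untyped = @('Ryan', 'Kelly')
Set-FirstElement -arr $untyped

# With the cast: the array is already [string[]], so no copy is made and the
# change is visible to the caller.
$typed = [string[]] @('Ryan', 'Kelly')
Set-FirstElement -arr $typed

"untyped: $($untyped[0]); typed: $($typed[0])"
```

The same applies to mergeSort: it writes its result into whichever array it was handed, so the caller only sees the sorted values when that array is the caller's own [string[]].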

Regarding powershell - Custom PowerShell sort function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76133654/
