powershell - Using Invoke-Async with Copy-Item in PowerShell

Reposted. Author: 行者123. Updated: 2023-12-02 23:07:39

This article describes how to use Invoke-Async in PowerShell: https://sqljana.wordpress.com/2018/03/16/powershell-sql-server-run-in-parallel-collect-sql-results-with-print-output-from-across-your-sql-farm-fast/

I want to run the Copy-Item cmdlet in parallel in PowerShell, because the alternative is to use FileSystemObject via Excel and copy millions of files one at a time.

I have cobbled together the following:

<#
.SYNOPSIS
<Brief description>
For examples type:
Get-Help .\<filename>.ps1 -examples
.DESCRIPTION
Copies files from one path to another
.PARAMETER FileList
e.g. C:\path\to\list\of\files\to\copy.txt
.PARAMETER NumCopyThreads
default is 8 (but can be 100 if you want to stress the machine to maximum!)
.EXAMPLE
.\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt
.NOTES
#>

[CmdletBinding()]
Param(
    [String] $FileList = "C:\temp\copytest.csv",
    [int] $NumCopyThreads = 8
)

$filesToCopy = New-Object "System.Collections.Generic.List[fileToCopy]"
$csv = Import-Csv $FileList

foreach($item in $csv)
{
    $file = New-Object fileToCopy
    $file.SrcFileName = $item.SrcFileName
    $file.DestFileName = $item.DestFileName
    $filesToCopy.add($file)
}

$sb = [scriptblock] {
    param($file)
    Copy-item -Path $file.SrcFileName -Destination $file.DestFileName
}
$results = Invoke-Async -Set $filesToCopy -SetParam file -ScriptBlock $sb -Verbose -Measure:$true -ThreadCount 8
$results | Format-Table

Class fileToCopy {
    [String]$SrcFileName = ""
    [String]$DestFileName = ""
}

The CSV input looks like this:
SrcFileName,DestFileName
C:\Temp\dummy-data\101438\101438-0154723869.zip,\\backupserver\Project Archives\101438\0154723869.zip
C:\Temp\dummy-data\101438\101438-0165498273.xlsx,\\backupserver\Project Archives\101438\0165498273.xlsx

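What am I missing to get this working? When I run .\CopyFiles.ps1 -FileList C:\Temp\test.csv, nothing happens. The files exist in the source path, but the file objects are not being picked up from the -Set collection. (Unless I have misunderstood how the collection is used?)

No, I can't use robocopy for this, because there are millions of files that resolve to different paths depending on their original location.

Best answer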

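I cannot explain your symptom based on the code in your question (see the bottom section), but I suggest basing your solution on the (now) standard Start-ThreadJob cmdlet (it ships with PowerShell Core; in Windows PowerShell, install it with, for instance, Install-Module ThreadJob -Scope CurrentUser [1]):
Such a solution is more efficient than using the third-party Invoke-Async function, which, as of this writing, is flawed in that it waits for jobs to finish in a tight loop, which creates unnecessary processing overhead.
Start-ThreadJob jobs are a lightweight, thread-based alternative to the process-based Start-Job background jobs, yet they integrate with the standard job-management cmdlets such as Wait-Job and Receive-Job.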
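As a minimal sketch of that integration (it assumes Start-ThreadJob is available, and the job body is only a placeholder calculation), thread jobs can be started, waited on, and collected with the same cmdlets used for regular background jobs:

```powershell
# Start three thread jobs; each receives one number via -ArgumentList.
$jobs = 1..3 | ForEach-Object {
    Start-ThreadJob -ArgumentList $_ -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds 200   # placeholder for real work
        $n * 10
    }
}

# The standard job cmdlets work on thread jobs too.
$null = Wait-Job -Job $jobs
$results = Receive-Job -Job $jobs | Sort-Object
Remove-Job -Job $jobs

$results
```

Wait-Job blocks until every job has finished; Receive-Job then collects whatever the script blocks wrote to the output stream.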
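Here is a self-contained example based on your code that demonstrates its use:
Note: Whether you use Start-ThreadJob or Invoke-Async, you will not be able to explicitly reference custom classes such as [fileToCopy] in a script block that runs in a separate thread (runspace; see the bottom section), so for simplicity the solution below uses [pscustomobject] instances that have only the properties of interest.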

# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList

# How many threads at most to run concurrently.
$NumCopyThreads = 8

Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow

# Import the CSV data and transform it to [pscustomobject] instances
# with only .SrcFileName and .DestFileName properties - they take
# the place of your original [fileToCopy] instances.
$jobs = Import-Csv $FileList | Select-Object SrcFileName, DestFileName |
  ForEach-Object {
    # Start the thread job for the file pair at hand.
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList $_ {
      param($f)
      $simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
      # Delay output for a random period.
      $randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
      Start-Sleep -Milliseconds $randomSleepPeriodMs
      # Produce output.
      "Copied $($f.SrcFileName) to $($f.DestFileName)"
      # Wait for the remainder of the simulated runtime.
      Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
    }
  }

Write-Host "Waiting for $($jobs.Count) jobs to complete..."

# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"

# Clean up the temp. file
Remove-Item $FileList
The above yields output such as the following:
Creating jobs...
Waiting for 10 jobs to complete...
Copied c:\tmp\b to \\server\share\b
Copied c:\tmp\g to \\server\share\g
Copied c:\tmp\d to \\server\share\d
Copied c:\tmp\f to \\server\share\f
Copied c:\tmp\e to \\server\share\e
Copied c:\tmp\h to \\server\share\h
Copied c:\tmp\c to \\server\share\c
Copied c:\tmp\a to \\server\share\a
Copied c:\tmp\j to \\server\share\j
Copied c:\tmp\i to \\server\share\i
Total time lapsed: 00:00:05.1961541
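Note that the output received does not reflect the input order, and that the overall runtime is roughly twice the per-thread runtime of 2 seconds (plus overhead): with an input count of 10 and only 8 threads available, 2 "batches" have to be run.
If you raise the thread limit to 10 or more, the overall runtime drops to 2 seconds plus overhead, because all jobs then run simultaneously. (If -ThrottleLimit is omitted, the documented default for Start-ThreadJob is 5 concurrent jobs.)
Caveat: The above numbers stem from running in PowerShell Core on Microsoft Windows 10 Pro (64-bit; version 1903), using version 2.0.1 of the ThreadJob module.
Inexplicably, the same code is much slower in Windows PowerShell v5.1.18362.145.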
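The example above only simulates each copy with Start-Sleep. As a rough sketch of what the job body might look like with a real Copy-Item call, the following runs against a throwaway sandbox under the temp folder. The sandbox layout, the file names, and the step that creates the destination folder first are assumptions made for illustration; they are not part of the original question or answer.

```powershell
# Hypothetical sandbox: temp folders stand in for the real source and backup paths.
$root = Join-Path ([IO.Path]::GetTempPath()) "copytest.$PID"
$src  = Join-Path $root 'src'
$dest = Join-Path $root 'dest'
$null = New-Item -ItemType Directory -Path $src -Force

# Build file pairs shaped like the CSV rows (SrcFileName, DestFileName).
$pairs = 'a', 'b', 'c' | ForEach-Object {
    $srcFile = Join-Path $src "$_.txt"
    Set-Content -Path $srcFile -Value "data $_"
    [pscustomobject]@{
        SrcFileName  = $srcFile
        DestFileName = Join-Path (Join-Path $dest "sub-$_") "$_.txt"
    }
}

$jobs = $pairs | ForEach-Object {
    Start-ThreadJob -ThrottleLimit 8 -ArgumentList $_ -ScriptBlock {
        param($f)
        # Copy-Item does not create a missing parent folder for a file target,
        # so make sure the destination folder exists first.
        $destDir = Split-Path -Parent $f.DestFileName
        $null = New-Item -ItemType Directory -Path $destDir -Force
        Copy-Item -LiteralPath $f.SrcFileName -Destination $f.DestFileName -ErrorAction Stop
        "Copied $($f.SrcFileName) to $($f.DestFileName)"
    }
}

# Collect the per-file messages, then count what actually arrived.
$copyLog = Receive-Job -Job $jobs -Wait -AutoRemoveJob
$copiedCount = @(Get-ChildItem -Path $dest -Recurse -File).Count
$copyLog

# Clean up the sandbox.
Remove-Item -Path $root -Recurse -Force
```

Using -ErrorAction Stop makes a failed copy surface as an error when the job output is received, rather than passing silently.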

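However, for the sake of both performance and memory consumption, it is better in your case to use batching (chunking), i.e. to have each thread process multiple file pairs.
The following solution demonstrates this approach; adjust $chunkSize to find a batch size that works for you.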
# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList

# How many threads at most to run concurrently.
$NumCopyThreads = 8

# How many files to process per thread
$chunkSize = 3

# The script block to run in each thread, which now receives a
# $chunkSize-sized *array* of file pairs.
$jobScriptBlock = {
  param([pscustomobject[]] $filePairs)
  $simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
  # Delay output for a random period.
  $randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
  Start-Sleep -Milliseconds $randomSleepPeriodMs
  # Produce output for each pair.
  foreach ($filePair in $filePairs) {
    "Copied $($filePair.SrcFileName) to $($filePair.DestFileName)"
  }
  # Wait for the remainder of the simulated runtime.
  Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
}

Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow

$jobs = & {

  # Process the input objects in chunks.
  $i = 0
  $chunk = [pscustomobject[]]::new($chunkSize)
  Import-Csv $FileList | Select-Object SrcFileName, DestFileName | ForEach-Object {
    $chunk[$i % $chunkSize] = $_
    if (++$i % $chunkSize -ne 0) { return }
    # Note the need to wrap $chunk in a single-element helper array (, $chunk)
    # to ensure that it is passed *as a whole* to the script block.
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $chunk) -ScriptBlock $jobScriptBlock
    $chunk = [pscustomobject[]]::new($chunkSize) # we must create a new array
  }

  # Process any remaining objects.
  # Note: $chunk -ne $null returns those elements in $chunk, if any, that are non-null
  if ($remainingChunk = $chunk -ne $null) {
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $remainingChunk) -ScriptBlock $jobScriptBlock
  }

}

Write-Host "Waiting for $($jobs.Count) jobs to complete..."

# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"

# Clean up the temp. file
Remove-Item $FileList

While the output is effectively the same, note that only 4 jobs were created this time, each of which processed (up to) $chunkSize (3) file pairs.
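The count of 4 jobs follows from splitting 10 inputs into chunks of 3 (three full chunks plus one remainder). As an illustration of that arithmetic only, here is a sketch of a helper that chunks an array already held in memory. The function name Split-IntoChunk is hypothetical and is not part of PowerShell or of the answer above.

```powershell
# Hypothetical helper: split an in-memory array into chunks of at most $ChunkSize.
function Split-IntoChunk {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [Parameter(Mandatory)] [ValidateRange(1, 2147483647)] [int] $ChunkSize
    )
    for ($i = 0; $i -lt $InputObject.Count; $i += $ChunkSize) {
        $end = [Math]::Min($i + $ChunkSize, $InputObject.Count) - 1
        # The unary comma sends each chunk down the pipeline as ONE array
        # instead of letting PowerShell enumerate its elements.
        , $InputObject[$i..$end]
    }
}

# 10 inputs in chunks of 3: three full chunks and one remainder chunk.
$chunks = @(Split-IntoChunk -InputObject (1..10) -ChunkSize 3)
$chunks.Count
$chunks | ForEach-Object { $_ -join ' ' }
```

For millions of CSV rows, the streaming approach shown earlier is likely preferable, because it never needs the whole list in memory at once; a helper like this one suits smaller lists that are already loaded.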

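As for what you tried:
The screenshot you showed suggests that the problem is that your custom class, [fileToCopy], is not visible to the script block run by Invoke-Async.
Because Invoke-Async invokes the script block via the PowerShell SDK in separate runspaces, which know nothing about the caller's state, it is to be expected that these runspaces do not know your class (the same applies to Start-ThreadJob).
However, it is unclear why that would be a problem in your code, because your script block makes no explicit reference to your class: its parameter, $file, is not type-constrained (it is implicitly [object]-typed).
Therefore, simply accessing the properties of the custom-class instance inside the script block should work, and it does in my tests with Windows PowerShell v5.1.18362.145 on Microsoft Windows 10 Pro (64-bit; version 1903).
However, if your real script-block code were to reference the custom class [fileToCopy] explicitly, for example by declaring the parameter as param([fileToCopy] $file), you would see the symptom.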
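The two approaches described above can be sketched as follows, assuming Start-ThreadJob is available. Either the job's parameter is left untyped and only properties are accessed, or the class instance is converted into a plain object before it is handed to the job. The sample paths are placeholders.

```powershell
class fileToCopy {
    [string] $SrcFileName
    [string] $DestFileName
}

$file = [fileToCopy] @{ SrcFileName = 'C:\tmp\a'; DestFileName = '\\server\share\a' }

# Option 1: keep the parameter untyped, so the script block never names the class.
$job = Start-ThreadJob -ArgumentList $file -ScriptBlock {
    param($f)   # deliberately NOT param([fileToCopy] $f)
    "$($f.SrcFileName) -> $($f.DestFileName)"
}
$viaUntyped = Receive-Job -Job $job -Wait -AutoRemoveJob

# Option 2: pass a plain custom object that carries just the properties.
$plain = $file | Select-Object SrcFileName, DestFileName
$job = Start-ThreadJob -ArgumentList $plain -ScriptBlock {
    param($f)
    "$($f.SrcFileName) -> $($f.DestFileName)"
}
$viaPlain = Receive-Job -Job $job -Wait -AutoRemoveJob

$viaUntyped
$viaPlain
```

Option 2 avoids passing class instances into other runspaces altogether, which is the route the answer's examples take with [pscustomobject].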

[1] In Windows PowerShell v3 and v4, which do not ship with the PowerShellGet module, Install-Module is not available by default. However, the module can be installed on demand, as described in "Installing PowerShellGet".

This post on using Invoke-Async with Copy-Item in PowerShell is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57675553/
