
c# - Replace the column comma separators in a CSV file and handle fields wrapped in single quotes

Reposted. Author: 行者123. Updated: 2023-12-03 00:58:10

A system that I have no control over is generating a CSV file.

In two of the columns, a value may be wrapped in a pair of single quotes when the data itself contains commas.

Sample data - 4 columns

123,'abc,def,ghf',ajajaj,1 
345,abdf,'abc,def,ghi',2
556,abdf,def,3
999,'a,b,d','d,e,f',4

The result I want, using PowerShell ...

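Commas that are not part of the data, that is, the commas that separate fields, are replaced with a specified delimiter (pipe-star in the example below). Commas between a pair of single quotes stay as commas.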

Result
123|*'abc,def,ghf'|*ajajaj|*1 
345|*abdf|*'abc,def,ghi'|*2
556|*abdf|*def|*3
999|*'a,b,d'|*'d,e,f'|*4

If possible I would like to do this with a regular expression in PowerShell or C# .NET, but I don't know how.

Best answer

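Although I think this creates an oddly formatted CSV file, in PowerShell you can use switch together with its -Regex and -File parameters. This is probably the fastest way to handle large files, and it takes only a few lines of code: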

# create a regex that will find commas unless they are inside single quotes
$commaUnlessQuoted = ",(?=([^']*'[^']*')*[^']*$)"

$result = switch -Regex -File 'D:\test.csv' {
    # added -replace "'" to also remove the single quotes as commented
    default { $_ -replace $commaUnlessQuoted, '|*' -replace "'" }
}

# output to console
$result

# output to new (sort-of) CSV file
$result | Set-Content -Path 'D:\testoutput.csv'
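
To try the pattern without touching any files, the four sample lines from the question can be run through it in memory. This is only a quick sketch: `$sample` and `$converted` are names made up here, and the second `-replace "'"` is left out on purpose so that the single quotes survive, as in the result the question asked for.

```powershell
# the regex from the answer: a comma that is followed by an even number
# of single quotes up to the end of the line
$commaUnlessQuoted = ",(?=([^']*'[^']*')*[^']*$)"

# sample lines copied from the question (hypothetical variable name)
$sample = @(
    "123,'abc,def,ghf',ajajaj,1"
    "345,abdf,'abc,def,ghi',2"
    "556,abdf,def,3"
    "999,'a,b,d','d,e,f',4"
)

# -replace works element-wise on an array, so no explicit loop is needed
$converted = $sample -replace $commaUnlessQuoted, '|*'
$converted
```

Appending `-replace "'"` to the last expression should give the quote-free variant that the answer produces.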

Update

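As mklement0 pointed out, the code above does the job, but it builds the complete updated data as an array in memory before writing anything to the output file.
If that is a problem (the file is too large to fit in the available memory), you can change the code to read a line from the original, do the replacement, and write that line to the output file immediately.

The next approach uses hardly any memory, but of course performs more write operations on disk.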
# make sure this is an absolute path for .NET
$outputFile = 'D:\output.csv'
$inputFile = 'D:\input.csv'

# create a regex that will find commas unless they are inside single quotes
$commaUnlessQuoted = ",(?=([^']*'[^']*')*[^']*$)"

# create a StreamWriter object. Uses UTF8Encoding without BOM (Byte Order Mark) by default.
# if you need a different encoding for the output file, use for instance
# $writer = [System.IO.StreamWriter]::new($outputFile, $false, [System.Text.Encoding]::Unicode)
$writer = [System.IO.StreamWriter]::new($outputFile)
try {
    switch -Regex -File $inputFile {
        default {
            # added -replace "'" to also remove the single quotes as commented
            $line = $_ -replace $commaUnlessQuoted, '|*' -replace "'"
            $writer.WriteLine($line)
            # if you want, uncomment the next line to show on console
            # $line
        }
    }
}
finally {
    # flush and close the output file when done, even if an error occurred
    $writer.Dispose()
}

Result:

123|*abc,def,ghf|*ajajaj|*1 
345|*abdf|*abc,def,ghi|*2
556|*abdf|*def|*3
999|*a,b,d|*d,e,f|*4
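
The per-line work is the same in both scripts, so it could be pulled into a small function and reused with either output strategy. The sketch below is one possible shape, not part of the original answer: the function name `Convert-QuotedCsvLine` and its `-Delimiter` and `-KeepQuotes` parameters are invented for illustration. It also doubles any `$` in the delimiter, because `$` has a special meaning in a .NET replacement string.

```powershell
# hypothetical helper, not part of the original answer
function Convert-QuotedCsvLine {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string]$Line,

        [string]$Delimiter = '|*',

        # keep the single quotes around values instead of stripping them
        [switch]$KeepQuotes
    )
    begin {
        $commaUnlessQuoted = ",(?=([^']*'[^']*')*[^']*$)"
        # a literal $ in a .NET replacement string has to be written as $$
        $safeDelimiter = $Delimiter.Replace('$', '$$')
    }
    process {
        $out = $Line -replace $commaUnlessQuoted, $safeDelimiter
        if (-not $KeepQuotes) { $out = $out -replace "'" }
        $out
    }
}

"123,'abc,def,ghf',ajajaj,1" | Convert-QuotedCsvLine
"999,'a,b,d','d,e,f',4" | Convert-QuotedCsvLine -Delimiter ';' -KeepQuotes
```

With the default settings the first call should reproduce the first result line shown above; the second call keeps the quotes and uses a semicolon as the delimiter.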


Regex details:
,                 Match the character “,” literally
(?=               Assert that the regex below can be matched, starting at this position (positive lookahead)
   (              Match the regular expression below and capture its match into backreference number 1
      [^']        Match any character that is NOT a “'”
         *        Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      '           Match the character “'” literally
      [^']        Match any character that is NOT a “'”
         *        Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      '           Match the character “'” literally
   )*             Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^']           Match any character that is NOT a “'”
      *           Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   $              Assert position at the end of the string (or before the line break at the end of the string, if any)
)
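
One way to read the lookahead: a comma counts as a field separator only when an even number of single quotes follows it on the rest of the line. The sketch below checks that reading by hand for one sample line and compares it with what the regex matches. The variable names are made up for illustration, and it assumes quotes are always balanced and never escaped inside a value.

```powershell
$commaUnlessQuoted = ",(?=([^']*'[^']*')*[^']*$)"
$line = "999,'a,b,d','d,e,f',4"

# visit every comma and count the single quotes that follow it
$report = for ($i = 0; $i -lt $line.Length; $i++) {
    if ($line[$i] -eq ',') {
        $rest = $line.Substring($i + 1)
        $quotesAfter = [regex]::Matches($rest, "'").Count
        [pscustomobject]@{
            Index       = $i
            QuotesAfter = $quotesAfter
            IsSeparator = ($quotesAfter % 2 -eq 0)
        }
    }
}
$report | Format-Table

# positions found by hand versus positions matched by the regex
$byHand  = @($report | Where-Object IsSeparator | ForEach-Object Index)
$byRegex = @([regex]::Matches($line, $commaUnlessQuoted) | ForEach-Object Index)
"by hand : $($byHand -join ', ')"
"by regex: $($byRegex -join ', ')"
```

If a line contains an unbalanced quote, the counts shift and both approaches will misclassify commas, which is worth keeping in mind for real data.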

Regarding "c# - Replace the column comma separators in a CSV file and handle fields wrapped in single quotes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60006585/
