
regex - Can this regular expression for finding numbered lines of text in an Excel cell be improved to avoid false matches?

Reposted. Author: 行者123. Updated: 2023-12-04 22:22:12

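I have a large spreadsheet in which some cells may contain several lines of text, some of them numbered and some not. My goal is to extract each individually numbered "item" into its own cell.

For example, an input cell might contain something like this (between the quotation marks):

"1. Party A to complete.
2./3. Party B to build to drawing 805/12.
Use ITP 675/24.

4. Party C to get engaged."

Note that an item number either starts at the beginning of a line or follows another item number after a "/", as in "2./3." above. The number is always followed by a "." (dot). The dot may be followed by some spaces or by none, and the item's text may then run over several lines.

Run on the input cell above, the desired output would be:

Cell 1: "1. Party A to complete."
Cell 2: "2. Party B to build to drawing 805/12.
Use ITP 675/24."
Cell 3: "3. Party B to build to drawing 805/12.
Use ITP 675/24."
Cell 4: "4. Party C to get engaged."

I have been using the RegExp class object in VBA, as shown below. It lets me pinpoint where each item starts and then extract the text between those points (or up to the end of the string):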

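' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim RegExObj1 As RegExp
Dim mc1 As MatchCollection

Set RegExObj1 = New RegExp

With RegExObj1
    .Global = True
    .IgnoreCase = True
    .MultiLine = True               ' ^ matches at the start of every line
    .Pattern = "(^|/)(\d+)\."       ' number at line start, or after a "/"
End With

Set mc1 = RegExObj1.Execute(CleanedCellText)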

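This mostly works, but I get unwanted matches such as "/12." and "/24.", which come from the ends of lines. How can I change the regular expression to exclude these?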

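Note that I capture the "/" so that I can tell whether an item number needs to inherit its text from the next number. In this example, item 2 inherits the text of item 3. I am not sure, though, whether there is a better way to handle this.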
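
The false matches can be reproduced with a minimal sketch such as the one below. It is an illustration only: the names ListMatches and DemoFalseMatches are hypothetical, the sample string is the example cell from above, and late binding through CreateObject("VBScript.RegExp") is assumed so that no library reference is required.

```vba
' Hypothetical helper: returns every match of the question's pattern,
' joined with "|", so the unwanted hits are easy to spot.
Function ListMatches(ByVal cellText As String) As String
    Dim re As Object, m As Object, result As String

    ' Late binding is an assumption made here; it avoids the library reference.
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.MultiLine = True
    re.Pattern = "(^|/)(\d+)\."

    For Each m In re.Execute(cellText)
        If Len(result) > 0 Then result = result & "|"
        result = result & m.Value
    Next m

    ListMatches = result
End Function

Sub DemoFalseMatches()
    Dim sample As String
    sample = "1. Party A to complete." & vbLf & _
             "2./3. Party B to build to drawing 805/12." & vbLf & _
             "Use ITP 675/24." & vbLf & vbLf & _
             "4. Party C to get engaged."
    ' The result contains "/12." and "/24." alongside the real item numbers.
    Debug.Print ListMatches(sample)
End Sub
```

The pattern accepts any "/" followed by digits and a dot, so a drawing number that happens to end a sentence looks exactly like a shared item number.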

Best Answer

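Given your data, a pattern like (?:\d+\.\/)|(?:\d+\.[\s\S]+?(?=(?:\x0A+\d+\.)|$)) will collect each numbered segment: the item number together with the rest of its text, up to the next line that begins with a number or up to the end of the string.
If the item number is followed by ./ the pattern collects only that prefix, so you can tell whether an entry needs to be filled in by testing whether its rightmost character is a /. After the results array has been populated, we loop through it from bottom to top and fill in the blanks where needed.
So here is another approach, using regular expressions.
As written, the function returns a vertical array. If you have O365 with dynamic arrays, the results will spill down. If you do not, you can enter the formula as an array formula across multiple cells, or use the INDEX function to retrieve the individual results.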

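Option Explicit

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
' Returns the numbered items found in s as a vertical (n x 1) array.
Function foo(s) As String()
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Const sPat As String = "(?:\d+\.\/)|(?:\d+\.[\s\S]+?(?=(?:\x0A+\d+\.)|$))"
    Dim sTemp() As String, I As Long

    Set RE = New RegExp
    With RE
        .Global = True
        .MultiLine = False   ' $ matches only at the very end of the string
        .Pattern = sPat

        ' No numbered items: return one empty cell rather than failing on UBound
        If Not .Test(s) Then
            ReDim sTemp(1 To 1, 1 To 1)
            foo = sTemp
            Exit Function
        End If

        Set MC = .Execute(s)
    End With

    ReDim sTemp(1 To MC.Count, 1 To 1)   ' 2D array for vertical results
    I = 0
    For Each M In MC
        I = I + 1
        sTemp(I, 1) = M.Value
    Next M

    ' Walk bottom to top: an entry ending in "/" (e.g. "2./") takes the text
    ' that follows the first dot of the entry below it
    For I = UBound(sTemp, 1) - 1 To LBound(sTemp, 1) Step -1
        If Right(sTemp(I, 1), 1) = "/" Then
            sTemp(I, 1) = Replace(sTemp(I, 1), "/", "") & _
                          Mid(sTemp(I + 1, 1), InStr(sTemp(I + 1, 1), ".") + 1)
        End If
    Next I

    foo = sTemp
End Function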
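
The bottom-to-top fill can also be looked at in isolation. The following is a minimal sketch, not the function itself: the names FillSharedItems and DemoFill are hypothetical, and a plain one-dimensional String array is assumed so that the step can be tried from the Immediate window without a worksheet.

```vba
' Hypothetical helper illustrating the bottom-to-top fill on its own.
' An entry ending in "/" (e.g. "2./") borrows the text of the entry after it.
Sub FillSharedItems(ByRef items() As String)
    Dim i As Long, dotPos As Long

    ' Start at the second-to-last entry: the last one has nothing to borrow from.
    For i = UBound(items) - 1 To LBound(items) Step -1
        If Right$(items(i), 1) = "/" Then
            dotPos = InStr(items(i + 1), ".")
            ' Drop the trailing "/" and append everything after the donor's first dot.
            items(i) = Left$(items(i), Len(items(i)) - 1) & _
                       Mid$(items(i + 1), dotPos + 1)
        End If
    Next i
End Sub

Sub DemoFill()
    Dim items(1 To 3) As String
    items(1) = "1./"
    items(2) = "2./"
    items(3) = "3. Shared text."
    FillSharedItems items
    Debug.Print Join(items, vbLf)
End Sub
```

Working upward matters for chains such as "1./2./3.": entry 2 is filled from entry 3 first, so by the time entry 1 is reached its neighbour already holds the full text.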

Regex explanation
Extract lines
(?:\d+\.\/)|(?:\d+\.[\s\S]+?(?=(?:\x0A+\d+\.)|$))
Options: ^$ do not match at line breaks
  • Match this alternative: (?:\d+\.\/)
    • Match the regular expression below: (?:\d+\.\/)
      • Match a single character that is a "digit": \d+
        • Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
      • Match the character "." literally: \.
      • Match the character "/" literally: \/
  • Or match this alternative: (?:\d+\.[\s\S]+?(?=(?:\x0A+\d+\.)|$))
    • Match the regular expression below: (?:\d+\.[\s\S]+?(?=(?:\x0A+\d+\.)|$))
      • Match a single character that is a "digit": \d+
        • Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
      • Match the character "." literally: \.
      • Match a single character present in the list below: [\s\S]+?
        • Between one and unlimited times, as few times as possible, expanding as needed (lazy): +?
        • A "whitespace character": \s
        • Any character that is NOT a "whitespace character": \S
      • Assert that the regex below can be matched starting at this position (positive lookahead): (?=(?:\x0A+\d+\.)|$)
        • Match this alternative: (?:\x0A+\d+\.)
          • Match the regular expression below: (?:\x0A+\d+\.)
            • Match the line feed character: \x0A+
              • Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
            • Match a single character that is a "digit": \d+
              • Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
            • Match the character "." literally: \.
        • Or match this alternative: $
          • Assert position at the very end of the string: $

  • Created with RegexBuddy
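
For comparison, the original line-anchored idea can be kept if the whole number header (for example "2./3.") is matched only at the start of a line, so that a "/" is accepted only directly after a line-start number. The following is a rough sketch under that assumption and is not the approach used in the function above: the name SplitNumberedItems is hypothetical, late binding is assumed, and cell line breaks are assumed to be line feeds (vbLf).

```vba
' Hypothetical alternative: match the whole header ("2./3.") only at a line start,
' then give each number in the header the text up to the next header.
Function SplitNumberedItems(ByVal cellText As String) As Collection
    Dim re As Object, mc As Object, result As New Collection
    Dim i As Long, bodyStart As Long, bodyEnd As Long
    Dim body As String, num As Variant

    Set re = CreateObject("VBScript.RegExp")   ' late binding is an assumption
    re.Global = True
    re.MultiLine = True                        ' ^ matches after each line feed
    re.Pattern = "^\d+\.(?:/\d+\.)*"

    Set mc = re.Execute(cellText)
    For i = 0 To mc.Count - 1
        ' FirstIndex is 0-based; Mid$ positions are 1-based.
        bodyStart = mc.Item(i).FirstIndex + mc.Item(i).Length + 1
        If i < mc.Count - 1 Then
            bodyEnd = mc.Item(i + 1).FirstIndex
        Else
            bodyEnd = Len(cellText)
        End If
        body = Mid$(cellText, bodyStart, bodyEnd - bodyStart + 1)

        ' Strip the line feeds left over before the next header.
        Do While Right$(body, 1) = vbLf
            body = Left$(body, Len(body) - 1)
        Loop

        ' "2./3." splits into "2." and "3."; each number gets the same body.
        For Each num In Split(mc.Item(i).Value, "/")
            result.Add num & " " & Trim$(body)
        Next num
    Next i

    Set SplitNumberedItems = result
End Function
```

Because "805/12." and "675/24." do not sit at the start of a line, they are never treated as headers. A continuation line that itself begins with digits and a dot would still be read as a new item, which is the same limitation the lookahead-based pattern has.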

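Regarding "regex - Can this regular expression for finding numbered lines of text in an Excel cell be improved to avoid false matches?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61872398/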
