
c# - How do I fix this regex for mentions and hashtags?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 22:47:10

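I used this tool to build a regex for mentions and hashtags. It already matches what I want in the entered text, but I still need to solve the following matching issues.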

  • Only match substrings that are preceded and followed by whitespace. A valid hashtag or mention at the very beginning or the very end of the string should also be matched.

  • The matches must contain only the non-whitespace part: the spaces are part of the rule, not part of the matched substring.

The regex I'm using is: (([@]{1}|[#]{1})[A-Za-z0-9]+)

Some examples of strings with valid and invalid matches:

"@hello friend" - @hello must be matched as a mention.
"@ hello friend" - here there should be no matches.
"hey@hello @hello" - here only the last @hello must be matched as a mention.
"@hello! hi @hello #hi ##hello" - here only the second @hello and #hi must be matched as a mention and hashtag respectively.

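To make the problem concrete, here is a small C# sketch (the class name OriginalPatternDemo and method Find are made up for illustration) that runs the original pattern unchanged. Because nothing constrains the characters before or after the tag, it also matches inside "hey@hello" and "##hello".

```csharp
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The question's original pattern: "@" or "#" followed by ASCII letters/digits.
    // ([@]{1}|[#]{1}) is equivalent to the simpler character class [@#].
    private const string Pattern = @"(([@]{1}|[#]{1})[A-Za-z0-9]+)";

    // Returns every substring the original pattern matches, in order.
    public static string[] Find(string input)
    {
        MatchCollection found = Regex.Matches(input, Pattern);
        var result = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            result[i] = found[i].Value;
        }
        return result;
    }
}
```

For "hey@hello @hello" this returns two "@hello" matches (the first one comes from inside "hey@hello"), and for "@hello! hi @hello #hi ##hello" it returns "@hello", "@hello", "#hi" and "#hello", instead of only the second "@hello" and "#hi".
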
Another example, given as an image in the original post, is a text in which only "@word" should be a valid mention.


Update, March 15, 2018, 16:35 (GMT-4)

I found a way to solve the problem using the tool in PCRE (server) mode, with a negative lookbehind and a negative lookahead:

(?<![^\s])(([@]{1}|[#]{1})[A-Za-z0-9]+)(?![^\s])


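But now the question is: does this work with C# regular expressions? I'm asking about the negative lookahead and negative lookbehind because in JavaScript, for example, it doesn't work: the tool underlines it in red.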
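
The .NET regex engine (System.Text.RegularExpressions) supports both negative lookbehind (?<!...) and negative lookahead (?!...), so the update's pattern can be used from C# as written. A minimal sketch, where the class name LookaroundTagFinder and method Find are hypothetical; (?<!\S) and (?!\S) would be equivalent, shorter spellings of the two lookarounds.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundTagFinder
{
    // The pattern from the update: the character before the tag must not be a
    // non-whitespace character (so it is whitespace or the start of the string),
    // and the same holds for the character after the tag (whitespace or the end).
    private static readonly Regex TagRegex =
        new Regex(@"(?<![^\s])(([@]{1}|[#]{1})[A-Za-z0-9]+)(?![^\s])");

    public static string[] Find(string input)
    {
        MatchCollection found = TagRegex.Matches(input);
        var result = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            // Lookarounds are zero-width, so Value is the tag without surrounding spaces.
            result[i] = found[i].Value;
        }
        return result;
    }
}
```

On the four examples from the question this returns "@hello"; nothing; "@hello"; and "@hello" plus "#hi", which are the expected results.
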

Best answer

Try this pattern:

(?:^|\s+)(?:(?<mention>@)|(?<hash>#))(?<item>\w+)(?=\s+)

Here's a breakdown:

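  • (?: creates a non-capturing group
  • ^|\s+ matches either the start of the string or one or more whitespace characters
  • (?: creates a second non-capturing group
  • (?<mention>@)|(?<hash>#) creates groups that match @ or # and names them mention and hash, respectively
  • (?<item>\w+) matches one or more word characters (letters, digits, or underscore) and captures them as item, so the tag text is easy to extract
  • (?=\s+) creates a positive lookahead that requires whitespace after the item; because at least one whitespace character is required, a tag at the very end of the string is not matched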
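
The named groups can be read back in C# through Match.Groups. The sketch below is illustrative: the class NamedGroupTagFinder and the "kind:item" output format are hypothetical, and it assumes one change to the pattern, replacing the final (?=\s+) with (?=\s|$) so that a tag at the end of the string is also found.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupTagFinder
{
    // The answer's pattern with one assumed change: (?=\s|$) instead of (?=\s+),
    // so a tag followed by the end of the string is accepted too.
    private static readonly Regex TagRegex =
        new Regex(@"(?:^|\s+)(?:(?<mention>@)|(?<hash>#))(?<item>\w+)(?=\s|$)");

    // Returns entries such as "mention:hello" or "hash:hi" (an illustrative format).
    public static string[] Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in TagRegex.Matches(input))
        {
            // Exactly one of the two named groups succeeds for each match.
            string kind = m.Groups["mention"].Success ? "mention" : "hash";
            // "item" holds only the word after @ or #, so no trimming is needed here.
            result.Add(kind + ":" + m.Groups["item"].Value);
        }
        return result.ToArray();
    }
}
```

For "@hello! hi @hello #hi ##hello" this yields "mention:hello" and "hash:hi". Note that \w also accepts underscores and non-ASCII letters, which is broader than the question's [A-Za-z0-9].
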


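You then need to use the host language to trim the returned matches. Only leading whitespace ends up in a match, because (?:^|\s+) consumes the whitespace before the tag while the lookahead consumes nothing.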
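
A minimal C# sketch of that trimming step, using the answer's pattern exactly as written (the class name TrimmedTagFinder is hypothetical). Match.Value can start with the whitespace consumed by (?:^|\s+), and string.Trim() removes it.

```csharp
using System.Text.RegularExpressions;

public static class TrimmedTagFinder
{
    // The answer's pattern, unchanged. (?:^|\s+) consumes the whitespace in front
    // of a tag, so Match.Value may begin with spaces; the lookahead consumes nothing.
    private static readonly Regex TagRegex =
        new Regex(@"(?:^|\s+)(?:(?<mention>@)|(?<hash>#))(?<item>\w+)(?=\s+)");

    public static string[] Find(string input)
    {
        MatchCollection found = TagRegex.Matches(input);
        var result = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            // Trim() strips the leading whitespace captured by (?:^|\s+).
            result[i] = found[i].Value.Trim();
        }
        return result;
    }
}
```

For "@hello! hi @hello #hi ##hello" the raw values are " @hello" and " #hi", and the trimmed results are "@hello" and "#hi". For "hey@hello @hello" the result is empty, because (?=\s+) needs whitespace after the last tag.
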

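Update: since you mentioned you're using C#, I thought I'd give you a .NET solution that doesn't need RegEx at all. I haven't benchmarked it, but I'd guess it is also faster than using RegEx.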

Personally, my .NET language is Visual Basic, so I'm giving you a VB.NET solution, but you can easily run it through a converter, since I never use anything that can't be done in C#:

Private Function FindTags(ByVal lead As Char, ByVal source As String) As String()
    Dim matches As List(Of String) = New List(Of String)
    Dim current_index As Integer = 0

    'Loop through all but the last character in the source (a tag needs at least one character after the lead)
    For index As Integer = 0 To source.Length - 2
        'Reset the current index
        current_index = index

        'Check that the current character is the lead ("@" or "#"), that we're either at the start of the String or the previous character is whitespace, and that the next character is a letter or digit
        If source(index) = lead AndAlso (index = 0 OrElse Char.IsWhiteSpace(source, index - 1)) AndAlso Char.IsLetterOrDigit(source, index + 1) Then
            'Advance while the next character is still a letter or digit
            Do
                current_index += 1
            Loop While current_index + 1 < source.Length AndAlso Char.IsLetterOrDigit(source, current_index + 1)

            'Check if we're at the end of the String or the next character is whitespace
            If current_index = source.Length - 1 OrElse Char.IsWhiteSpace(source, current_index + 1) Then
                'Add the match to the collection
                matches.Add(source.Substring(index, current_index + 1 - index))
            End If
        End If
    Next

    Return matches.ToArray()
End Function
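
For readers who want to skip the converter, here is a hand-written C# port of FindTags. It is a sketch rather than converter output; the class name TagScanner is hypothetical, and it only accepts a tag when the character right after the lead is a letter or digit.

```csharp
using System.Collections.Generic;

public static class TagScanner
{
    // Returns every word that starts with `lead`, is preceded by whitespace or the
    // start of the string, continues with letters/digits, and is followed by
    // whitespace or the end of the string.
    public static string[] FindTags(char lead, string source)
    {
        var matches = new List<string>();

        // The lead needs at least one character after it, so stop before the last one.
        for (int index = 0; index < source.Length - 1; index++)
        {
            bool startsTag = source[index] == lead
                && (index == 0 || char.IsWhiteSpace(source[index - 1]))
                && char.IsLetterOrDigit(source[index + 1]);
            if (!startsTag)
            {
                continue;
            }

            // Advance `end` over the run of letters/digits after the lead character.
            int end = index + 1;
            while (end + 1 < source.Length && char.IsLetterOrDigit(source[end + 1]))
            {
                end++;
            }

            // Accept the tag only if whitespace or the end of the string follows it.
            if (end == source.Length - 1 || char.IsWhiteSpace(source[end + 1]))
            {
                matches.Add(source.Substring(index, end + 1 - index));
            }
        }

        return matches.ToArray();
    }
}
```

As in the VB.NET version, call it once per lead character: FindTags('@', text) for mentions and FindTags('#', text) for hashtags. For "@hello! hi @hello #hi ##hello" these return "@hello" and "#hi" respectively.
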


Original question on Stack Overflow: https://stackoverflow.com/questions/49308174/
