regex - Apostrophe in a word is not recognized during string replacement

Reposted. Author: IT王子. Updated: 2023-10-29 01:52:12

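I'm having trouble replacing the word "you're" using a regular expression.

Every other word is replaced correctly except "you're". I think the regex is not parsing past the apostrophe.

I need to replace "you" with "I" and "you're" with "I'm". It changes "you" to "I", but "you're" becomes "I're" because it does not get past the apostrophe and for some reason treats it as the end of the word. I need to escape the apostrophe somehow.

See the problem code below.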

package main

import (
	"fmt"
	"math/rand"
	"regexp"
	"strings"
	"time"
)

// ElizaResponse takes an input string and returns a response string.
func ElizaResponse(str string) string {

	// replace := "How do you know you are"

	// Match the isolated word "father" using word boundaries, ignoring case,
	// and return a canned reply if the input contains it.
	if matched, _ := regexp.MatchString(`(?i)\bfather\b`, str); matched {
		return "Why don’t you tell me more about your father?"
	}
	r1 := regexp.MustCompile(`(?i)\bI'?\s*a?m\b`)

	// Match the words "I am" and capture for replacement
	matched := r1.MatchString(str)

	// Condition if "I am" is matched
	if matched {

		capturedString := r1.ReplaceAllString(str, "$1")
		boundaries := regexp.MustCompile(`\b`)
		tokens := boundaries.Split(capturedString, -1)

		// List the reflections.
		reflections := [][]string{
			{`I`, `you`},
			{`you're`, `I'm`},
			{`your`, `my`},
			{`me`, `you`},
			{`you`, `I`},
			{`my`, `your`},
		}

		// Loop through each token, reflecting it if there's a match.
		for i, token := range tokens {
			for _, reflection := range reflections {
				if matched, _ := regexp.MatchString(reflection[0], token); matched {
					tokens[i] = reflection[1]
					break
				}
			}
		}

		// Put the tokens back together.
		return strings.Join(tokens, ``)

	}

	// Pick a random reply from the slice of fallback responses.
	response := []string{"I’m not sure what you’re trying to say. Could you explain it to me?",
		"How does that make you feel?",
		"Why do you say that?"}
	// Return the element at a random index of the slice
	return response[rand.Intn(len(response))]

}

func main() {
	rand.Seed(time.Now().UTC().UnixNano())

	fmt.Println("Im supposed to just take what you're saying at face value?")
	fmt.Println(ElizaResponse("Im supposed to just take what you're saying at face value?"))
}

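Best answer

Note that the apostrophe character creates a word boundary, so your use of \b in the regex is probably what is tripping you up. That is, the string "I'm" has four word boundaries, one before and after each character.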

┏━┳━┳━┓
┃I┃'┃m┃
┗━┻━┻━┛
│ │ │ └─ end of line creates a word boundary
│ │ └─── after punctuation character creates a word boundary
│ └───── before punctuation character creates a word boundary
└─────── start of line creates a word boundary
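
The diagram can be checked with a small program. This is only a sketch: the helper names countBoundaries and splitOnBoundaries are made up for illustration, and the split mirrors the `\b` split used in the question's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// boundary matches the empty string at every word boundary.
var boundary = regexp.MustCompile(`\b`)

// countBoundaries (hypothetical helper) reports how many word
// boundaries the regexp finds in s.
func countBoundaries(s string) int {
	return len(boundary.FindAllStringIndex(s, -1))
}

// splitOnBoundaries (hypothetical helper) splits s at every word
// boundary, the same way the question's code tokenizes its input.
func splitOnBoundaries(s string) []string {
	return boundary.Split(s, -1)
}

func main() {
	fmt.Println(countBoundaries("I'm"))
	fmt.Printf("%q\n", splitOnBoundaries("you're"))
}
```

Because the split yields "you", "'" and "re" as separate tokens, the `you're` entry in the reflection list never sees a whole token to match, while the `you` entry does.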

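There is no way to change the behavior of the word-boundary metacharacter, so you are probably better off mapping regexes that contain the full word, punctuation included, to the desired replacement, for example: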

type Replacement struct {
	rgx *regexp.Regexp
	rpl string
}

replacements := []Replacement{
	{regexp.MustCompile("\\bI\\b"), "you"},
	{regexp.MustCompile("\\byou're\\b"), "I'm"},
	// etc...
}
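
One way to turn this idea into a runnable program is sketched below. It is a variant, not the answer's exact approach: instead of applying a list of regexes one after another (where a later rule such as you → I could undo an earlier I → you), it uses a single alternation and a lookup map so each word is swapped once. The names reflections, reflectRgx and reflectWords are assumptions for illustration, and the word pairs are the six from the question.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// reflections holds the word pairs from the question, keyed in lower case.
var reflections = map[string]string{
	"you're": "I'm",
	"your":   "my",
	"you":    "I",
	"me":     "you",
	"my":     "your",
	"i":      "you",
}

// reflectRgx lists the longer alternatives first, so "you're" and "your"
// are tried before the shorter "you" at the same position.
var reflectRgx = regexp.MustCompile(`(?i)\b(you're|your|you|me|my|I)\b`)

// reflectWords (hypothetical helper) swaps every matched word in a
// single pass over s, so a replacement is never replaced again.
func reflectWords(s string) string {
	return reflectRgx.ReplaceAllStringFunc(s, func(word string) string {
		return reflections[strings.ToLower(word)]
	})
}

func main() {
	fmt.Println(reflectWords("what you're saying"))
	fmt.Println(reflectWords("you took your book from me"))
}
```

The order of the alternatives matters here: Go's regexp package prefers the first alternative that matches, and since `\b` also matches just before the apostrophe, putting `you` ahead of `you're` would bring back the "I're" result.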

Also note that one of your examples contains a UTF-8 "right single quotation mark" (U+2019, 0xe28099), not to be confused with the UTF-8/ASCII apostrophe (U+0027, 0x27)!

fmt.Sprintf("% x", []byte("'’")) // => "27 e2 80 99"
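
If input may contain either character, one option is to normalize it before matching. The sketch below assumes that treating U+2019 as a plain apostrophe is acceptable for this program; normalizeApostrophes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeApostrophes (hypothetical helper) replaces every right single
// quotation mark (U+2019) with the ASCII apostrophe (U+0027).
func normalizeApostrophes(s string) string {
	return strings.ReplaceAll(s, "\u2019", "'")
}

func main() {
	// Print the bytes after normalization to confirm that only 0x27 remains.
	fmt.Printf("% x\n", []byte(normalizeApostrophes("you\u2019re")))
}
```

Running the input through such a step first means a pattern written with the ASCII apostrophe, such as `you're`, also covers text typed with curly quotes.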

Regarding "regex - Apostrophe in a word is not recognized during string replacement", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47356475/
