python - Python regex to remove whitespace and capitalize letters where the spaces were? - 6ren


Reposted by 行者123, updated 2023-12-01 05:56:51

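I want to create a list of tags from a single, comma-separated input box filled in by the user, and I'm looking for some expression(s) that can help automate this.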

What I want is to take the supplied input field and:

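- remove all runs of two or more spaces, tabs, and newlines (leaving just single spaces)
- remove ALL punctuation (single and double quotes included) except commas, of which there can be only one in a row
- between each pair of commas, apply something like Title Case, but excluding the first word and not at all for single words, so that when the remaining spaces are removed the tag comes out as 'somethingLikeTitleCase', or just 'something', or 'twoWords'
- and finally, remove all remaining spaces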
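For illustration only, here is a minimal Python 3 sketch of the whitespace and capitalization rules listed above, assuming the text between two commas has already been isolated. The helper names collapse_whitespace and camel_join are hypothetical; they are not part of the question or the answer.

```python
import re

def collapse_whitespace(s):
    """Turn every run of spaces, tabs and newlines into one space, then trim the ends."""
    return re.sub(r"\s+", " ", s).strip()

def camel_join(words):
    """Hypothetical helper joining words per the stated rule.

    Zero or one word: returned unchanged.
    Several words: the first is lowercased, each later word is capitalized.
    """
    if len(words) <= 1:
        return "".join(words)
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])

text = collapse_whitespace("  tHiS \t is\n a   tAg ")
print(text)                          # tHiS is a tAg
print(camel_join(text.split()))      # thisIsATag
print(camel_join(["something"]))     # something
print(camel_join(["two", "words"]))  # twoWords
```

Note that str.capitalize() also lowercases the rest of each later word, which is one possible reading of "something like Title Case".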

So far, this is what I've gathered:

import re

def no_whitespace(s):
    """Remove all whitespace & newlines."""
    return re.sub(r"\s+", "", s)

# remove spaces, newlines, all whitespace
# http://stackoverflow.com/a/42597/523051
# (tags_input is the raw string from the user's input box)
tag_list = no_whitespace(tags_input)

# split into a list at commas
tag_list = tag_list.split(',')

# remove any empty strings (since I currently don't know how to remove double commas)
# http://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings
tag_list = list(filter(None, tag_list))

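I'm lost, though, on how to modify that regex to remove all punctuation except commas, and I don't even know where to start with the capitalization.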

Any ideas to point me in the right direction?

As suggested, here is some sample input and the required output:

Form input: "tHiS is a tAg, 'whitespace' !&#^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps" should come out as
['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']
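One way to handle the "remove all punctuation except commas" part of the question with a regular expression is a negated character class that also excludes the comma. The sketch below is only an illustration and assumes (a) punctuation should become a space, so that "no!punc" can later become "noPunc" as in the sample output, and (b) a run of commas counts as a single separator. split_tags is a hypothetical name.

```python
import re

def split_tags(raw):
    """Hypothetical helper: blank out punctuation except commas, then split on comma runs."""
    # [^\w\s,] matches any character that is not a word character, whitespace, or a comma.
    # \w also matches '_', so '|_' treats underscores as punctuation too.
    cleaned = re.sub(r"[^\w\s,]|_", " ", raw)
    # One separator = one or more commas plus any surrounding whitespace,
    # so ",," (or ", ,") never yields an empty tag in the middle.
    parts = re.split(r"\s*(?:,\s*)+", cleaned)
    # Collapse inner whitespace and drop empty pieces (e.g. from a leading/trailing comma).
    return [" ".join(p.split()) for p in parts if p.strip()]

print(split_tags("no!punc$$,, 'quoted' tag ,second_one"))
# ['no punc', 'quoted tag', 'second one']
```

The pieces still need the capitalization step before their spaces are removed.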

Best answer

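Here's one way to solve the problem (it doesn't use any regular expressions, although there is one place where it could). We split the problem into two functions: one that splits a string on commas and handles each piece (parseTags), and one that takes a single piece and turns it into a valid tag (sanitizeTag). The annotated code follows: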

# This function takes a string with commas separating raw user input, and
# returns a list of valid tags made by sanitizing the strings between the
# commas.
def parseTags(str):
    # First, we split the string on commas.
    rawTags = str.split(',')

    # Then, we sanitize each of the tags.  If sanitizing gives us back None,
    # then the tag was invalid, so we leave those cases out of our final
    # list of tags.  We can use None as the predicate because sanitizeTag
    # will never return '', which is the only falsy string.
    # (list(...) makes this return a list on Python 3, where filter is lazy.)
    return list(filter(None, map(sanitizeTag, rawTags)))

# This function takes a single proto-tag---the string in between the commas
# that will be turned into a valid tag---and sanitizes it.  It either
# returns an alphanumeric string (if the argument can be made into a valid
# tag) or None (if the argument cannot be made into a valid tag; i.e., if
# the argument contains only whitespace and/or punctuation).
def sanitizeTag(str):
    # First, we turn non-alphanumeric characters into whitespace.  You could
    # also use a regular expression here; see below.
    str = ''.join(c if c.isalnum() else ' ' for c in str)

    # Next, we split the string on spaces, ignoring leading and trailing
    # whitespace.
    words = str.split()

    # There are now three possibilities: there are no words, there was one
    # word, or there were multiple words.
    numWords = len(words)
    if numWords == 0:
        # If there were no words, the string contained only spaces (and/or
        # punctuation).  This can't be made into a valid tag, so we return
        # None.
        return None
    elif numWords == 1:
        # If there was only one word, that word is the tag, no
        # post-processing required.
        return words[0]
    else:
        # Finally, if there were multiple words, we camel-case the string:
        # we lowercase the first word, capitalize the first letter of all
        # the other words and lowercase the rest, and finally stick all
        # these words together without spaces.
        return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

Indeed, if we run this code, we get:

>>> parseTags("tHiS iS a tAg, \t\n!&#^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']

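Two things in this code are worth clarifying. The first is the use of str.split() in sanitizeTag. This turns " a b c " into ['a', 'b', 'c'], whereas str.split(' ') would produce ['', 'a', 'b', 'c', '']. This is almost certainly the behavior you want, but there is one corner case. Consider the string "tAG$". The $ becomes a space and is stripped by the split, so the tag counts as a single word and comes out as "tAG" instead of "tag". That may be what you want, but if it isn't, you have to be careful.

What I would do is change that line to words = re.split(r'\s+', str), which splits the string on whitespace but keeps the leading and trailing empty strings; however, I would also change parseTags to use rawTags = re.split(r'\s*,\s*', str). You must make both changes: 'a , b , c'.split(',') becomes ['a ', ' b ', ' c'], which is not the behavior you want, whereas r'\s*,\s*' also removes the whitespace around the commas. If you ignore leading and trailing whitespace, the difference doesn't matter; but if you don't, you need to be careful.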
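The splitting differences described above are easy to check in an interpreter. This short Python 3 sketch only exercises the standard str.split and re.split calls on the strings from the paragraph; the final line reuses the answer's camel-casing expression to show where "tag" comes from.

```python
import re

# str.split() with no argument splits on runs of whitespace and drops
# empty strings at the ends; split(' ') keeps them.
print(" a b c ".split())              # ['a', 'b', 'c']
print(" a b c ".split(" "))           # ['', 'a', 'b', 'c', '']

# re.split on \s+ also keeps the empty strings at the ends.
print(re.split(r"\s+", " a b c "))    # ['', 'a', 'b', 'c', '']

# Splitting on ',' alone leaves spaces around each piece;
# \s*,\s* removes the whitespace that touches a comma.
print("a , b , c".split(","))             # ['a ', ' b ', ' c']
print(re.split(r"\s*,\s*", "a , b , c"))  # ['a', 'b', 'c']

# The corner case: once '$' has become a space, "tAG$" is "tAG ".
print("tAG ".split())                 # ['tAG'] -> one word, kept as "tAG"
words = re.split(r"\s+", "tAG ")      # ['tAG', ''] -> counted as two words
print(words[0].lower() + "".join(w.capitalize() for w in words[1:]))  # tag
```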

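Finally, rather than a regular expression, I used str = ''.join(c if c.isalnum() else ' ' for c in str). You can replace this with a regular expression if you like. (Edit: I removed some inaccuracies about Unicode and regular expressions here.) Ignoring Unicode, you could replace this line with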

str = re.sub(r'[^A-Za-z0-9]', ' ', str)

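This uses [^...] to match every character except the listed ones: ASCII letters and digits. However, supporting Unicode is better, and it's easy too. The simplest way is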

str = re.sub(r'\W', ' ', str, flags=re.UNICODE)

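Here, \W matches non-word characters; word characters are letters, digits, and the underscore. With flags=re.UNICODE (the flags argument isn't available before Python 2.7; you can use r'(?u)\W' in earlier versions and in 2.7), letters and digits can be any appropriate Unicode characters; without it, they are ASCII only. (In Python 3, str patterns are Unicode-aware by default, so the flag is redundant there.) If you don't want underscores either, you can add |_ to the regex so that underscores are matched and replaced with spaces as well: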

str = re.sub(r'\W|_', ' ', str, flags=re.UNICODE)

I believe this last one exactly matches the behavior of my regex-free code.
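To spot-check the claim that the last regex behaves like the isalnum-based line, the variants can be run side by side. This is a small Python 3 sketch: the function names are made up for the comparison, and a few sample strings are not a proof of equivalence for every Unicode character.

```python
import re

def by_isalnum(s):
    # The answer's regex-free line: every non-alphanumeric character becomes a space.
    return ''.join(c if c.isalnum() else ' ' for c in s)

def by_ascii_class(s):
    # ASCII-only variant: anything outside A-Z, a-z, 0-9 becomes a space.
    return re.sub(r'[^A-Za-z0-9]', ' ', s)

def by_word_class(s):
    # \W is a non-word character; |_ also catches the underscore, which \w would keep.
    # Python 3 str patterns are Unicode-aware by default, so no flag is passed.
    return re.sub(r'\W|_', ' ', s)

for sample in ["no!punc$$", "snake_case", "café au lait"]:
    print(repr(by_isalnum(sample)), repr(by_ascii_class(sample)), repr(by_word_class(sample)))
```

On "café au lait" the ASCII-only class turns the "é" into a space, while the other two keep it.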

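Also, here's how I would write the same code without all those comments; this also let me eliminate some temporary variables. You may prefer the version with the variables; it's just a matter of taste.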

def parseTags(str):
    return list(filter(None, map(sanitizeTag, str.split(','))))

def sanitizeTag(str):
    words    = ''.join(c if c.isalnum() else ' ' for c in str).split()
    numWords = len(words)
    if numWords == 0:
        return None
    elif numWords == 1:
        return words[0]
    else:
        return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

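To handle the newly requested behavior, we need to do two things. First, we need a way to fix the capitalization of the first word: if its first letter is lowercase, lowercase the whole word; if its first letter is uppercase, lowercase everything except the first letter. This is easy: we can just check directly. Second, we want punctuation to be completely invisible: it should not cause the following word to be capitalized. Again, this is easy; I even discussed how to handle a similar problem above. We simply filter out every character that is neither alphanumeric nor whitespace, instead of turning it into a space. Incorporating these changes gives us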

def parseTags(str):
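    return list(filter(None, map(sanitizeTag, str.split(','))))

def sanitizeTag(str):
    # Keep only alphanumeric and whitespace characters. ''.join is needed on
    # Python 3, where filter() returns an iterator rather than a string.
    words    = ''.join(c for c in str if c.isalnum() or c.isspace()).split()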
    numWords = len(words)
    if numWords == 0:
        return None
    elif numWords == 1:
        return words[0]
    else:
        words0 = words[0].lower() if words[0][0].islower() else words[0].capitalize()
        return words0 + ''.join(w.capitalize() for w in words[1:])

Running this code gives us the following output:

>>> parseTags("tHiS iS a tAg, AnD tHIs, \t\n!&#^ , se@%condcomment$ , No!pUnc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'AndThis', 'secondcomment', 'NopUnc', 'ifNOSPACESthenPRESERVEcaps']
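The answer notes at the start that a regular expression could replace the character-by-character filtering. A sketch of that substitution for the updated version, assuming Python 3: [^\w\s]|_ deletes every character that is neither a word character nor whitespace, plus the underscore. The names sanitize_tag_re and parse_tags_re are hypothetical, and \w is close to, but not guaranteed identical to, str.isalnum() for every Unicode character.

```python
import re

def sanitize_tag_re(raw):
    # Delete (rather than space out) every character that is neither a word
    # character nor whitespace; '_' counts as a word character, so remove it explicitly.
    words = re.sub(r'[^\w\s]|_', '', raw).split()
    if not words:
        return None          # only whitespace/punctuation: not a valid tag
    if len(words) == 1:
        return words[0]      # single word: leave its capitalization alone
    first = words[0]
    # All-lowercase if the first word starts lowercase, otherwise capitalized.
    first = first.lower() if first[0].islower() else first.capitalize()
    return first + ''.join(w.capitalize() for w in words[1:])

def parse_tags_re(raw):
    # Split on commas and keep only pieces that produced a valid tag.
    return [t for t in map(sanitize_tag_re, raw.split(',')) if t is not None]

print(parse_tags_re("tHiS iS a tAg, AnD tHIs, \t\n!&#^ , se@%condcomment$ , No!pUnc$$, ifNOSPACESthenPRESERVEcaps"))
```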

Regarding "python - Python regex to remove whitespace and capitalize letters where the spaces were?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12081704/
