
python - Remove strings that start with specific expressions from a list

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:07:01

I have a list of strings related to Twitter hashtags. I want to remove every whole word that starts with certain prefixes.

For example:

testlist = ['Just caught up with #FlirtyDancing. Just so cute! Loved it. ', 'After work drinks with this one @MrLukeBenjamin no dancing tonight though @flirtydancing @AshleyBanjo #FlirtyDancing pic.twitter.com/GJpRUZxUe8', 'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing', 'Loved working on this. Always a pleasure getting to assist the wonderful @kendrahorsburgh on @ashleybanjogram wonderful new show !! #flirtydancing pic.twitter.com/URMjUcgmyi', 'Just watching #FlirtyDancing & \n@AshleyBanjo what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up.. pic.twitter.com/iwCLRmAi5n',]

I want to remove the picture URLs, the hashtags, and the @ mentions.

So far I have tried a few approaches, namely the startswith() and replace() methods.

For example:

prefixes = ['pic.twitter.com', '#', '@']
bestlist = []

for line in testlist:
    for word in prefixes:
        line = line.replace(word, "")
    bestlist.append(line)

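This seems to remove "pic.twitter.com", but not the run of letters and digits at the end of the URL. That suffix is dynamic and differs for every URL, which is why I want to remove the whole string that starts with the prefix.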
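The startswith() idea does work if it is applied to whole words and a matching word is dropped, rather than having only its prefix replaced. A minimal sketch, assuming that splitting on whitespace is an acceptable definition of a "word" (this also collapses newlines and repeated spaces into single spaces); `remove_prefixed_words` is a hypothetical helper name and only two of the tweets are used:

```python
prefixes = ['pic.twitter.com', '#', '@']

def remove_prefixed_words(text, prefixes):
    """Drop every whitespace-separated word that starts with one of the prefixes."""
    # str.startswith accepts a tuple and returns True if any element matches
    prefix_tuple = tuple(prefixes)
    kept = [word for word in text.split() if not word.startswith(prefix_tuple)]
    return ' '.join(kept)

# a subset of testlist from the question
testlist = [
    'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing',
    'After work drinks with this one @MrLukeBenjamin no dancing tonight though '
    '@flirtydancing @AshleyBanjo #FlirtyDancing pic.twitter.com/GJpRUZxUe8',
]

bestlist = [remove_prefixed_words(line, prefixes) for line in testlist]
for line in bestlist:
    print(line)
```

Because the whole word is tested and then skipped, the variable URL suffix goes away together with "pic.twitter.com".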

I have also tried tokenizing everything, but replace() still doesn't remove the whole word:

import nltk

for line in testlist:
    tokens = nltk.tokenize.word_tokenize(line)
    for token in tokens:
        for word in prefixes:
            if token.startswith(word):
                # replace() strips only the prefix, not the rest of the token
                token = token.replace(word, "")
        print(token)
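The tokenized attempt has two problems: replace() strips only the prefix instead of discarding the token, and NLTK's word_tokenize typically splits "#" and "@" off into tokens of their own, so the tag text itself never starts with the prefix. A sketch that keeps tags whole by using plain str.split() as the tokenizer (an assumption, not NLTK), and filters line by line so the "\n" inside the fifth tweet survives; `clean_keep_newlines` is a hypothetical name and the tweet is shortened:

```python
prefixes = ('pic.twitter.com', '#', '@')

def clean_keep_newlines(text, prefixes):
    """Drop prefixed tokens line by line so embedded newlines survive."""
    cleaned_lines = []
    for physical_line in text.split('\n'):
        # str.split() is the tokenizer here: '#tag' and '@user' stay whole,
        # whereas NLTK's word_tokenize typically splits off '#' and '@'
        tokens = physical_line.split()
        # keep a token only if it does not start with any of the prefixes
        kept = [tok for tok in tokens if not tok.startswith(prefixes)]
        cleaned_lines.append(' '.join(kept))
    return '\n'.join(cleaned_lines)

# shortened version of the fifth tweet in testlist
tweet = ('Just watching #FlirtyDancing & \n@AshleyBanjo what an amazing way to meet someone.. '
         'pic.twitter.com/iwCLRmAi5n')
print(clean_keep_newlines(tweet, prefixes))
```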

I'm starting to lose hope in startswith() and replace(), and I feel I may be barking up the wrong tree with these two methods.

Is there a better way to solve this? How can I get the desired result of removing every string that starts with #, @, or pic.twitter?

Best answer

You can use a regular expression to specify which kind of word to replace, and then apply it with re.sub:

import re

testlist = ['Just caught up with #FlirtyDancing. Just so cute! Loved it. ', 'After work drinks with this one @MrLukeBenjamin no dancing tonight though @flirtydancing @AshleyBanjo #FlirtyDancing pic.twitter.com/GJpRUZxUe8', 'Only just catching up and @AshleyBanjo you are gorgeous #FlirtyDancing', 'Loved working on this. Always a pleasure getting to assist the wonderful @kendrahorsburgh on @ashleybanjogram wonderful new show !! #flirtydancing pic.twitter.com/URMjUcgmyi', 'Just watching #FlirtyDancing & \n@AshleyBanjo what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up.. pic.twitter.com/iwCLRmAi5n',]
regexp = r'pic\.twitter\.com\S+|@\S+|#\S+'

# remove every match, then collapse leftover whitespace (including the "\n" in the last tweet)
res = [' '.join(re.sub(regexp, '', sent).split()) for sent in testlist]
for sent in res:
    print(sent)

Output

Just caught up with Just so cute! Loved it.
After work drinks with this one no dancing tonight though
Only just catching up and you are gorgeous
Loved working on this. Always a pleasure getting to assist the wonderful on wonderful new show !!
Just watching & what an amazing way to meet someone.. It made my heart all warm & fuzzy for these people! both couples meet back up..
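The answer's `\S+` runs up to the next whitespace, so it also removes punctuation glued to a tag (the period in "#FlirtyDancing." disappears). The sketch below writes the same three alternatives with re.VERBOSE so each can be commented, and compares them with a `\w+` variant that stops at punctuation. The variant and the `clean` helper are illustrative assumptions, not part of the original answer:

```python
import re

# Same alternatives as the answer's regexp, one per line with comments.
greedy = re.compile(r"""
    pic\.twitter\.com\S+   # picture link plus its variable suffix
  | @\S+                   # @mention, including any trailing punctuation
  | \#\S+                  # hashtag; '#' is escaped because VERBOSE starts comments with it
""", re.VERBOSE)

# Hypothetical variant: \w+ stops at punctuation, so a '.' after a tag is kept
word_only = re.compile(r"pic\.twitter\.com/\w+|[@#]\w+")

def clean(pattern, text):
    # remove the matches, then collapse the leftover runs of whitespace
    return ' '.join(pattern.sub('', text).split())

tweet = 'Just caught up with #FlirtyDancing. Just so cute! Loved it. '
print(clean(greedy, tweet))     # Just caught up with Just so cute! Loved it.
print(clean(word_only, tweet))  # Just caught up with . Just so cute! Loved it.
```

Which one fits depends on whether punctuation directly after a tag should count as part of the tag.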

A similar question about python - Remove strings that start with specific expressions from a list can be found on Stack Overflow: https://stackoverflow.com/questions/55274755/
