
Python: Matching strings that contain certain terms

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:11:24

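I have a list of tweets, from which I have to pick the ones that contain terms like "promotion", "discount" or "offer". I also need to find tweets that advertise a deal (such as a discount) by spotting things like "%", "Rs.", "$" and so on. I know nothing about regular expressions, and the documentation hasn't helped. Here is my code. It's bad, but please bear with me: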

import pymongo
import re
import datetime

client = pymongo.MongoClient()
db = client.PWSocial
fourteen_days_ago = datetime.datetime.utcnow() - datetime.timedelta(days=14)
id_list = [57947109, 183093247, 89443197, 431336956]
ar1 = [" deal "," deals ", " offer "," offers " "discount", "promotion", " sale ", " inr", " rs", "%", "inr ", "rs ", " rs."]

def func(ac_id):
    mylist = []
    newlist = []
    tweets = list(db.tweets.find({'user_id' : ac_id, 'created_at': { '$gte': fourteen_days_ago }}))
    for item in tweets:
        data = item.get('text')
        data = data.lower()
        data = data.split()
        flag = 0
        if set(ar1).intersection(data):
            flag = 1
        abc = []
        for x in ar1:
            for y in data:
                if re.search(x,y):
                    abc.append(x)
                    flag = 1
                    break
        if flag == 1:
            mylist.append(item.get('id'))
            newlist.append(abc)
    print mylist
    print newlist

for i in id_list:
    func(i)

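This code doesn't give me any correct results, and as a regex newbie I can't tell what's wrong with it. Can anyone suggest a better way to do this? Thanks for your help.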
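A few things in the question's code can be checked in isolation. The following is a small diagnostic sketch, not part of the original post, and the sample strings are made up. It shows three behaviors that likely contribute to the wrong results: adjacent string literals without a comma are joined into one string, `split()` yields tokens with no surrounding spaces (so a space-padded term can never equal a token), and `.` is a regex metacharacter unless it is escaped.

```python
import re

# 1. Missing comma: adjacent string literals are joined into one string.
terms = [" offers " "discount", "promotion"]
print(terms)  # [' offers discount', 'promotion']

# 2. split() drops whitespace, so a space-padded term never equals a token.
tokens = "great deal today".lower().split()
print(" deal " in tokens)  # False
print("deal" in tokens)    # True

# 3. An unescaped '.' matches any character; re.escape makes it literal.
print(bool(re.search(" rs.", " rsx")))             # True
print(bool(re.search(re.escape(" rs."), " rsx")))  # False
```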

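Best answer

My first suggestion: learn regular expressions. They give you a great deal of text-processing power.

But to give you a working solution (and a starting point for further exploration), try this: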

import re

re_offers = re.compile(r'''
    \b              # Word boundary
    (?:             # Start of non-capturing group
        deals?      # deal or deals
        |           # or ...
        offers?     # offer or offers
        |
        discount
        |
        promotion
        |
        sale
        |
        rs\.?       # rs or rs.
        |
        inr\d+      # INR then digits
        |
        \d+inr      # Digits then INR
    )               # End of group
    \b              # Word boundary
    |               # or ...
    \b\d+%          # Digits (1 or more) then percent
    |
    \$\d+\b         # Dollar then digits (thousands separators not handled yet)
    ''',
    re.I | re.X)    # Ignore case, verbose format, for you :)

abc = re_offers.findall("e misio $1 is inr123 discount 1INR a 1% and deal")
print(abc)  # ['$1', 'inr123', 'discount', '1INR', '1%', 'deal']
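To apply a pattern like this to the tweets from the question, one option is to run it over each tweet's text and keep the tweets with at least one hit. The following is a minimal sketch, not part of the original answer: the helper name `find_promo_tweets`, the sample tweets, and the condensed pattern are all hypothetical, and the dicts only imitate the `id` and `text` fields that the question reads from MongoDB.

```python
import re

# Condensed form of the answer's pattern (assumption: trimmed for illustration).
PROMO_RE = re.compile(r'''
    \b(?:deals?|offers?|discount|promotion|sale|rs\.?|inr\d+|\d+inr)\b
    | \b\d+%        # digits then percent
    | \$\d+\b       # dollar then digits
    ''', re.I | re.X)


def find_promo_tweets(tweets):
    """Return (tweet id, matched terms) for every tweet with at least one match."""
    results = []
    for tweet in tweets:
        terms = PROMO_RE.findall(tweet.get('text', ''))
        if terms:
            results.append((tweet.get('id'), terms))
    return results


# Hypothetical sample data standing in for documents from db.tweets.
sample_tweets = [
    {'id': 1, 'text': 'Big SALE today, 20% off everything'},
    {'id': 2, 'text': 'Good morning everyone'},
    {'id': 3, 'text': 'New offers from $5'},
]

for tweet_id, terms in find_promo_tweets(sample_tweets):
    print(tweet_id, terms)
```

Because the pattern has no capturing groups, `findall` returns the full text of each match, so the list of matched terms doubles as a record of why a tweet was selected.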

Regarding "Python: Matching strings that contain certain terms", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21700591/
