
python - How do I remove a list of words from a paragraph?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:06:31

I want to remove a list of words from a paragraph, so I created a list of the words I want to remove:

fitlerWords= ['Cage','Contract','Number','Quantity','Unit','Cost','AWD','Date','CONTINUED',
'SECTION', 'Procurement','history','For','on','Next','Page','Continuation','Sheet',
'Reference','of','Document','Being','CONTINUED','pages','SECTION']

If any of the words above are present, I want to remove them from this text:

015536159/6630 CAGE Contract Number Quantity Unit Cost AWD Date 32YK1 SPE2DH19P0522 22.000 1394.13000 20190102 32YK1 SPE2DH18P1630 21.000 1356.41000 20180604 74YZ3 SPE2DH18P1184 15.000 1282.50000 20180314 32YK1 SPE2DH17V1630 16.000 1335.91000 20170214 58837 SPE2DH16V2501 17.000 1369.00000 20160601 32YK1 SPE2DH16M0463 13.000 1358.20000 20151125 CONTINUED ON NEXT PAGE<br/>
CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: SPE2DH-19-T-6601 PAGE 4 OF 22 PAGES SECTION A Procurement History for NSN/FSC:015536159/6630 CAGE Contract Number Quantity Unit Cost AWD Date 32YK1 S$ DH16M0068 32YK1 SPE2DH14V3122 32YK1 S$ DH14V2252 32YK1 SPE2DH14V0165 58837 SPM2DH13V1222 08576 SPM2DH13M0509 58837 SPM2DH12V0342 08576 SPM2DH12M0490 08576 SPM2DH11V1261 3BSP4 SPM2DSO8MA800 3BSP4 SPM2DS08M6542 3BSP4 SPM2DS08M5128 3BSP4 SPM2DS08M5127 3BSP4 SPM2DS08M5125 18.000 1462.05000 20151005 12.000 1246.39000 20140918 9.000 1246.39000 20140711 10.000 1246.39000 20131223 12.000 1258.00000 20130724 15.000 1100.09000 20121205 27.000 1200.00000 20111223 34.000 1057.77000 20111202 3.000 1057.77000 20110727 2.000 947.16000 20080721 100.000 947.16000 20080323 2.000 947.16000 20080227 2.000 947.16000 20080227 2.000 947.16000 20080225 CONTINUED ON NEXT PAGE<br/>
CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: SPE2DH-19-T-6601 PAGE 5 OF 22 PAGES SECTION B

So I used this code:

for x in fitlerWords:
    try:
        filteredHistory = history.replace(x,"")
    except Exception as e:
        print(e, x)

print(filteredHistory)

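When I print the result, I get the paragraph back and none of the words have been removed. What am I doing wrong? How can I filter all of these words out of the paragraph if they are present?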

Best answer

Use re.sub with an alternation that contains all of the keywords:

import re

# The list name must match the name used when building the regex below.
filterWords = ['Cage','Contract','Number','Quantity','Unit','Cost','AWD','Date','CONTINUED', 'SECTION', 'Procurement','history','For','on','Next','Page','Continuation','Sheet','Reference','of','Document','Being','CONTINUED','pages','SECTION']
regex = r'\b(?:' + '|'.join(filterWords) + r')\b'
filteredHistory = re.sub(regex, '', history, flags=re.IGNORECASE)
print(filteredHistory)
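As for why the loop in the question appears to do nothing: each iteration calls history.replace(...) on the unmodified history and overwrites filteredHistory, so only the replacement for the last word in the list survives. In addition, str.replace is case-sensitive, so 'Cage' never matches 'CAGE', and it matches substrings, so a short word such as 'on' would also be cut out of longer words. A minimal sketch of a loop that accumulates its result, using a shortened sample string and word list chosen purely for illustration:

```python
# Shortened sample input and word list, chosen only for illustration.
history = "CAGE Contract Number Quantity 32YK1 SPE2DH19P0522 22.000"
filter_words = ["Cage", "Contract", "Number", "Quantity"]

# Start from the original text and keep replacing in the SAME variable,
# so each removal is preserved instead of being overwritten.
filtered = history
for word in filter_words:
    filtered = filtered.replace(word, "")

# "CAGE" is still present because str.replace is case-sensitive.
print(filtered)
```

This fixes the overwriting problem but not the case or substring issues, which is why the regex approach is the better fit here.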

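Note: depending on how you want the resulting text to look, you may also want to remove the whitespace on one side of each keyword, say the right side. In that case, we can try: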

regex = r'\b(?:' + '|'.join(filterWords) + r')\s*\b'
filteredHistory = re.sub(regex, '', history, flags=re.IGNORECASE)

The regex logic here builds a pattern that looks like this:

\b(?:Cage|Contract|Number|Quantity)\b

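The real pattern has more keywords, of course, but this is its general shape. We use re.sub to match the pattern and replace each match with an empty string, which effectively removes all matching keywords. The re.IGNORECASE flag makes the match case-insensitive, so a keyword is removed regardless of how it is capitalized.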
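Putting the pieces together, here is a minimal self-contained sketch rather than the answer's exact code. The helper name remove_words, the sample string, and the shortened word list are assumptions made for illustration. It also passes each keyword through re.escape, which only matters if a keyword contains regex metacharacters, and collapses the runs of whitespace left behind by the removed words.

```python
import re

def remove_words(text, words):
    """Remove whole-word, case-insensitive occurrences of `words` from `text`.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Escape each keyword so characters such as '.' or '$' are matched literally.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    stripped = re.sub(pattern, '', text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removed words.
    return ' '.join(stripped.split())

# Shortened sample input and word list, chosen only for illustration.
sample = "CONTINUED ON NEXT PAGE 32YK1 SPE2DH19P0522 Contract Number 22.000"
words = ['Contract', 'Number', 'CONTINUED', 'on', 'Next', 'Page']
print(remove_words(sample, words))
```

Because of the \b anchors, a short keyword such as 'on' is removed only when it stands alone as a word, not when it appears inside a longer one.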

Regarding "python - How do I remove a list of words from a paragraph?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58152499/
