
python - How to filter specific POS tags from a list of lists into separate lists?

Reposted. Author: 行者123. Updated: 2023-12-01 02:58:52

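I have a large amount of product description data and need to separate the product name and intent from the description. I found that, after tagging the text with POS tags, separating out the NNP tags helps somewhat with further cleaning.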

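I have data like the following. I want to keep only the words tagged NNP, grouped into one list per sentence, but I have not been able to do this.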

data = [[('User', 'NNP'),
('is', 'VBZ'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('iShopCatalog', 'NN'),
('Coala', 'NNP'),
('excluding', 'VBG'),
('articles', 'NNS'),
('from', 'IN'),
('VWR', 'NNP')],
[('Arfter', 'NNP'),
('transferring', 'VBG'),
('the', 'DT'),
('articles', 'NNS'),
('from', 'IN'),
('COALA', 'NNP'),
('to', 'TO'),
('SRM', 'VB'),
('the', 'DT'),
('Category', 'NNP'),
('S9901', 'NNP'),
('Dummy', 'NNP'),
('is', 'VBZ'),
('maintained', 'VBN')],
[('Due', 'JJ'),
('to', 'TO'),
('this', 'DT'),
('the', 'DT'),
('user', 'NN'),
('is', 'VBZ'),
('not', 'RB'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('the', 'DT'),
('product', 'NN')],
[('All', 'DT'),
('other', 'JJ'),
('users', 'NNS'),
('can', 'MD'),
('order', 'NN'),
('these', 'DT'),
('articles', 'NNS')],
[('She', 'PRP'),
('can', 'MD'),
('order', 'NN'),
('other', 'JJ'),
('products', 'NNS'),
('from', 'IN'),
('a', 'DT'),
('POETcatalog', 'NNP'),
('without', 'IN'),
('any', 'DT'),
('problems', 'NNS')],
[('Furtheremore', 'IN'),
('she', 'PRP'),
('is', 'VBZ'),
('able', 'JJ'),
('to', 'TO'),
('order', 'NN'),
('products', 'NNS'),
('from', 'IN'),
('the', 'DT'),
('Vendor', 'NNP'),
('VWR', 'NNP'),
('through', 'IN'),
('COALA', 'NNP')],
[('But', 'CC'),
('articles', 'NNS'),
('from', 'IN'),
('all', 'DT'),
('other', 'JJ'),
('suppliers', 'NNS'),
('are', 'VBP'),
('not', 'RB'),
('orderable', 'JJ')],
[('I', 'PRP'),
('already', 'RB'),
('spoke', 'VBD'),
('to', 'TO'),
('anic', 'VB'),
('who', 'WP'),
('maintain', 'VBP'),
('the', 'DT'),
('catalog', 'NN'),
('COALA', 'NNP'),
('and', 'CC'),
('they', 'PRP'),
('said', 'VBD'),
('that', 'IN'),
('the', 'DT'),
('reason', 'NN'),
('should', 'MD'),
('be', 'VB'),
('the', 'DT'),
('assignment', 'NN'),
('of', 'IN'),
('the', 'DT'),
('plant', 'NN')],
[('User', 'NNP'),
('is', 'VBZ'),
('a', 'DT'),
('assinged', 'JJ'),
('to', 'TO'),
('Universitaet', 'NNP'),
('Regensburg', 'NNP'),
('in', 'IN'),
('Scout', 'NNP'),
('but', 'CC'),
('in', 'IN'),
('P17', 'NNP'),
('table', 'NN'),
('YESRMCDMUSER01', 'NNP'),
('she', 'PRP'),
('is', 'VBZ'),
('assigned', 'VBN'),
('to', 'TO'),
('company', 'NN'),
('001500', 'CD'),
('Merck', 'NNP'),
('KGaA', 'NNP')],
[('Please', 'NNP'),
('find', 'VB'),
('attached', 'JJ'),
('some', 'DT'),
('screenshots', 'NNS')]]

I wrote the following code:

def prodname(a):
    p = []
    for i in a:
        for j in range(len(i)):
            if i[j][1]=='NNP':
                p.append(i[j][0])
    return p

It gives the following output:

    ['User',
'Coala',
'VWR',
'Arfter',
'COALA',
'Category',
'S9901',
'Dummy',
'POETcatalog',
'Vendor',
'VWR',
'COALA',
'COALA',
'User',
'Universitaet',
'Regensburg',
'Scout',
'P17',
'YESRMCDMUSER01',
'Merck',
'KGaA',
'Please']

The output I want is:

[['User',
'Coala',
'VWR'],
['Arfter',
'COALA',
'Category',
'S9901',
'Dummy'],
[],
[],
['POETcatalog'],
['Vendor',
'VWR',
'COALA'],
[],
['COALA'],
['User',
'Universitaet',
'Regensburg',
'Scout',
'P17',
'YESRMCDMUSER01',
'Merck',
'KGaA'],
['Please']]

I also tried using [[] for i in range(len(data))] to append to the respective lists, but could not get it to work.
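For reference, the pre-allocation idea mentioned above can be made to work by indexing into the result with the sentence position. The following is only a minimal sketch: the function name prodname_per_sentence and the small sample list are made up for illustration and are not from the original post.

```python
def prodname_per_sentence(tagged_sentences):
    """Return one list of NNP words per tagged sentence (hypothetical helper)."""
    # Pre-allocate one empty result list per sentence.
    result = [[] for _ in range(len(tagged_sentences))]
    for idx, sentence in enumerate(tagged_sentences):
        # Each token is a (word, tag) tuple.
        for word, tag in sentence:
            if tag == 'NNP':
                # Append to the list that belongs to this sentence,
                # not to one shared flat list.
                result[idx].append(word)
    return result


# Illustrative sample in the same shape as `data` (not the full data set).
sample = [[('User', 'NNP'), ('is', 'VBZ'), ('VWR', 'NNP')],
          [('Due', 'JJ'), ('to', 'TO')]]
print(prodname_per_sentence(sample))  # [['User', 'VWR'], []]
```

The difference from the flat version is that each match is appended to result[idx], so sentences without any NNP token keep their empty list.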

Best answer

You can use this list comprehension:

[[j[0] for j in i if j[-1]=="NNP"] for i in data]

Output:

[['User', 'Coala', 'VWR'], ['Arfter', 'COALA', 'Category', 'S9901', 'Dummy'], [], [], ['POETcatalog'], ['Vendor', 'VWR', 'COALA'], [], ['COALA'], ['User', 'Universitaet', 'Regensburg', 'Scout', 'P17', 'YESRMCDMUSER01', 'Merck', 'KGaA'], ['Please']]
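If more than one tag needs to be kept, the same comprehension can be wrapped in a small function that takes a set of tags. This is a sketch under assumptions: the name filter_by_tags, its wanted parameter, and the sample list are hypothetical, and NNPS (plural proper noun in the Penn Treebank tag set) is used only as an example of a second tag.

```python
def filter_by_tags(tagged_sentences, wanted=frozenset({'NNP'})):
    """Keep, per sentence, the words whose POS tag is in `wanted` (hypothetical helper)."""
    # Tuple unpacking (word, tag) replaces the j[0] / j[-1] indexing.
    return [[word for word, tag in sentence if tag in wanted]
            for sentence in tagged_sentences]


# Illustrative sample in the same shape as `data` (not the full data set).
sample = [[('User', 'NNP'), ('is', 'VBZ'), ('VWR', 'NNP')],
          [('All', 'DT'), ('other', 'JJ'), ('users', 'NNS')],
          [('Please', 'NNP'), ('find', 'VB')]]

print(filter_by_tags(sample))                   # [['User', 'VWR'], [], ['Please']]
print(filter_by_tags(sample, {'NNP', 'NNPS'}))  # same result here: no NNPS in the sample
```

With the default argument this should behave like the comprehension in the answer; passing a different set changes which tags are kept without touching the loop.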

Regarding python - How to filter specific POS tags from a list of lists into separate lists?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44004104/
