
python - Find dictionary values in paragraphs, and return NA if a paragraph contains none

Reposted. Author: 行者123. Updated: 2023-12-01 00:44:14

Suppose I have paragraphs of random words as a list:

t = ['protein and carbohydrates Its is a little heavier pulsus widely used and is a versatile ingredient',
'Tea contains the goodness of Natural Ingredients Cardamom Ginger Tea bags Disclaimers As per Ayurvedic texts',
'almonds are all natural supreme sized nuts they are highly nutritious and extremely healthy',
'Camel milk can be consumed by lactose intolerant people and those allergic to cows milk',
'Healthy Crunch Almond with honey is an extra crunchy breakfast cereal for a delightful start to your mornings']

The dictionary is:

d = {'First': ['Tea','Coffee'],
'Second': ['Noodles','Pasta'],
'Third': ['sandwich','honey'],
'Fourth': ['Almond','apricot','blueberry']
}

The code I wrote is very slow, and I also want to show "NA" for paragraphs that do not match any dictionary value.

Code:

import pandas as pd

get_labels = []
get_text = []

# Compare every dictionary value against every word of every paragraph
for txt in t:
    for dictrow in d.values():
        for i in dictrow:
            for j in txt.split():
                if i == j:
                    print(j)
                    print(txt)
                    get_labels.append(j)
                    get_text.append(txt)

pd.DataFrame(list(zip(get_text, get_labels)), columns=["whole_text", "matched_text"])

The resulting DataFrame is:

     whole_text                                       matched_text
0 Tea contains the goodness of Natural Ingredie... Tea
1 Tea contains the goodness of Natural Ingredie... Tea
2 Healthy Crunch Almond with honey is an extra ... honey
3 Healthy Crunch Almond with honey is an extra ... Almond

But the output I want is:

     whole_text                                       matched_text
0 protein and carbohydrates Its is a little .... NA
1 Tea contains the goodness of Natural Ingredie... Tea
2 Tea contains the goodness of Natural Ingredie... Tea
3 almonds are all natural supreme sized nuts th... NA
4 Camel milk can be consumed by lactose intoler... NA
5 Healthy Crunch Almond with honey is an extra ... honey
6 Healthy Crunch Almond with honey is an extra ... Almond

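I have two questions:
a) I want to add "NA" for paragraphs that do not match any dictionary value, as in the table above.
b) How can I optimize this code so that it runs faster? I am using it on a large dataset.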

Best answer

The power of set intersection:

paragraphs = ['protein and carbohydrates Its is a little heavier pulsus widely used and is a versatile ingredient',
'Tea contains the goodness of Natural Ingredients Cardamom Ginger Tea bags Disclaimers As per Ayurvedic texts',
'almonds are all natural supreme sized nuts they are highly nutritious and extremely healthy',
'Camel milk can be consumed by lactose intolerant people and those allergic to cows milk',
'Healthy Crunch Almond with honey is an extra crunchy breakfast cereal for a delightful start to your mornings']

d = {'First': ['Tea', 'Coffee'],
'Second': ['Noodles', 'Pasta'],
'Third': ['sandwich', 'honey'],
'Fourth': ['Almond', 'apricot','blueberry']
}

import pandas as pd

# Flatten all dictionary values into one set for fast membership tests
words = set(w for lst in d.values() for w in lst)
match_stats = {'whole_text': [], 'matched_text': []}
for p in paragraphs:
    # Words that occur both in the paragraph and in the dictionary values
    common_words = set(p.split()) & words
    if not common_words:
        match_stats['whole_text'].append(p)
        match_stats['matched_text'].append('NA')
    else:
        for w in common_words:
            match_stats['whole_text'].append(p)
            match_stats['matched_text'].append(w)

df = pd.DataFrame(match_stats)
print(df)

Output:

                                          whole_text matched_text
0 protein and carbohydrates Its is a little heav... NA
1 Tea contains the goodness of Natural Ingredie... Tea
2 almonds are all natural supreme sized nuts the... NA
3 Camel milk can be consumed by lactose intolera... NA
4 Healthy Crunch Almond with honey is an extra ... honey
5 Healthy Crunch Almond with honey is an extra ... Almond
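
Note that the set intersection collapses repeated words, so 'Tea' appears once here although the desired output in the question lists it twice, and the order of rows within one paragraph depends on set iteration order. If duplicates and word order matter, one possible variant (a sketch only, not part of the original answer) keeps the set for fast lookups but walks the paragraph's tokens in order. The function name match_rows and its parameter names are made up for this illustration.

```python
def match_rows(paragraphs, lookup):
    """Return (paragraph, matched_word) tuples; ('paragraph', 'NA') if nothing matches.

    Hypothetical helper: `lookup` is a dict of lists like `d` in the question.
    """
    # Flatten the dictionary values into a set for fast membership tests
    words = {w for lst in lookup.values() for w in lst}
    rows = []
    for p in paragraphs:
        # Keep every matching token, in paragraph order, including repeats
        hits = [tok for tok in p.split() if tok in words]
        if hits:
            rows.extend((p, tok) for tok in hits)
        else:
            rows.append((p, 'NA'))
    return rows
```

Assuming pandas is imported as pd, the rows can be turned into the same two-column frame with pd.DataFrame(match_rows(paragraphs, d), columns=['whole_text', 'matched_text']). Matches are listed in the order they occur in the paragraph, so the last paragraph would yield 'Almond' before 'honey'.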

This question about finding dictionary values in paragraphs, and returning NA when a paragraph contains none, is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/57111483/
