
python - How to select a substring based on the presence of word pairs? Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:27:11

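I have a large number of sentences from which I want to extract sub-sentences that start with certain word combinations. For example, I want to extract the sentence fragments beginning with "what does", "what is", and so on (essentially dropping the words that appear before the word pair). Both the sentences and the word pairs are stored in a DataFrame: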

   'Sentence'                                   'First2'
0  If this is a string what does it say?        can I
1  And this is a string, should it say more?    should it
2  This is yet another string.                  what does
3  etc. etc.                                    etc. etc

The result I want from the example above is:

0 what does it say?
1 should it say more?
2

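The most obvious solution (at least to me), shown below, does not work. It iterates over all sentences r but uses only the first word pair b, never the other b's.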

a = df['Sentence']
b = df['First2']

# The function seems to loop over all r's but only over the first b:
def func(r):
    for x in b:
        if x in r:
            s = r[r.index(x):]
            return s
        else:
            return ''

df['Segments'] = a.apply(func)

It seems that looping over two DataFrame columns at the same time this way does not work. Is there a more efficient way to do this?

Best answer

I think there is a bug in your code.

        else:
            return ''

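This means that 'func' returns immediately if the first comparison does not match, so only the first word pair is ever tested against each sentence. That is probably why the code returns no matches.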
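To make the early return concrete, here is a minimal sketch in plain Python (no pandas), using the word pairs and the first sentence from the question. The names `func_early_return` and `func_fixed` are hypothetical, chosen only for this illustration; the first mirrors the question's logic, the second moves the empty-string return out of the loop.

```python
# Word pairs and a sentence taken from the question's example data.
pairs = ['can I', 'should it', 'what does']
sentence = 'If this is a string what does it say?'


def func_early_return(r, pairs):
    """Mirror of the question's logic: the else branch ends the loop early."""
    for x in pairs:
        if x in r:
            return r[r.index(x):]
        else:
            return ''  # reached as soon as the first pair fails to match
    return ''  # only reached when pairs is empty


def func_fixed(r, pairs):
    """Return '' only after every pair has been tried."""
    for x in pairs:
        if x in r:
            return r[r.index(x):]
    return ''


early = func_early_return(sentence, pairs)
fixed = func_fixed(sentence, pairs)
print(repr(early))  # '' because 'can I' is not in the sentence
print(repr(fixed))  # 'what does it say?'
```

Because 'can I' is the first pair and does not occur in the sentence, the early-return version never gets as far as 'what does'.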

Working example code:

# Try every word pair; return '' only if none of them occurs in the sentence:
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''

df['Segments'] = a.apply(func)

Output:

df:   
{
'First2': ['can I', 'should it', 'what does'],
'Segments': ['what does it say? ', 'should it say more?', ''],
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string. ' ]
}
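As for the question about a more efficient way, one possible alternative (not part of the answer above) is to compile all word pairs into a single regular expression so that each sentence is scanned once instead of once per pair. The sketch below uses only the standard `re` module; the name `extract_segment` is hypothetical. Note one difference in behaviour: the loop above returns the segment for the first pair in list order that occurs in the sentence, while the regex returns the segment starting at the leftmost match in the sentence.

```python
import re

# Example data from the question.
pairs = ['can I', 'should it', 'what does']
sentences = [
    'If this is a string what does it say?',
    'And this is a string, should it say more?',
    'This is yet another string.',
]

# One alternation pattern; re.escape guards against regex
# metacharacters that might appear inside a word pair.
pattern = re.compile('|'.join(re.escape(p) for p in pairs))


def extract_segment(sentence):
    """Return the sentence from the leftmost matching pair onward, or ''."""
    match = pattern.search(sentence)
    return sentence[match.start():] if match else ''


segments = [extract_segment(s) for s in sentences]
print(segments)
# ['what does it say?', 'should it say more?', '']
```

With the DataFrame from the question, the same function could presumably be applied as `df['Segments'] = df['Sentence'].apply(extract_segment)`, with `pairs` built from `df['First2']`.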

Regarding "python - How to select a substring based on the presence of word pairs? Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50106176/
