
python - Comparing multiple rows of two DataFrames

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:07:13

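I need to match all the rows of one sentence in DataFrame 1 against DataFrame 2 (which contains the tokens of all sentences) and return the matching rows from DataFrame 2.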

I tried a groupby operation, but it returns matches for each matching row individually. I want all the tokens in df1 to match, with their order preserved.

The DataFrame below contains the tokens of a single sentence only.

import pandas as pd

pdt1 = pd.DataFrame({'Word': ['Obesity', 'in', 'Low-', 'and', 'Middle-Income', 'Countries'],
                     'tag': ['O', 'O', 'O', 'O', 'O', 'O']})

print(pdt1)

            Word tag
0        Obesity   O
1             in   O
2           Low-   O
3            and   O
4  Middle-Income   O
5      Countries   O

The other DataFrame contains the tokens of all sentences.

pdt2 = pd.DataFrame([[1, 1, 1, 'Obesity', 'O'],
[2, 1, 1, 'in', 'O'],
[3, 1, 1, 'Low-', 'O'],
[4, 1, 1, 'and', 'O'],
[5, 1, 1, 'Middle-Income', 'O'],
[6, 1, 1, 'Countries', 'O'],
[7, 1, 2, 'We', 'O'],
[8, 1, 2, 'have', 'O'],
[9, 1, 2, 'reviewed', 'O'],
[10, 1, 2, 'the', 'O'],
[11, 1, 2, 'distinctive', 'O'],
[12, 1, 2, 'features', 'O'],
[13, 1, 2, 'of', 'O'],
[14, 1, 2, 'excess', 'O'],
[15, 1, 2, 'weight', 'O'],
[16, 1, 2, ',', 'O'],
[17, 1, 2, 'its', 'O'],
[18, 1, 2, 'causes', 'O'],
[19, 1, 2, ',', 'O'],
[20, 1, 2, 'and', 'O'],
[21, 1, 2, 'related', 'O'],
[22, 1, 2, 'prevention', 'O'],
[23, 1, 2, 'and', 'O'],
[24, 1, 2, 'management', 'O'],
[25, 1, 2, 'efforts', 'O']])

pdt2.columns = ['id','Doc_ID','Sent_ID','Word','tag']
print(pdt2)


    id  Doc_ID  Sent_ID           Word tag
0    1       1        1        Obesity   O
1    2       1        1             in   O
2    3       1        1           Low-   O
3    4       1        1            and   O
4    5       1        1  Middle-Income   O
5    6       1        1      Countries   O
6    7       1        2             We   O
7    8       1        2           have   O
8    9       1        2       reviewed   O
9   10       1        2            the   O
10  11       1        2    distinctive   O
11  12       1        2       features   O
12  13       1        2             of   O
13  14       1        2         excess   O
14  15       1        2         weight   O
15  16       1        2              ,   O
16  17       1        2            its   O
17  18       1        2         causes   O
18  19       1        2              ,   O
19  20       1        2            and   O
20  21       1        2        related   O
21  22       1        2     prevention   O
22  23       1        2            and   O
23  24       1        2     management   O
24  25       1        2        efforts   O

The expected output looks like:

   id  Doc_ID  Sent_ID           Word tag
0   1       1        1        Obesity   O
1   2       1        1             in   O
2   3       1        1           Low-   O
3   4       1        1            and   O
4   5       1        1  Middle-Income   O
5   6       1        1      Countries   O

Best answer

Do you mean:

print(pdt2[pdt2['Sent_ID'] == 1])

Output:

   id  Doc_ID  Sent_ID           Word tag
0   1       1        1        Obesity   O
1   2       1        1             in   O
2   3       1        1           Low-   O
3   4       1        1            and   O
4   5       1        1  Middle-Income   O
5   6       1        1      Countries   O
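
The filter above assumes you already know that the sentence is Sent_ID 1. If you do not, one possible approach is to group pdt2 by sentence and keep only the groups whose token sequence equals the one in pdt1, in the same order. This is only a sketch: the helper name find_matching_sentence is made up for illustration, and the frames are cut-down stand-ins for the ones in the question.

```python
import pandas as pd

# Cut-down stand-ins for the question's frames (same columns, fewer rows).
pdt1 = pd.DataFrame({'Word': ['Obesity', 'in', 'Low-'], 'tag': ['O', 'O', 'O']})
pdt2 = pd.DataFrame(
    [[1, 1, 1, 'Obesity', 'O'],
     [2, 1, 1, 'in', 'O'],
     [3, 1, 1, 'Low-', 'O'],
     [4, 1, 2, 'We', 'O'],
     [5, 1, 2, 'have', 'O']],
    columns=['id', 'Doc_ID', 'Sent_ID', 'Word', 'tag'])


def find_matching_sentence(tokens, corpus):
    """Hypothetical helper: keep the rows of every sentence in `corpus`
    whose Word sequence is identical (same tokens, same order) to `tokens`."""
    target = tokens['Word'].tolist()
    return corpus.groupby(['Doc_ID', 'Sent_ID'], sort=False).filter(
        lambda sent: sent['Word'].tolist() == target)


matched = find_matching_sentence(pdt1, pdt2)
print(matched)
```

Comparing whole lists means a sentence only matches when every token lines up, which is what the question asks for. It will not find pdt1 as a fragment inside a longer sentence.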

Edit:

print(pdt1.merge(pdt2[pdt2['Sent_ID'] == 1],on=['Word','tag']))

Output:

            Word tag  id  Doc_ID  Sent_ID
0        Obesity   O   1       1        1
1             in   O   2       1        1
2           Low-   O   3       1        1
3            and   O   4       1        1
4  Middle-Income   O   5       1        1
5      Countries   O   6       1        1
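
One caveat with merging on Word and tag alone: if a token occurs more than once in the sentence (as 'and' does in sentence 2 of pdt2), every occurrence on the left is paired with every occurrence on the right, so you get extra rows. A possible workaround, sketched below, is to add a position column to both frames and merge on it as well. The query and sentence frames here are hypothetical examples rather than data from the question, and the sketch assumes the right-hand frame has already been narrowed down to one sentence.

```python
import pandas as pd

# Hypothetical sentence with a repeated token, to show why a position key helps.
query = pd.DataFrame({'Word': ['and', 'related', 'and'], 'tag': ['O', 'O', 'O']})
sentence = pd.DataFrame({'id': [20, 21, 22],
                         'Word': ['and', 'related', 'and'],
                         'tag': ['O', 'O', 'O']})

# Plain merge pairs every 'and' with every 'and', so rows are duplicated.
naive = query.merge(sentence, on=['Word', 'tag'])

# Number each token by its position in the sentence, then merge on position too.
query_pos = query.assign(pos=range(len(query)))
sentence_pos = sentence.assign(pos=range(len(sentence)))
aligned = query_pos.merge(sentence_pos, on=['pos', 'Word', 'tag']).drop(columns='pos')

print(len(naive), len(aligned))
print(aligned)
```

With the position key, a row only joins when the same token sits at the same index on both sides, so the result has one row per token and keeps the original order.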

Regarding python - Comparing multiple rows of two DataFrames, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55321768/
