
python - Connecting sentences with a graph

Reposted. Author: 行者123. Updated: 2023-12-03 16:18:20

I have a list of sentences on a few topics (two), as shown below:

Sentences
Trump says that it is useful to win the next presidential election.
The Prime Minister suggests the name of the winner of the next presidential election.
In yesterday's conference, the Prime Minister said that it is very important to win the next presidential election.
The Chinese Minister is in London to discuss about climate change.
The president Donald Trump states that he wants to win the presidential election. This will require a strong media engagement.
The president Donald Trump states that he wants to win the presidential election. The UK has proposed collaboration.
The president Donald Trump states that he wants to win the presidential election. He has the support of his electors.
As you can see, there are similarities between the sentences.
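I am trying to relate multiple sentences and visualize their characteristics using a (directed) graph. The graph is built from a similarity matrix, following the row order of the sentences shown above.
I created a new column, Time, to show the order of the sentences, so the first row (Trump says ...) is at time 1, the second row (The Prime Minister suggests ...) is at time 2, and so on.
Like this: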
Time    Sentences
1       Trump said that it is useful to win the next presidential election.
2       The Prime Minister suggests the name of the winner of the next presidential election.
3       In today's conference, the Prime Minister said that it is very important to win the next presidential election.
...
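I would then like to find these relationships so that I get a clear overview of the topic.
Multiple paths from a sentence would indicate that several pieces of information are related to it.
To determine the similarity between two sentences, I tried to extract nouns and verbs as follows: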
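from nltk import pos_tag, word_tokenize

nouns = []
verbs = []
for index, row in df.iterrows():
    # tag word tokens; passing the raw string would tag single characters
    tagged = pos_tag(word_tokenize(row.iloc[0]))
    nouns.append([word for word, pos in tagged if pos.startswith("NN")])
    verbs.append([word for word, pos in tagged if pos.startswith("VB")])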
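since they are the keywords in any sentence.
So when a keyword (noun or verb) appears in sentence x but not in the other sentences, it represents a difference between those sentences.
I think a better approach might be to use word2vec or gensim (WMD).
The similarity has to be computed for every sentence.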
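The pairwise computation described above can be sketched as follows. This is only an illustration: it uses Jaccard overlap of lowercase tokens as a stand-in for word2vec or WMD scores, and the helper names `tokens` and `jaccard` are hypothetical. Any other similarity function could be swapped in.

```python
import re

sentences = [
    "Trump says that it is useful to win the next presidential election.",
    "The Prime Minister suggests the name of the winner of the next presidential election.",
    "The Chinese Minister is in London to discuss about climate change.",
]


def tokens(sentence):
    """Return the set of lowercase words in a sentence (hypothetical helper)."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


token_sets = [tokens(s) for s in sentences]

# similarity[i][j] holds the score between sentence i and sentence j
similarity = [[jaccard(a, b) for b in token_sets] for a in token_sets]

for row in similarity:
    print([round(score, 2) for score in row])
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle needs to be inspected when deciding which sentences to connect.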
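I would like to build a graph that shows the content of the sentences in the example above.
Since there are two topics (Trump and the Chinese Minister), I need to look for sub-topics within each of them. For example, Trump has the sub-topic presidential election. A node in my graph should represent a sentence. The words in each node represent the differences between sentences and show the new information in that sentence. For example, the word states in the sentence at time 5 also appears in the adjacent sentences at times 6 and 7.
I would just like to find a way to get results similar to those in the picture below. I have mainly tried noun and verb extraction, but that is probably not the right way to proceed.
What I tried to do was take the sentence at time 1 and compare it with the other sentences, assigning a similarity score (using noun and verb extraction, and also word2vec), and then repeat this for all the other sentences.
But my problem now is how to extract the differences in order to create a graph that makes sense.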
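One simple way to surface the differences is a set difference over keywords: walk through the sentences in time order and keep, for each one, only the words that have not appeared in any earlier sentence. The sketch below assumes plain lowercase tokens and a small hand-picked stop-word list; both are illustrative assumptions rather than part of the approach described above.

```python
import re

# Illustrative stop-word list (an assumption; a real list would be larger)
STOP_WORDS = {"the", "that", "he", "his", "has", "to", "of", "a", "this", "will"}

timeline = [
    "The president Donald Trump states that he wants to win the presidential election. "
    "This will require a strong media engagement.",
    "The president Donald Trump states that he wants to win the presidential election. "
    "The UK has proposed collaboration.",
    "The president Donald Trump states that he wants to win the presidential election. "
    "He has the support of his electors.",
]


def keywords(sentence):
    """Lowercase words of a sentence minus the stop words."""
    return set(re.findall(r"\w+", sentence.lower())) - STOP_WORDS


seen = set()   # every keyword encountered so far
new_info = []  # per sentence: keywords not seen in earlier sentences
for sentence in timeline:
    words = keywords(sentence)
    new_info.append(sorted(words - seen))
    seen |= words

for time, words in enumerate(new_info, start=1):
    print(time, words)
```

The lists in `new_info` could then serve as node labels, so that each node shows only what its sentence adds.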
For the graph part, I would consider using networkx (DiGraph):
import networkx as nx
from pyvis.network import Network  # assumption: Network is the pyvis class
G = nx.DiGraph()
N = Network(directed=True)
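to show the direction of the relationships.
I have provided a different example to make things clearer (though using the previous example is fine too). Apologies for the inconvenience, but since my first question was not very clear, I had to provide a better, and probably simpler, example.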
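To make the direction concrete, one option is to point every edge from the earlier sentence to the later one. The sketch below builds such an edge list in plain Python from illustrative keyword sets. The rule "connect two sentences if they share at least one keyword" is an assumption. The resulting `(u, v, attributes)` tuples are in the form accepted by `add_edges_from` on a networkx `DiGraph`.

```python
# Time-ordered sentences reduced to keyword sets (illustrative data)
keywords_by_time = {
    1: {"trump", "win", "election"},
    2: {"minister", "winner", "election"},
    3: {"minister", "win", "election"},
    4: {"minister", "london", "climate"},
}

edges = []
times = sorted(keywords_by_time)
for i, earlier in enumerate(times):
    for later in times[i + 1:]:
        shared = keywords_by_time[earlier] & keywords_by_time[later]
        if shared:
            # the edge points forward in time; the label lists the shared keywords
            edges.append((earlier, later, {"shared": sorted(shared)}))

for edge in edges:
    print(edge)
```

Because edges only ever run from a lower time to a higher one, the resulting directed graph cannot contain cycles.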

Best answer

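I did not implement NLP for verb/noun separation; I just added a list of good words.
They can be extracted and normalized relatively easily with spacy.
Note that walking (a form of walk) occurs in sentences 1, 2 and 5 (nodes 0, 1 and 4 in the code), which therefore form a triad.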

import re
import networkx as nx
import matplotlib.pyplot as plt

plt.style.use("ggplot")

sentences = [
"I went out for a walk or walking.",
"When I was walking, I saw a cat. ",
"The cat was injured. ",
"My mum's name is Marylin.",
"While I was walking, I met John. ",
"Nothing has happened.",
]

G = nx.Graph()
# set of possible good words
good_words = {"went", "walk", "cat", "walking"}

# remove punctuation and keep only good words inside sentences
words = [
    set(re.sub(r"[^\w\s]", "", sentence).lower().split()) & good_words
    for sentence in sentences
]

# convert sentences to a dict (node index -> sentence) for later labeling
sentences = dict(enumerate(sentences))

# add one node per sentence
for i in sentences:
    G.add_node(i)

# add an edge if two sentences share a good word
for i in range(len(words)):
    for j in range(i + 1, len(words)):
        for edge_label in words[i].intersection(words[j]):
            G.add_edge(i, j, r=edge_label)

# compute layout coords
coord = nx.spring_layout(G)

plt.figure(figsize=(20, 14))

# place the labels slightly above the nodes
node_label_coords = {}
for node, coords in coord.items():
    node_label_coords[node] = (coords[0], coords[1] + 0.04)

# draw the network
nodes = nx.draw_networkx_nodes(G, pos=coord)
edges = nx.draw_networkx_edges(G, pos=coord)
edge_labels = nx.draw_networkx_edge_labels(G, pos=coord)
node_labels = nx.draw_networkx_labels(G, pos=node_label_coords, labels=sentences)
plt.title("Sentences network")
plt.axis("off")
plt.show()
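Update
If you want to measure the similarity between different sentences, you may want to compute the distance between their sentence embeddings.
This lets you find semantic similarity between sentences that use different words, such as "A soccer game with multiple men playing" and "Some men are playing ball". A nearly SoTA approach using BERT can be found here, and simpler approaches are here.
Once you have a similarity measure, replace the add_edge block so that a new edge is added only when the similarity exceeds some threshold. The resulting edge-adding code would look like this: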
# add an edge only if two sentences are similar enough
threshold = 0.90
for i in range(len(words)):
    for j in range(i + 1, len(words)):
        # suppose you have some similarity function using BERT or PCA
        similarity = check_similarity(sentences[i], sentences[j])
        if similarity > threshold:
            G.add_edge(i, j, r=similarity)
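The snippet above assumes a `check_similarity` function exists. Until an embedding model is plugged in, a bag-of-words cosine similarity can act as a placeholder. This is a swapped-in lexical technique, not the BERT-based approach recommended above, and it cannot detect paraphrases that share no words.

```python
import math
import re
from collections import Counter


def check_similarity(sentence_a, sentence_b):
    """Cosine similarity of word-count vectors (a stand-in for embeddings)."""
    counts_a = Counter(re.findall(r"\w+", sentence_a.lower()))
    counts_b = Counter(re.findall(r"\w+", sentence_b.lower()))
    # dot product over the words both sentences contain
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


print(check_similarity("When I was walking, I saw a cat.", "While I was walking, I met John."))
print(check_similarity("The cat was injured.", "Nothing has happened."))
```

With a purely lexical measure like this one, a threshold of 0.90 is very strict, so a lower value would probably be needed to get any edges on short sentences.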

Regarding python - connecting sentences with a graph, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63514464/
