
php - Check whether sentences share the same words

Reposted. Author: 可可西里. Updated: 2023-11-01 07:37:40

I have two tables, tb_content and tb_word.

tb_content:

=====================================
|id|sentence |sentence_id|content_id|
=====================================
| 1|sentence1|     0     |    1     |
| 2|sentence2|     1     |    1     |
| 3|sentence5|     0     |    2     |
| 4|sentence6|     1     |    2     |
| 5|sentence7|     2     |    2     |
=====================================

tb_word:

================================
|id|word|sentence_id|content_id|
================================
| 1| a  |     0     |    1     |
| 2| b  |     0     |    1     |
| 3| c  |     1     |    1     |
| 4| a  |     1     |    1     |
| 5| e  |     1     |    1     |
| 6| f  |     0     |    2     |
| 7| g  |     1     |    2     |
| 8| h  |     1     |    2     |
| 9| i  |     1     |    2     |
|10| f  |     2     |    2     |
|11| h  |     2     |    2     |
|12| f  |     2     |    2     |
================================

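Within each content_id, I need to check whether each sentence contains words that the other sentences also have.

For example:

Take content_id = 1. Its sentences are sentence1 and sentence2. From tb_word we can see that sentence1 and sentence2 share the same word, a. If a occurs >= 2 times across the two sentences, then a is the result. So if I print the results, they must be:

00 Array ( [0] => a [1] => b )
01 Array ( [3] => a )
10 Array ( [3] => a )
11 Array ( [0] => c [1] => a [2] => e )

where 00 means the comparison between sentence_id = 0 and sentence_id = 0.

First, I made a function Total to count how many sentences each content_id has: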

function Total() {
    $total = array();
    $sql = mysql_query('select content_id, count(*) as RowAmount
                        from tb_content Group By content_id') or die(mysql_error());
    while ($row = mysql_fetch_array($sql)) {
        $total[] = $row['RowAmount'];
    }
    return $total;
}

From that function I get the value of $total, and I need to check the similarity of words (from tb_word) between every possible pair of sentences:

foreach ($total as $content_id => $totals) {
    for ($x = 0; $x <= ($totals - 1); $x++) {
        for ($y = 0; $y <= ($totals - 1); $y++) {
            $shared = getShared($x, $y);
        }
    }
}

This is what getShared does:

function getShared($x, $y) {
    $token = array();
    $shared = array();
    $i = 0;
    if ($x == $y) {
        // Same sentence: return all of its words.
        $query = mysql_query("SELECT word FROM `tb_word`
                              WHERE sentence_id ='$x' ");
        while ($row = mysql_fetch_array($query)) {
            $shared[$i] = $row['word'];
            $i++;
        }
    } else {
        // Two sentences: keep words counted at least twice across both.
        $query = mysql_query("SELECT word, count(word) as jml
                              FROM `tb_word` WHERE sentence_id ='$x'
                              OR sentence_id ='$y'
                              GROUP BY word ");
        while ($row = mysql_fetch_array($query)) {
            $jml = $row['jml'];
            $token[$i] = $row['word'];
            if ($jml >= 2) {
                $shared[$i] = $token[$i];
            }
            $i++;
        }
    }
    return $shared;
}

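But the results I get are still wrong: they are mixed across different content_ids. The results must also be grouped by content_id. Sorry for my bad English and poor explanation. CMIIW, and please help me. Thanks :)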

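Best answer

This can actually be done by the DBMS itself, in one query built in two steps. First, to prepare the sentence combinations within the same content, you do a self-join: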

SELECT a.content_id,
       a.sentence_id AS sentence_id_1,
       b.sentence_id AS sentence_id_2
FROM   tb_content AS a
       JOIN tb_content AS b
         ON ( a.content_id = b.content_id
              AND a.sentence_id <= b.sentence_id )
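
To make the output of this self-join concrete, here is a minimal PHP sketch that enumerates the same pairs in memory. The array $sentencesByContent and the function sentencePairs are hypothetical names introduced only for illustration; the array is a hand-copied stand-in for tb_content, not something the original answer provides.

```php
<?php
// Hypothetical in-memory copy of tb_content: content_id => list of sentence_id.
$sentencesByContent = [
    1 => [0, 1],
    2 => [0, 1, 2],
];

/**
 * Returns [content_id, sentence_id_1, sentence_id_2] triples with
 * sentence_id_1 <= sentence_id_2. Sentences from different content_ids
 * are never paired, which mirrors the join condition above.
 */
function sentencePairs(array $sentencesByContent): array
{
    $pairs = [];
    foreach ($sentencesByContent as $contentId => $sentenceIds) {
        foreach ($sentenceIds as $first) {
            foreach ($sentenceIds as $second) {
                if ($first <= $second) {
                    $pairs[] = [$contentId, $first, $second];
                }
            }
        }
    }
    return $pairs;
}

$pairs = sentencePairs($sentencesByContent);
foreach ($pairs as [$contentId, $first, $second]) {
    echo "content {$contentId}: {$first}-{$second}\n";
}
```

For the sample data this should yield three pairs for content_id 1 and six for content_id 2.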

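The "<=" keeps same-sentence pairs such as "1-1" or "2-2" but avoids bidirectional duplicates such as "1-2" and "2-1". Next, you can join the result above with the words and count the occurrences. Like this: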

SELECT s.content_id,
       s.sentence_id_1,
       s.sentence_id_2,
       c.word,
       Count(*) AS jml
FROM   (SELECT a.content_id,
               a.sentence_id AS sentence_id_1,
               b.sentence_id AS sentence_id_2
        FROM   tb_content AS a
               JOIN tb_content AS b
                 ON ( a.content_id = b.content_id
                      AND a.sentence_id <= b.sentence_id )) AS s
       JOIN tb_word AS c
         ON ( s.content_id = c.content_id
              AND ( c.sentence_id = s.sentence_id_1
                     OR c.sentence_id = s.sentence_id_2 ) )
GROUP  BY s.content_id,
          s.sentence_id_1,
          s.sentence_id_2,
          c.word
HAVING Count(*) >= 2;

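The result of the query above gives you the content_id, sentences 1 and 2, the word, and the number of occurrences (2 or more). All you need now is to collect the results into arrays, which, as far as I can see, you already know how to do.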
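
As a rough sketch of that collection step, the function below groups result rows by content_id and then by sentence pair. The name groupSharedWords and the "s1-s2" key format are assumptions made for this example. The sample rows were worked out by hand from the tables in the question rather than captured from a real database, so treat them as illustrative.

```php
<?php
/**
 * Groups query rows into $result[content_id]["s1-s2"] = [word, ...].
 * Column names follow the aliases used in the query above.
 */
function groupSharedWords(iterable $rows): array
{
    $result = [];
    foreach ($rows as $row) {
        $pairKey = $row['sentence_id_1'] . '-' . $row['sentence_id_2'];
        // Nested arrays are created on first use.
        $result[$row['content_id']][$pairKey][] = $row['word'];
    }
    return $result;
}

// Hand-derived sample rows (an assumption, not real query output).
$rows = [
    ['content_id' => 1, 'sentence_id_1' => 0, 'sentence_id_2' => 1, 'word' => 'a', 'jml' => 2],
    ['content_id' => 2, 'sentence_id_1' => 0, 'sentence_id_2' => 2, 'word' => 'f', 'jml' => 3],
    ['content_id' => 2, 'sentence_id_1' => 1, 'sentence_id_2' => 2, 'word' => 'f', 'jml' => 2],
    ['content_id' => 2, 'sentence_id_1' => 1, 'sentence_id_2' => 2, 'word' => 'h', 'jml' => 2],
    ['content_id' => 2, 'sentence_id_1' => 2, 'sentence_id_2' => 2, 'word' => 'f', 'jml' => 2],
];

print_r(groupSharedWords($rows));
```

Because the parameter is typed as iterable, a PDOStatement returned by PDO::query could probably be passed in directly instead of the sample array. The mysql_* functions used in the question were removed in PHP 7, so PDO or mysqli would be needed on current versions.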

If I misunderstood your goal, please let me know.
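
One caveat with counting occurrences: a word that appears twice in a single sentence reaches a count of 2 on its own, and a sentence paired with itself only reports words it repeats. If the goal is the output listed in the question (all words for 00 and 11, common words for 01 and 10), a set intersection may be closer. The sketch below is an alternative under that assumption, not the answer's method. $words is a hypothetical in-memory copy of tb_word and sharedWords is an illustrative name.

```php
<?php
// Hypothetical in-memory copy of tb_word: content_id => sentence_id => words.
$words = [
    1 => [0 => ['a', 'b'], 1 => ['c', 'a', 'e']],
    2 => [0 => ['f'], 1 => ['g', 'h', 'i'], 2 => ['f', 'h', 'f']],
];

/**
 * Words present in both sentence $x and sentence $y of one content_id,
 * de-duplicated and re-indexed from 0.
 */
function sharedWords(array $wordsBySentence, int $x, int $y): array
{
    $common = array_intersect($wordsBySentence[$x], $wordsBySentence[$y]);
    return array_values(array_unique($common));
}

$shared = [];
foreach ($words as $contentId => $wordsBySentence) {
    $sentenceIds = array_keys($wordsBySentence);
    foreach ($sentenceIds as $x) {
        foreach ($sentenceIds as $y) {
            // Key such as "0-1" means sentence 0 compared with sentence 1.
            $shared[$contentId]["{$x}-{$y}"] = sharedWords($wordsBySentence, $x, $y);
        }
    }
}

print_r($shared[1]);
```

Results stay grouped by content_id because each intersection only ever sees the sentences of one content_id.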

About php - checking whether sentences share the same words: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12418300/
