
php - Detecting spammers with MySQL

Reposted. Author: 可可西里. Updated: 2023-11-01 07:35:48

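I am seeing more and more users register on my site just to send duplicate spam messages to other users. I added some server-side code to detect duplicate messages with the following MySQL query: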

SELECT count(content) AS msgs_sent
FROM messages
WHERE sender_id = '.$sender_id.'
GROUP BY content HAVING count(content) > 10

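The query works well, but now they are getting around it by changing a few characters in each message. Is there a way to detect this with MySQL, or do I need to look at each group returned from MySQL and then use PHP to determine a similarity percentage?

Any ideas or suggestions?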

Best answer

Full-text matching

You could consider implementing something similar to the MATCH example from the MySQL documentation:

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

So for your example, perhaps:

SELECT id, MATCH (content) AGAINST ('your string') AS score
FROM messages
WHERE MATCH (content) AGAINST ('your string')
HAVING score > 1;  -- a column alias cannot be referenced in WHERE, so filter on it in HAVING

Note that to use these functions, your content column needs to have a FULLTEXT index.
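
A minimal sketch of adding such an index, assuming the table and column names from the question (messages, content). The index name ft_content is made up for illustration. FULLTEXT indexes are available on MyISAM tables, and on InnoDB tables from MySQL 5.6 onward.

```sql
-- Assumption: messages.content is a CHAR, VARCHAR, or TEXT column.
-- ft_content is a hypothetical index name chosen for this example.
ALTER TABLE messages ADD FULLTEXT INDEX ft_content (content);
```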

What is score in this example?

It is a relevance value. It is computed by the process described below:

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

From the documentation page.
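
One possible way to apply the relevance value to the original problem is to count how many of a sender's earlier messages resemble the one being sent. This is only a sketch: the sender id 42, the threshold of 1, and the text 'your string' are placeholders, and a usable threshold would have to be found by experimenting with real messages.

```sql
-- Hypothetical sketch: count one sender's stored messages whose relevance
-- to the new message text is above a threshold.
-- 42, 'your string', and the threshold 1 are placeholder values.
SELECT COUNT(*) AS similar_msgs
FROM messages
WHERE sender_id = 42
  AND MATCH (content) AGAINST ('your string') > 1;
```

The application could then flag the sender when similar_msgs exceeds a limit such as the 10 used in the original query. In PHP, the sender id and message text are better passed as bound parameters than concatenated into the query string.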

Regarding php - Detecting spammers with MySQL, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9287061/
