sql - Intensive queries to the database make load times insane

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:59:39
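I am trying to implement a content-based recommender system for my tweets application, and I think I have it working. The problem is that my solution is so database-intensive that the load time is far too long, so I am here asking for help. Below I post the algorithm and then explain it.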

def candidates2(user)
  @follower_tweet_string = "" ## all the text from all the tweets of everyone the user follows (plus the user's own)
  @rest_of_users_strings = "" ## all the text from all the tweets of one user that the current user is not following
  scoreHash = Hash.new ## similarity scores computed by the TfIdfSimilarity gem, keyed by user email
  @rezultat = [] ## the array of users returned
  @users = User.all ## all the users
  @rest_of_users = [] ## all the users that the current user is not following
  @following = user.following + Array(user) ## all the users the current user is following + the user

  @following.each do |followee|
    @tweets = followee.feed ## feed is a method for requesting all the tweets of that person
    @tweets.each do |tweet|
      ## collect the text of every tweet; the space keeps words of adjacent tweets apart
      @follower_tweet_string = @follower_tweet_string + tweet.content + " "
    end
  end

  @rest_of_users = @users - @following ## finding out all the users that the user is not following

  document1 = TfIdfSimilarity::Document.new(@follower_tweet_string)
  corpus = [document1]

  @rest_of_users.each do |person|
    @rest_of_users_strings = "" ## start from an empty string for each candidate
    @tweets = person.feed ## getting all the tweets of the user
    @tweets.each do |tweet|
      ## collect the text of every tweet of a user that the current user does not follow
      @rest_of_users_strings = @rest_of_users_strings + tweet.content + " "
    end

    ## calculating the score
    document2 = TfIdfSimilarity::Document.new(@rest_of_users_strings)
    corpus = corpus + Array(document2)

    model = TfIdfSimilarity::TfIdfModel.new(corpus)
    matrix = model.similarity_matrix
    scoreHash[person.email] = matrix[model.document_index(document1), model.document_index(document2)]
    corpus = corpus - Array(document2)
    ## stop calculating the score
  end

  ## keep the five highest-scoring users
  sortedHash = Hash[scoreHash.sort_by { |email, score| score }.reverse[0..4]]

  @rest_of_users.each do |rank|
    if sortedHash[rank.email]
      @rezultat = @rezultat + Array(rank) ## getting the resulting users
    end
  end

  @rezultat ## returning the resulting users
end

The algorithm can be found here, on page 6, section 3.2, Content-based Recommender (an explanation of about 20 lines).

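The main problem with my algorithm is that I have to fetch every user who is not being followed, then fetch all of their tweets, and only then apply the algorithm. That is very, very database-intensive, to an insane degree. I can't do it that way... Any ideas on how to improve it?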

Best answer

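You should separate generating the recommendations from displaying them.

That is, you have a batch job that processes the tweets, generates the recommendations, and stores them in the database. This job runs periodically.

Separately, you have a web interface that queries the database for the current recommendations and displays them.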
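
The split described above might look roughly like the following. This is only a minimal sketch: RecommendationStore is a hypothetical in-memory stand-in for a database table, and generate_recommendations, recommendations_for and toy_scorer are names invented for illustration. In a real app the scorer would be the expensive computation (for example the candidates2 method from the question) and the store would be a table keyed by user id.

```ruby
# Hypothetical in-memory stand-in for a database table of precomputed
# recommendations (assumption: a real app would use a table keyed by user id).
class RecommendationStore
  def initialize
    @rows = {}
  end

  # Replace the stored recommendations for one user.
  def save(user_id, recommended_ids)
    @rows[user_id] = { ids: recommended_ids.dup, generated_at: Time.now }
  end

  # Cheap lookup used by the web request; no scoring happens here.
  def fetch(user_id)
    row = @rows[user_id]
    row ? row[:ids] : []
  end
end

# Batch side: meant to run periodically, outside any web request.
# `scorer` is any callable that does the expensive work for one user.
def generate_recommendations(user_ids, store, scorer)
  user_ids.each do |user_id|
    store.save(user_id, scorer.call(user_id))
  end
end

# Web side: only reads what the batch job stored.
def recommendations_for(user_id, store)
  store.fetch(user_id)
end

# Toy scorer standing in for the expensive similarity computation.
store = RecommendationStore.new
toy_scorer = ->(user_id) { [user_id + 1, user_id + 2] }
generate_recommendations([1, 2], store, toy_scorer)

p recommendations_for(1, store)  # recommendations stored by the batch run
p recommendations_for(99, store) # unknown user: empty list, still no scoring
```

The point of the design is that recommendations_for never triggers any scoring, so the page load costs one lookup no matter how expensive the batch side is.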

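Now load times are fast: the web response time is quick. Your performance problem now shows up as how frequently you can run the batch job. Latency is much less of an issue in that setting, and it can be addressed more easily with techniques such as running parallel jobs.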
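
As one possible reading of "running parallel jobs", the batch work could be spread over a few worker threads. The sketch below is an assumption, not something the answer prescribes: run_in_parallel and slow_scorer are hypothetical names, and the sleep merely imitates waiting on database queries. In standard Ruby (MRI), threads mainly help with I/O-bound work like that; CPU-heavy scoring would more likely need separate processes or a job queue.

```ruby
require "thread" # Queue and Mutex; already built in on modern Ruby

# Run `work` for every user id using a fixed number of worker threads.
# Returns a hash of user_id => result.
def run_in_parallel(user_ids, worker_count, work)
  queue = Queue.new
  user_ids.each { |id| queue << id }
  results = {}
  lock = Mutex.new

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        id = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break # no work left for this worker
        end
        value = work.call(id)
        lock.synchronize { results[id] = value } # guard the shared hash
      end
    end
  end

  workers.each(&:join)
  results
end

# Toy scorer: the sleep imitates time spent waiting on the database.
slow_scorer = ->(id) { sleep(0.01); id * 10 }
results = run_in_parallel([1, 2, 3, 4], 2, slow_scorer)
p results.sort.to_h
```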

Regarding "sql - Intensive queries to the database make load times insane", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28808578/
