
algorithm - Efficient program for tracking movies and searching by frequency?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:56:42

While preparing for an exam, I came across this problem.

A website streams movies to customers’ TVs or other devices. Movies are in one of several genres such as action, drama, mystery, etc. Every movie is in exactly one genre (so that if a movie is an action movie as well as a comedy, it is in a genre called “action-Comedy”). The site has around 10 million customers, and around 25,000 movies, but both are growing rapidly. The site wants to keep track of the most popular movies streamed. You have been hired as the lead engineer to develop a tracking program.

i) Every time a movie is streamed to a customer, its name (e.g. “Harold and Kumar: Escape from Guantanamo Bay”) and genre (“Comedy”) is sent to your program so it can update the data structures it maintains.

(Assume your program can get the current year with a call to an appropriate Java class, in O(1) time.)

ii) Also, every once in a while, customers want to know what were the top k most streamed movies in genre g in year y. (If y is the current year, then accounting is done up to the current date.) For example, what were the top 10 most streamed comedy movies in 2010? Here k = 10, g = “comedy” and y = 2010. This query is sent to your program which should output the top k movie names.

Describe the data structures and algorithms used to implement both requirements. For (i), analyze the big O running time to update the data structures, and for (ii) the big O running time to output the top k streamed movies.

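My idea was to create a hash table keyed by genre and add each new movie to the linked list stored under its genre. For the second part, my only idea is to keep the linked lists sorted, but that seems too expensive. What would be a better option?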

Best answer

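I use a heap to track the top k objects of a class (k fixed). You can find the details of this data structure in any CS text, but basically it is a binary tree in which every node is smaller than either of its children. The main operation, let's call it reheap(node), assumes that both children of node are already heaps. It compares node with the smaller of its two children, swaps them if necessary, and then recursively calls reheap on the modified child. The class needs an overloaded operator< or an equivalent comparison defined for this to work.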
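As a rough illustration of the reheap operation described above, here is a minimal Python sketch. It assumes the heap is stored in a plain list with the children of index i at 2i+1 and 2i+2, and that the items support `<`. The helper name `build_heap` is made up for this example and is not part of the answer.

```python
def reheap(heap, i):
    """Sift heap[i] down, assuming both of its subtrees are already min-heaps."""
    size = len(heap)
    left, right = 2 * i + 1, 2 * i + 2
    smallest = i
    # Find the smaller of the two children (if they exist).
    if left < size and heap[left] < heap[smallest]:
        smallest = left
    if right < size and heap[right] < heap[smallest]:
        smallest = right
    if smallest != i:
        # Swap with the smaller child, then repair the subtree that changed.
        heap[i], heap[smallest] = heap[smallest], heap[i]
        reheap(heap, smallest)


def build_heap(items):
    """Hypothetical helper: turn any list into a min-heap using reheap."""
    heap = list(items)
    # Leaves are trivially heaps, so start from the last internal node.
    for i in range(len(heap) // 2 - 1, -1, -1):
        reheap(heap, i)
    return heap


heap = build_heap([9, 4, 7, 1, 8, 2])
print(heap[0])  # the smallest item sits at the top
```

Each call walks at most one root-to-leaf path, which is where the logarithmic cost per update comes from.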

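At any point in time, the heap contains the top k objects, with the smallest of them at the top of the heap. When a new object arrives that is larger than the top of the heap, it replaces that object on the heap and reheap is called. The same repair can be needed at a node other than the top if an object already on the heap becomes larger than its smaller child. Another type of update happens if an object already on the heap becomes smaller than its parent (this probably won't happen in the case you've described). In that case it is swapped with its parent, and we then recursively compare with the grandparent, and so on.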
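One way the update rules above might look for stream counts is sketched below. The class name `TopK`, the `pos` dictionary (movie name to heap index), and the idea of keeping one such tracker per (genre, year) pair are all assumptions added for illustration; the answer itself only describes the heap.

```python
class TopK:
    """Min-heap of at most k movie names, ordered by stream count."""

    def __init__(self, k):
        self.k = k
        self.count = {}  # movie -> number of streams seen so far
        self.heap = []   # movie names; heap[0] has the smallest count in the top k
        self.pos = {}    # movie -> its index in self.heap (only for movies on the heap)

    def _less(self, i, j):
        return self.count[self.heap[i]] < self.count[self.heap[j]]

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i]], self.pos[h[j]] = i, j

    def _sift_down(self, i):
        # The "reheap" step: push a node down while a child is smaller.
        n = len(self.heap)
        while True:
            small = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._less(child, small):
                    small = child
            if small == i:
                return
            self._swap(i, small)
            i = small

    def _sift_up(self, i):
        # The other update type: swap with the parent while smaller than it.
        while i > 0 and self._less(i, (i - 1) // 2):
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def stream(self, movie):
        self.count[movie] = self.count.get(movie, 0) + 1
        if movie in self.pos:
            # Already in the top k and its count grew: it may need to move down.
            self._sift_down(self.pos[movie])
        elif len(self.heap) < self.k:
            self.heap.append(movie)
            self.pos[movie] = len(self.heap) - 1
            self._sift_up(len(self.heap) - 1)
        elif self.count[movie] > self.count[self.heap[0]]:
            # Beats the current k-th place: replace the top and reheap.
            del self.pos[self.heap[0]]
            self.heap[0] = movie
            self.pos[movie] = 0
            self._sift_down(0)


tracker = TopK(2)
for name in ["A", "B", "A", "C", "C", "C", "B", "B", "B"]:
    tracker.stream(name)
print(sorted(tracker.heap))  # ['B', 'C']
```

Because counts only ever increase by one per stream, a movie already on the heap can only need to move down, which matches the answer's remark that the sift-up case is unlikely here.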

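All of these updates have complexity O(log(k)). If you need to output the heap contents sorted from the top down, the same structure works nicely in O(k log(k)) time. (This process is known as heapsort.)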
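A small sketch of the sorted output step, using Python's standard `heapq` module rather than a hand-written heap. The function name `top_down` and the (count, name) tuples are illustrative assumptions. The heap is copied first so the tracked top k stays intact.

```python
import heapq


def top_down(heap):
    """Return the heap's items from largest to smallest without modifying it."""
    work = list(heap)  # O(k) copy; a copy of a valid min-heap is still a min-heap
    # k pops at O(log k) each yield the items in ascending order.
    ascending = [heapq.heappop(work) for _ in range(len(work))]
    ascending.reverse()
    return ascending


# (stream count, movie name) pairs compare by count first.
heap = [(12, "A"), (40, "B"), (7, "C"), (25, "D")]
heapq.heapify(heap)
print([name for _, name in top_down(heap)])  # ['B', 'D', 'A', 'C']
```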

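Since swapping objects can be expensive, I usually keep the objects fixed in some array and implement the heap as an array, A, of pointers, where the children of A[i] are A[2i+1] and A[2i+2].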
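The indirection described above could be sketched as follows. Python has no raw pointers, so this example assumes integer indices into a fixed list play that role. The `Movie` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    name: str
    streams: int


# The objects themselves stay where they are; only indices move.
movies = [Movie("A", 5), Movie("B", 2), Movie("C", 9), Movie("D", 1)]
A = list(range(len(movies)))  # heap of indices into movies


def key(i):
    """Stream count of the movie referenced by heap slot i."""
    return movies[A[i]].streams


def reheap(i):
    """Sift slot i down; the children of A[i] are A[2i+1] and A[2i+2]."""
    small = i
    for child in (2 * i + 1, 2 * i + 2):
        if child < len(A) and key(child) < key(small):
            small = child
    if small != i:
        A[i], A[small] = A[small], A[i]  # swaps two integers, not two objects
        reheap(small)


# Build the heap bottom-up, starting from the last internal node.
for i in range(len(A) // 2 - 1, -1, -1):
    reheap(i)

print(movies[A[0]].name)  # D
```

The `movies` list is never reordered, so any other structure holding positions in it (for example a hash table from name to index) would remain valid.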

Regarding "algorithm - Efficient program for tracking movies and searching by frequency?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34342277/
