- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我尝试使用广度优先算法来检索所选用户所属的整个连接集群,方法如下:
CREATE PROCEDURE Breadth_First (@StartNode varchar(50), @LinkStrength decimal(10,7) = 0.1, @EndNode varchar(50) = NULL)
AS
BEGIN
-- Automatically rollback the transaction if something goes wrong.
SET XACT_ABORT ON
BEGIN TRAN
-- Increase performance and do not intefere with the results.
SET NOCOUNT ON;
-- Create a temporary table for storing the discovered nodes as the algorithm runs
CREATE TABLE #Discovered
(
DiscoveredUser varchar(50) NOT NULL, -- The Node Id
Predecessor varchar(50) NULL, -- The node we came from to get to this node.
LinkStrength decimal(10,7) NULL, -- positive - from predecessor to DiscoveredUser, negative - from DiscoveredUser to predecessor
OrderDiscovered int -- The order in which the nodes were discovered.
)
-- Initially, only the start node is discovered.
INSERT INTO #Discovered ( DiscoveredUser, Predecessor, LinkStrength, OrderDiscovered)
VALUES (@StartNode, NULL, NULL, 0)
-- Add all nodes that we can get to from the current set of nodes,
-- that are not already discovered. Run until no more nodes are discovered.
WHILE @@ROWCOUNT > 0
BEGIN
-- If we have found the node we were looking for, abort now.
IF @EndNode IS NOT NULL
IF EXISTS (SELECT TOP 1 1 FROM #Discovered WHERE DiscoveredUser = @EndNode)
BREAK
-- We need to group by ToNode and select one FromNode since multiple
-- edges could lead us to new same node, and we only want to insert it once.
INSERT INTO #Discovered ( DiscoveredUser, Predecessor, LinkStrength, OrderDiscovered)
(SELECT mc.called_party, mc.calling_party, mc.link_strength, d.OrderDiscovered + 1
FROM #Discovered d JOIN monthly_connections mc ON d. DiscoveredUser = mc.calling_party
WHERE mc.called_party NOT IN (SELECT DiscoveredUser From #Discovered) AND mc.link_strength > @LinkStrength
UNION
SELECT mc.calling_party, mc.called_party, mc.link_strength * (-1), d.OrderDiscovered + 1
FROM #Discovered d JOIN monthly_connections mc ON d. DiscoveredUser = mc.called_party
WHERE mc.calling_party NOT IN (SELECT DiscoveredUser FROM #Discovered) AND mc.link_strength > @LinkStrength
)
END;
-- Select the results. We use a recursive common table expression to
-- get the full path from the start node to the current node.
WITH BacktraceCTE(Predecessor, DiscoveredUser, LinkStrength, OrderDiscovered, Path)
AS
(
-- Anchor/base member of the recursion, this selects the start node.
SELECT d.Predecessor, n. DiscoveredUser, d.LinkStrength, d.OrderDiscovered,
CAST(n. DiscoveredUser AS varchar(MAX))
FROM #Discovered d JOIN users n ON d. DiscoveredUser = n. DiscoveredUser
WHERE d. DiscoveredUser = @StartNode
UNION ALL
-- Recursive member, select all the nodes which have the previous
-- one as their predecessor. Concat the paths together.
SELECT d.Predecessor, n. DiscoveredUser, d.LinkStrength, d.OrderDiscovered,
CAST(cte.Path + ',' + CAST(n. DiscoveredUser as varchar(30)) as varchar(MAX))
FROM #Discovered d JOIN BacktraceCTE cte ON d.Predecessor = cte. DiscoveredUser
JOIN users n ON d. DiscoveredUser = n. DiscoveredUser
)
SELECT Predecessor, DiscoveredUser, LinkStrength, OrderDiscovered, Path FROM BacktraceCTE
WHERE DiscoveredUser = @EndNode OR @EndNode IS NULL -- This kind of where clause can potentially produce
ORDER BY OrderDiscovered -- a bad execution plan, but I use it for simplicity here.
DROP TABLE #Discovered
COMMIT TRAN
RETURN 0
END
我当前正在分析的图表(社交网络)有 28M 连接,并且在不忽略弱连接(使用 @LinkStrength 设置阈值)的情况下,执行运行了很长时间(到目前为止,我还没有得到任何结果,并将尝试让它运行过夜)。
无论如何,下一步是计算两个用户(大约有 300 万用户)之间的最短(最强)链接。我正在考虑使用 Djikstra 算法,但不确定是否可以在我当前使用的 PC(Quad CPU 2.66 GHz,4GB RAM)上分析此类网络,并且数据存储在 MS SQL Server 2008 数据库中。
总而言之,我希望获得以下问题的一些答案/建议:
谢谢!
最佳答案
首先,使用索引。
其次,您需要减少内存需求。这意味着首先为 VARCHAR(50) 提供一个短别名,例如 4 字节而不是 50 字节的 int。这将使您的速度提高 10 倍。
declare @tmpPeople table(
ixPerson int identity primary key,
UserNodeID varchar(50),
unique(UserNodeID, ix) -- this creates an index
)
Insert @tmpPeople(UserNodeID) select UserNodeID from NormalPeopleTable
declare @relationships table(
ixParent int,
ixChild int,
unique(ixParent, ixChild),
unique(ixChild, ixParent)
)
insert @relationships(ixParent, ixChild)
select distinct p.ixPerson, c.ixPerson
from NormalRelationshipsTable nr
inner join @tmpPeople p on p.UserNodeID = nr.ParentUserNodeID
inner join @tmpPeople c on c.UserNodeID = nr.ChildUserNodeID
-- OK now got a copy of all relationships, but it is a fraction of the size
-- because we are using integers for the keys.
-- if we need to we can insert the reverse relationships too.
您需要编写一个执行您想要的操作的查询,而不是“通用”的查询。
如果你想找到两个节点之间的最短距离,可以通过从两端同时搜索来减少搜索时间。
declare @p1 table(
ix int identity primary key,
ixParent int,
ixChild int,
nDeep int,
-- Need indexes
unique(ixParent, ixChild, nDeep, ix),
unique(ixChild, ixParent, nDeep, ix)
)
-- That's now indexed both ways.
-- If you only need one, you can comment the other out.
-- define @p2 the same
insert @p1 (ixParent, ixChild, nDeep) select @ixUserFrom, @ixUserFrom, 0
insert @p2 ..... @ixUserTo, @ixUserTo, 0
-- Breadth first goes like this.
-- Just keep repeating it till you get as far as you want.
insert @p1 (ixParent, ixChild, nDeep)
select
p1.ixChild, r.ixChild, p1.nDeep+1
from @p1 p1 inner join @relationships r on r.ixParent = p1.ixChild
-- may want to exclude those already in the table
where not exists (
select 1 from @p1 p1
where p1.ixParent = p.ixChild and p1.ixChild = r.ixChild
)
对于“从 Alice 到 Bob 的距离”,您并行执行两次搜索,并在 Alice 的搜索包含 Bob 的搜索中也包含的任何人时停止。这还会将您的时间缩短 n^2 倍,其中 n 是平均连接数。
如果深度变得太多,不要忘记停止。
关于sql - 分析大图 - 检索簇并计算最强路径,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4124441/
这个问题在这里已经有了答案: 关闭 12 年前。 Possible Duplicate: C pointer to array/array of pointers disambiguation 在
我是一名优秀的程序员,十分优秀!