gpt4 book ai didi

sql - 包含在源系统中定期更新的信息的事实表

转载 作者:行者123 更新时间:2023-12-01 15:44:46 24 4
gpt4 key购买 nike

我正在构建一个维度数据仓库,并学习如何从我的仓库中的源系统为我的各种业务流程建模。

我目前正在将我们数据仓库中源系统的“投标”(工作投标)建模为包含以下信息的事实表:

  • 出价金额
  • 预计收入
  • 销售人员
  • 投标状态(有效、待定、拒绝等)
  • 等等

问题是出价(或我尝试建模的大多数其他过程)可以经历各种状态,并在源系统中的任何给定时刻更新其信息。根据 Ralph Kimball 的说法,事实表只有在被视为“累积快照”时才应更新,而且我确信并非所有这些过程都会被下面的定义视为“累积快照”。

应如何根据 Kimball 小组的建议在数据仓库中为这些类型的流程建模?此外,哪种类型的事实表适用于投标(考虑到我在上面概述的事实)?

摘自 http://www.kimballgroup.com/2008/11/fact-tables/

The transaction grain corresponds to a measurement taken at a single instant. The grocery store beep is a transaction grain. The measured facts are valid only for that instant and for that event. The next measurement event could happen one millisecond later or next month or never. Thus, transaction grain fact tables are unpredictably sparse or dense. We have no guarantee that all the possible foreign keys will be represented. Transaction grain fact tables can be enormous, with the largest containing many billions of records.

The periodic snapshot grain corresponds to a predefined span of time, often a financial reporting period. Figure 1 illustrates a monthly account periodic snapshot. The measured facts summarize activity during or at the end of the time span. The periodic snapshot grain carries a powerful guarantee that all of the reporting entities (such as the bank account in Figure 1) will appear in each snapshot, even if there is no activity. The periodic snapshot is predictably dense, and applications can rely on combinations of keys always being present. Periodic snapshot fact tables can also get large. A bank with 20 million accounts and a 10-year history would have 2.4 billion records in the monthly account periodic snapshot!

The accumulating snapshot fact table corresponds to a predictable process that has a well-defined beginning and end. Order processing, claims processing, service call resolution and college admissions are typical candidates. The grain of an accumulating snapshot for order processing, for example, is usually the line item on the order. Notice in Figure 1 that there are multiple dates representing the standard scenario that an order undergoes. Accumulating snapshot records are revisited and overwritten as the process progresses through its steps from beginning to end. Accumulating snapshot fact tables generally are much smaller than the other two types because of this overwriting strategy.

最佳答案

就像评论中提到的那样,变更数据捕获是一个相当通用的术语,表示“我如何处理数据实体随时间的变化”,并且有整本书(以及无数的帖子和文章)。

无论任何陈述似乎暗示一个明确的黑白或总是这样的答案,真正的答案,像往常一样,是“它取决于” - 在你的情况下,取决于什么 Cereal 您需要特定的事实表。

如果您的数据以不可预测的方式或非常频繁地发生变化,那么实现 Kimball 版本的累积快照(想象有多少“里程碑”日期列,等等,你可能最终需要)。

因此,如果您愿意,您可以决定让您的事实表成为事务性事实表而不是快照,其中事实键为(出价键、时间戳),然后在您的应用层(无论是 View 、mview、实际应用程序还是其他),您可以确保给定查询仅获取每个出价的最新版本(请注意,这可以被认为是一种虚拟的累积快照)。如果您发现您不需要以前的版本(每个 Bid 的历史),您可以有一个修剪它们的例程(即删除它们或将它们移动到其他地方)。

或者,您只能允许在事实 (Bid) 处于最终状态时将其添加,但是您可能会有明显的延迟,因为新的(可更新的)Bid 不会进入事实表一段时间。

无论哪种方式,有几种可靠且经过验证的技术可以处理此问题 - 您只需清楚地确定业务需求并进行相应的设计即可。

祝你好运!

关于sql - 包含在源系统中定期更新的信息的事实表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/26351343/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com