gpt4 book ai didi

data-warehouse - 什么是 Dim,什么是 Fact?

转载 作者:行者123 更新时间:2023-12-04 06:43:16 25 4
gpt4 key购买 nike

我有一个应用程序,我知道它可以制作一个很棒的多维数据集,并且比标准的平面 Reporting Services 报表更有用。我们即将与一名顾问一起进入 BI 领域,但我想在此之前先试一试,主要是因为我知道我们将要做什么。

该应用程序跟踪全国疗养院的调查。它们可以是年度调查、投诉调查或其他几种类型的调查,它们具有与给出的标签相关的处罚,并具有与之相关的文档。

我想做的是想出一种方法,让我们能够利用我们拥有的数据 - 佛罗里达州 6 月份有多少标签?有多少设施按时交付文件?与去年相比,今年第一季度发生了多少次年度(惊喜)调查?

我将模式包括在内,希望有人能够告诉我不仅什么是模糊的,什么是事实,还有什么数据去哪里了。我认为这将是一个很好的开始。

任何事情都会很有帮助。我正在尝试建立一个小型数据集市,同时我正在研究 Kimball 的数据仓库生命周期工具包。

谢谢!男@

实体表 - 我们所有设施的列表:主键是表示建筑物的五个字母代码

CREATE TABLE [dbo].[Entity](
[entID] [varchar](10) NOT NULL,
[entShortName] [varchar](150) NULL,
[entNumericID] [int] NOT NULL,
[orgID] [int] NOT NULL,
[regionID] [int] NOT NULL,
[portID] [int] NOT NULL,
[busTypeID] [int] NOT NULL,
[adpID] [varchar](50) NULL,
[eHealthDataID] [varchar](50) NULL,
[updateDate] [datetime] NULL CONSTRAINT [DF_Entity_updateDate] DEFAULT (getdate()),
[powProID] [int] NULL,
[regionReportingID] [int] NULL,
[regionPresEmail] [varchar](300) NULL,
[regionClinDirEmail] [varchar](300) NULL,
CONSTRAINT [PK_EntityNEW] PRIMARY KEY CLUSTERED
(
[entID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]

调查主要

CREATE TABLE [dbo].[surveyMain](
[surveyID] [int] IDENTITY(1,1) NOT NULL,
[surveyDateFac] AS (([facility]+'-')+CONVERT([varchar],[surveyDate],(101))),
[surveyDate] [datetime] NOT NULL,
[surveyType] [int] NOT NULL,
[surveyBy] [int] NULL,
[facility] [varchar](10) NOT NULL,
[originalSurvey] [int] NULL,
[exitDate] [datetime] NULL,
[dpnaDate] AS (dateadd(month,(3),[exitDate])),
[clearedTags] [varchar](1) NULL,
[substantiated] [varchar](1) NULL,
[firstRevisit] [int] NULL,
[secondRevisit] [int] NULL,
[thirdRevisit] [int] NULL,
[fourthRevisit] [int] NULL,
[updated] [datetime] NULL CONSTRAINT [DF_surveyMain_updated] DEFAULT (getdate()),
CONSTRAINT [PK_tagSurvey] PRIMARY KEY CLUSTERED
(
[surveyID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

调查类型:

CREATE TABLE [dbo].[surveyTypes](
[surveyTypeID] [int] IDENTITY(1,1) NOT NULL,
[surveyTypeDesc] [varchar](100) NOT NULL,
CONSTRAINT [PK_surveyTypes] PRIMARY KEY CLUSTERED
(
[surveyTypeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

调查文件

CREATE TABLE [dbo].[surveyFiles](
[surveyFileID] [int] IDENTITY(1,1) NOT NULL,
[surveyID] [int] NOT NULL,
[surveyFilesTypeID] [int] NOT NULL,
[documentDate] [datetime] NOT NULL,
[responseDate] [datetime] NULL,
[receiptDate] [datetime] NULL,
[dateCertain] [datetime] NULL,
[fileName] [varchar](250) NULL,
[fileUpload] [image] NULL,
[fileDesc] [varchar](100) NULL,
[updated] [datetime] NOT NULL CONSTRAINT [DF_surveyFiles_updated] DEFAULT (getdate()),
CONSTRAINT [PK_surveyFiles] PRIMARY KEY CLUSTERED
(
[surveyFileID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

调查罚款

CREATE TABLE [dbo].[surveyFines](
[surveyFinesID] [int] IDENTITY(1,1) NOT NULL,
[surveyID] [int] NULL,
[surveyFinesTypeID] [int] NULL,
[dateRecommended] [datetime] NULL,
[dateImposed] [datetime] NULL,
[totalFineAmt] [varchar](100) NULL,
[wasImposed] [varchar](3) NULL,
[dateCleared] [datetime] NULL,
[comments] [varchar](500) NULL,
[updated] [datetime] NOT NULL CONSTRAINT [DF_surveyFines_updated] DEFAULT (getdate()),
CONSTRAINT [PK_surveyFines] PRIMARY KEY CLUSTERED
(
[surveyFinesID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]

调查标签

CREATE TABLE [dbo].[surveyTags](
[seq] [int] IDENTITY(1,1) NOT NULL,
[surveyID] [int] NOT NULL,
[tagDescID] [int] NOT NULL,
[tagStatus] [int] NULL,
[scopesev] [varchar](5) NOT NULL,
[comments] [varchar](1000) NULL,
[clearedDate] [datetime] NULL,
[updated] [datetime] NULL CONSTRAINT [DF_surveyTags_updated] DEFAULT (getdate()),
CONSTRAINT [PK_tagMain] PRIMARY KEY CLUSTERED
(
[seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

最佳答案

What I'd like to do is come up with a way that will allow us to leverage the data we have - how many tags in Florida for the month of June? How many facilities were on time delivering their documentation? How many annual(surprise) surveys happened in the 1st quarter of this year compared to last year?

维度是一个测量范围。测量范围可以是连续的,如日期,也可以是离散的,如设施。在您的问题中,维度分别是设施和日期、日期/时间和日期。

您可以回答“佛罗里达州 6 月份有多少标签?”这个问题的唯一方法。是将标签与设施相关联,将标签与日期相关联。

您可以回答“有多少设施按时交付文件?”这个问题的唯一方法?是将文档交付与设施和到期日期与设施相关联。

对于您希望数据仓库回答的其他问题或查询,您应该遵循相同的分析过程。

事实是实体或对象。标签是事实。文档交付是事实。一旦加载,事实在数据仓库中几乎总是不可变的。

至于您的架构,我必须对其进行更多研究才能给出具体建议,但总的来说,您希望使用 star schema 。星星的中心是你的事实、实体和对象。构成星形点的表是您的维度表。

您需要做的第一件事是将事实和维度分开。您的任何实体表都不应包含日期、位置代码或您确定为维度的任何其他内容。但是,事实表将包含日期表、位置表或其他维度表的外键。

您可能还需要汇总表。汇总表包含与事实表相同的列,并添加了一个或多个跨不同维度的总和。例如,问题“佛罗里达州 6 月份有多少标签?”如果您已经有了 2010 年 6 月当月(或每一天)佛罗里达州(或者更准确地说是佛罗里达州的每个设施)的标签总和,则可以更快地得到答案。

求和的时间段取决于您预期的查询组合。在您的数据仓库中,一天可能太短了。换句话说,在 SQL 中进行汇总与选择汇总行一样快。

你需要一个 calendar table 也是。日历表会提出诸如“与去年(第一季度)相比,今年第一季度发生了多少年度(意外)调查?”之类的问题。更容易查询。

关于data-warehouse - 什么是 Dim,什么是 Fact?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/3189512/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com