gpt4 book ai didi

sql - 查询和/或函数对一个字段上的所有类型的数据进行子集化并在其他字段上进行计算

转载 作者:行者123 更新时间:2023-12-04 15:44:16 26 4
gpt4 key购买 nike

我试图从这个出发:

+------+------+------+------+
| fld1 | fld2 | fld3 | etc… |
+------+------+------+------+
| a | 5 | 1 | |
| b | 5 | 0 | |
| c | 6 | 0 | |
| b | 2 | 5 | |
| b | 1 | 6 | |
| c | 0 | 6 | |
| a | 8 | 9 | |
+------+------+------+------+

到:
+--------+--------+-----------+-----+-----+------+
| Factor | Agg | CalcDate | Sum | Avg | etc… |
+--------+--------+-----------+-----+-----+------+
| fld2 | fld1/a | 8/14/2015 | 13 | 6.5 | |
| fld2 | fld1/b | 8/14/2015 | 8 | 2.7 | |
| fld2 | fld1/c | 8/14/2015 | 6 | 3 | |
| fld3 | fld1/a | 8/14/2015 | 10 | 5 | |
| fld3 | fld1/b | 8/14/2015 | 11 | 3.7 | |
| fld3 | fld1/c | 8/14/2015 | 6 | 3 | |
+--------+--------+-----------+-----+-----+------+

笔记:
  • 显然,这个数据被简化了很多。
  • 我有很多字段需要为
  • 执行此操作
  • 我在这里包含了简单的聚合计算,因此有人帮助我可能会更容易。详尽的列表是:NaPct、Mean、Sd、Low、Q1、Median、Q3、High、IQR、Kurt、Skew、Obs。其中 NaPct = 为 NULL 的百分比,Sd = 标准差,Q1 = 第 1 个四分位数,Q3 = 第 3 个四分位数,IQR = 四分位数范围,Kurt = 峰度,Skew = 偏度,Obs = 不为 NULL 的观察数。
  • 实际上,在上面的第二个表中,因子字段将是 FactorID,Agg 将是 AggID,CalcDate 将是 CalcDateID,但为了说明目的,我将实际值放在那里。不过,问题/答案应该无关紧要。
  • 速度非常重要,因为我有 1305 个字段和几个聚合要在工作日开始之前进行计算。
  • 仅使用 MS Access、SQL 和 VBA 回答。对不起业务需求。也就是说,一个仅适用于 MS Access 的 SQL 答案将是最简单的。
  • 下面是使用自定义域函数 (DCalcForQueries) 和我构建的支持函数的代码,这些函数返回每个字段的一个计算聚合值和选定的聚合。好吧,不是我想要的。也许该代码可用于我想要的,也许不是。尽管如此,它有我想要的计算,希望会有所帮助。
  • 消息框正是我在进行 alpha 测试时调试的方式:没有必要。
  • 要使用该代码,请将所有代码放入 VBA 模块中,将表“tbl_DatedModel_2015_0702_0”更改为 MS Access 中的表,将字段“Rk-IU Mkt Cap”更改为表中的字段,然后运行 ​​TestIT( ) 子,您应该在“立即”窗口中获得计算值。
  • 不要太担心计算。我会处理的。我只需要知道以允许我想要的计算的方式从上面的第一个表到上面的第二个表的最佳方法是什么。谢谢!
    Sub TestIt()
    Dim x
    Set x = GetOrOpenAndGetExcel

    Dim rst As DAO.Recordset
    Dim sSql As String
    Dim q As String
    q = VBA.Chr(34)
    sSql = "SELECT " & _
    "DCalcForQueries(" & q & "NaPct" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS NaPct ," & _
    "DCalcForQueries(" & q & "Mean" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Mean ," & _
    "DCalcForQueries(" & q & "Sd" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Sd ," & _
    "DCalcForQueries(" & q & "Low" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Low ," & _
    "DCalcForQueries(" & q & "Q1" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Q1 ," & _
    "DCalcForQueries(" & q & "Median" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Median ," & _
    "DCalcForQueries(" & q & "Q3" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Q3 ," & _
    "DCalcForQueries(" & q & "High" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS High ," & _
    "DCalcForQueries(" & q & "IQR" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS IQR ," & _
    "DCalcForQueries(" & q & "Kurt" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Kurt ," & _
    "DCalcForQueries(" & q & "Skew" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Skew ," & _
    "DCalcForQueries(" & q & "Obs" & q & ", " & q & "tbl_DatedModel_2015_0702_0" & q & ", " & q & "Rk-IU Mkt Cap" & q & ", " & q & "[Rk-IU Mkt Cap] IS NOT NULL AND [GICS Sector] = 'Consumer Discretionary'" & q & ") AS Obs " & _
    "FROM tbl_DatedModel_2015_0702_0;"
    Debug.Print sSql
    Set rst = CurrentDb.OpenRecordset(sSql, dbOpenSnapshot)
    rst.MoveFirst

    Debug.Print rst.RecordCount
    Debug.Print rst.Fields("NaPct")
    Debug.Print rst.Fields("Mean")
    Debug.Print rst.Fields("Sd")
    Debug.Print rst.Fields("Low")
    Debug.Print rst.Fields("Q1")
    Debug.Print rst.Fields("Median")
    Debug.Print rst.Fields("Q3")
    Debug.Print rst.Fields("High")
    Debug.Print rst.Fields("IQR")
    Debug.Print rst.Fields("Kurt")
    Debug.Print rst.Fields("Skew")
    Debug.Print rst.Fields("Obs")


    End Sub
    Public Function DCalcForQueries(sCalc As String, Optional sTbl As String = "", Optional sMainFld As String = "", Optional sWhereClause As String = "", Optional k As Double) As Variant

    Dim dblData() As Double
    Dim oxl As Object
    On Error Resume Next
    Set oxl = GetObject(, "Excel.Application")
    If Err.Number <> 0 Then
    MsgBox "Excel object must be openned by the calling sub of DCalcForQueries so it isn't opened over and over, which is very slow"
    GoTo cleanup
    End If

    Dim x As Integer

    Dim aV() As Variant
    Dim tmp
    Dim lObsCnt As Long
    Dim lNaCnt As Long
    Dim i As Long
    Dim vTmp As Variant
    Dim lTtl As Long
    Dim bDoCalc As Boolean

    aV = a2dvGetSubsetFromQuery(sTbl, sMainFld, sWhereClause, "Numeric")
    If aV(0, 0) = "Not Numeric" Then
    MsgBox "Data returned by query was not numeric. Press OK to Stop and debug."
    Stop
    End If

    If sCalc = "Percentile" Or sCalc = "Q1" Or sCalc = "Q2" Or sCalc = "Q3" Or sCalc = "Q4" Then
    DCalcForQueries = oxl.WorksheetFunction.Percentile_Exc(aV, k)
    ElseIf sCalc = "Median" Then
    DCalcForQueries = oxl.WorksheetFunction.Median(aV)
    ElseIf sCalc = "Kurt" Or sCalc = "Kurt" Then
    DCalcForQueries = oxl.WorksheetFunction.Kurt(aV)
    ElseIf sCalc = "Minimum" Or sCalc = "Low" Then
    DCalcForQueries = oxl.WorksheetFunction.Min(aV)
    ElseIf sCalc = "Maximum" Or sCalc = "High" Then
    DCalcForQueries = oxl.WorksheetFunction.Max(aV)
    ElseIf sCalc = "IQR" Then
    DCalcForQueries = oxl.WorksheetFunction.Quartile_Exc(aV, 3) - oxl.WorksheetFunction.Quartile_Exc(aV, 1)
    ElseIf sCalc = "Obs" Then
    lObsCnt = 0
    For Each tmp In aV
    If Not IsNull(tmp) Then
    lObsCnt = lObsCnt + 1
    End If
    Next
    DCalcForQueries = lObsCnt
    ElseIf sCalc = "%NA" Or sCalc = "PctNa" Or sCalc = "NaPct" Or sCalc = "%Null" Or sCalc = "PctNull" Then
    lNaCnt = 0
    lTtl = UBound(aV, 2) + 1
    For Each tmp In aV
    If IsNull(tmp) Then
    lNaCnt = lNaCnt + 1
    End If
    Next
    DCalcForQueries = (lNaCnt / lTtl) * 100
    ElseIf sCalc = "Skewness" Or sCalc = "Skew" Then
    DCalcForQueries = oxl.WorksheetFunction.Skew(aV)
    ElseIf sCalc = "StDev" Or sCalc = "Sd" Then
    DCalcForQueries = oxl.WorksheetFunction.StDev_S(aV)
    ElseIf sCalc = "Mean" Then
    DCalcForQueries = oxl.WorksheetFunction.Average(aV)
    Else
    MsgBox "sCalc parameter not recognized: " & sCalc
    End If

    cleanup:


    End Function

    Function a2dvGetSubsetFromQuery(sTbl As String, sMainFld As String, sWhereClause As String, sTest As String) As Variant()
    'sTest can be "Numeric" or "None" ...will implement more as needed
    Dim iFieldType As Integer
    Dim rst As DAO.Recordset
    Dim db As Database
    Set db = CurrentDb
    Dim sMainFldFull As String
    Dim sSubSetFldFull As String
    Dim sSql As String

    sMainFldFull = "[" & sMainFld & "]"
    sSubSetFldFull = ""
    sSql = ""

    sSql = "SELECT " & sMainFldFull & " FROM " & sTbl
    If Len(sWhereClause) > 0 Then
    sSql = sSql & " WHERE " & sWhereClause
    End If

    Set rst = db.OpenRecordset(sSql, dbOpenSnapshot)

    'make sure the data is the right type

    iFieldType = rst(sMainFld).Type

    If sTest = "Numeric" Then
    If iFieldType = dbByte Or _
    iFieldType = dbInteger Or _
    iFieldType = dbLong Or _
    iFieldType = dbCurrency Or _
    iFieldType = dbSingle Or _
    iFieldType = dbDouble _
    Then
    rst.MoveLast
    rst.MoveFirst

    a2dvGetSubsetFromQuery = rst.GetRows(rst.RecordCount)

    Else
    Dim aV(0 To 1, 0 To 1) As Variant
    aV(0, 0) = "Not Numeric"
    a2dvGetSubsetFromQuery = aV

    End If
    ElseIf sTest = "None" Then
    'don't do any testing
    rst.MoveLast
    rst.MoveFirst

    a2dvGetSubsetFromQuery = rst.GetRows(rst.RecordCount)
    Else
    MsgBox "Test type (sTest) can only be 'None' or 'Numeric'. It was: " & sTest
    Stop
    End If

    cleanup:
    rst.Close
    Set rst = Nothing
    End Function
    Public Function GetOrOpenAndGetExcel() As Object
    'if excel is open it will return the excel object
    'if excel is not open it will open excel and return the excel object
    On Error GoTo 0
    On Error Resume Next
    Set GetOrOpenAndGetExcel = GetObject(, "Excel.Application")

    If Err.Number <> 0 Then
    Set GetOrOpenAndGetExcel = CreateObject("Excel.Application")
    End If

    On Error GoTo 0
    End Function

  • Edit1:我上面提供的代码只是为了说明我的尝试和计算。我很确定这与一个好的答案没有直接关系,但我不是 100% 确定。如果我使用上面的内容,它一次会产生一条记录,我必须一次添加(INSERT INTO)每条记录,这会很慢。我的计划是构建结果的二维数组并使用该二维数组批量添加记录,但被告知如果不循环遍历数组一次添加每条记录,您将无法做到这一点,这将击败目的。我很确定一个解决方案,包括循环遍历 fld1 类型或一个带有子查询的查询,可以一步完成,是应该采取的方向。到目前为止我为优化所做的工作:我提取了 Excel 对象的创建,因此仅在 TestIt() Sub 中创建了一次。

    Edit2:我有 1305 个字段要计算。它们并不都在同一张 table 上;但是,出于这个问题的目的,我只需要一个可以同时处理多个字段的有效答案。 IE。您的答案可以假设所有字段都在同一个表中,为简单起见,您的答案只能包含 2 个字段,我可以从那里扩展它。在上面的代码中,我计算了一个字段“Rk-IU Mkt Cap”的 12 个指标,聚合了一种类型,“Consumer Discretionary”([GICS Sector] = 'Consumer Discretionary'”)。我所拥有的不是我所追求的.

    最佳答案

    像这样的工作,只使用纯 tSql 吗?

    1:创建表并插入一些示例数据

    CREATE TABLE [dbo].[FLD](
    [fld1] [nvarchar](2) NOT NULL,
    [fld2] [int] NULL,
    [fld3] [int] NULL
    ) ON [PRIMARY]

    GO

    INSERT FLD VALUES ('a', 5, 9)
    INSERT FLD VALUES ('b', 1, 8)
    INSERT FLD VALUES ('a', 3, 7)

    2:使用嵌套 UNPIVOT创造因素
    SELECT t.factor,t.val + '/' + t.v  AS Agg, SUM(value) AS [Sum], AVG(value) AS [AVG]
    FROM
    (
    SELECT * from
    (
    select * from FLD f
    UNPIVOT
    (
    v
    for val in (fld1)
    ) piv
    ) f
    UNPIVOT
    (
    value
    for factor in (fld2, fld3)
    ) s
    ) t
    group by t.v, t.factor, t.val

    关于sql - 查询和/或函数对一个字段上的所有类型的数据进行子集化并在其他字段上进行计算,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/32012370/

    26 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com