gpt4 book ai didi

sql - 从大型数据集中随机采样

转载 作者:行者123 更新时间:2023-12-03 00:49:48 26 4
gpt4 key购买 nike

我从一个大型数据库中提取了研究人群。为了进行比较,我想选择一个具有相似特征的对照组。我想要匹配的两个标准是年龄和性别。为我提供用于匹配目的的数字的查询是

select sex, age/10 as decades,COUNT(*) as counts
from
(
select distinct m.patid
,m.sex,DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x group by sex, Age/10

结果集看起来像

enter image description here

这个时代的几十年列由表达式给出

(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 

这用于使用整数除法查找 20-29、30-39 等年龄段的人。例如,我想从更大的数据集中选择 507 名 20 多岁的女性。查找较大数据集特征的查询是

select distinct m.patid
,m.sex
,(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob

编辑:第二个查询的结果 enter image description here

因此,我需要第二个查询中的几十年列的总和等于第一个查询中的counts。我尝试过的(并返回零结果)如下。我需要做什么才能符合这些年龄?

运行但不返回结果的查询:

select x.PATID--,x.sex,x.decades,y.counts
from
(

select distinct m.patid
,m.sex
,(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
) as x
inner join
(

select sex, age/10 as decades,COUNT(*) as counts
from
(
select distinct m.patid
,m.sex,DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x group by sex, Age/10
) as y on x.sex=y.sex and x.decades=y.decades
group by y.counts,x.PATID,x.sex,y.sex
having SUM(x.decades)=y.counts and x.sex=y.sex

最佳答案

select
T1.sex,
T1.decades,
T1.counts,
T2.patid

from (

select
sex,
age/10 as decades,
COUNT(*) as counts
from (

select m.patid,
m.sex,
DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x
group by sex, Age/10
) as T1
join (
--right here is where the random sampling occurs
SELECT TOP 50--this is the total number of peolpe in our dataset
patid
,sex
,decades

from (
select m.patid,
m.sex,
(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex, m.yrdob

) T2
order by NEWID()
) as T2
on T2.sex = T1.sex
and T2.decades = T1.decades

编辑:我发布了另一个与此类似的问题,其中我发现我的结果实际上不是随机的,但它们只是前 N 个结果。我在最外面的查询中通过 newid() 进行了排序,所做的只是围绕完全相同的结果集进行洗牌。从现已关闭的问题中,我发现我需要在上述查询的注释行中使用 TOP 关键字以及 order by newid()

关于sql - 从大型数据集中随机采样,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13627486/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com