
python - How to use the ThetaSketchOp function in pydruid

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:39:22

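I am using pydruid to query a Druid database, and I want to compute a post-aggregation result where one aggregation is true and the other is false (for example, people counted under apple but not under orange).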
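In set terms, the goal is a difference: the people seen with apple minus the people seen with orange. A minimal sketch with exact Python sets follows. The sample rows are invented for illustration, and Druid's theta sketches return an approximate count rather than an exact one.

```python
# Hypothetical sample rows as (person, fruit) pairs; not data from the question.
rows = [
    ("alice", "apple"),
    ("alice", "orange"),
    ("bob", "apple"),
    ("carol", "orange"),
    ("dave", "apple"),
]

# One set of people per fruit, mirroring the two filtered aggregations.
apple = {person for person, fruit in rows if fruit == "apple"}
orange = {person for person, fruit in rows if fruit == "orange"}

apple_and_orange = apple & orange      # intersection, like func INTERSECT
apple_and_not_orange = apple - orange  # difference, like func NOT

print(len(apple_and_orange))      # 1 (alice)
print(len(apple_and_not_orange))  # 2 (bob, dave)
```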

I have been able to compute the post-aggregation result by using curl to POST a JSON query to the Druid database.

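With pydruid I have been able to compute the initial aggregations and the post-aggregation for the intersection of the two aggregated groups. I have tried to find a way to use the ThetaSketchOp class for my purpose, but so far without success.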

Here is my attempt so far at using the ThetaSketchOp class in pydruid:

result = query.groupby(
    datasource='datasource',
    granularity='all',
    intervals='2018-06-30/2018-08-30',
    filter=(
        (filters.Dimension('fruit') == 'apple') |
        (filters.Dimension('fruit') == 'orange')
    ),
    aggregations={
        'apple': aggregators.filtered(
            filters.Dimension('fruit') == 'apple',
            aggregators.thetasketch('person')),
        'orange': aggregators.filtered(
            (filters.Dimension('fruit') == 'orange'),
            aggregators.thetasketch('person')),
    },
    post_aggregations={
        'apple_&_orange': postaggregator.ThetaSketchEstimate(
            postaggregator.ThetaSketch('apple') &
            postaggregator.ThetaSketch('orange')
        ),
        'apple_&_not_orange': postaggregator.ThetaSketchEstimate(
            postaggregator.ThetaSketchOp(
                fn='not',
                fields=[
                    postaggregator.ThetaSketch('apple'),
                    postaggregator.ThetaSketch('orange')
                ],
                name='testing'
            )
        )
    }
)

Here is the query in JSON format, which produces the desired result when posted to the Druid database:

{
  "queryType": "groupBy",
  "dataSource": "datasource",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [
    {
      "type": "filtered",
      "filter": {
        "type": "selector",
        "dimension": "fruit",
        "value": "apple"
      },
      "aggregator": {
        "type": "thetaSketch", "name": "apple", "fieldName": "person"
      }
    },
    {
      "type": "filtered",
      "filter": {
        "type": "selector",
        "dimension": "fruit",
        "value": "orange"
      },
      "aggregator": {
        "type": "thetaSketch", "name": "orange", "fieldName": "person"
      }
    }
  ],
  "postAggregations": [
    {
      "type": "thetaSketchEstimate",
      "name": "apple_&_orange",
      "field": {
        "type": "thetaSketchSetOp",
        "name": "final_unique_users_sketch",
        "func": "INTERSECT",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "apple"
          },
          {
            "type": "fieldAccess",
            "fieldName": "orange"
          }
        ]
      }
    },
    {
      "type": "thetaSketchEstimate",
      "name": "apple_&_not_orange",
      "field": {
        "type": "thetaSketchSetOp",
        "name": "final_unique_users_sketch",
        "func": "NOT",
        "fields": [
          {
            "type": "fieldAccess",
            "fieldName": "apple"
          },
          {
            "type": "fieldAccess",
            "fieldName": "orange"
          }
        ]
      }
    }
  ],
  "intervals": [ "2018-06-30T23:00:05.000Z/2019-07-01T17:00:05.000Z" ]
}
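For comparison with the curl approach, a JSON query like the one above could also be posted from plain Python. The sketch below only builds the request and sends nothing. The broker URL is an assumption (adjust it for your cluster), `build_druid_request` is a hypothetical helper name, and the query dict is trimmed for brevity.

```python
import json
import urllib.request

# Assumed broker endpoint; replace host and port with your own.
DRUID_URL = "http://localhost:8082/druid/v2/"


def build_druid_request(query, url=DRUID_URL):
    """Return a POST request carrying `query` as a JSON body (nothing is sent)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Trimmed stand-in for the full query; pass the complete dict in practice.
query = {
    "queryType": "groupBy",
    "dataSource": "datasource",
    "granularity": "ALL",
    "dimensions": [],
    "intervals": ["2018-06-30T23:00:05.000Z/2019-07-01T17:00:05.000Z"],
}

request = build_druid_request(query)
print(request.get_method(), request.full_url)
```

To actually run the query, the request would be passed to `urllib.request.urlopen` and the response decoded with `json.load`.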

Thanks for reading. Please let me know if I need to provide any other information.

Best answer

It seems to work if you create the NOT theta sketch operation with the != operator:

from pydruid.utils import aggregators, filters, postaggregator

# `query` is assumed to be an already-configured pydruid client.
result = query.groupby(
    datasource='datasource',
    granularity='all',
    intervals='2018-06-30/2018-08-30',
    filter=(
        (filters.Dimension('fruit') == 'apple') |
        (filters.Dimension('fruit') == 'orange')
    ),
    aggregations={
        'apple': aggregators.filtered(
            filters.Dimension('fruit') == 'apple',
            aggregators.thetasketch('person')),
        'orange': aggregators.filtered(
            (filters.Dimension('fruit') == 'orange'),
            aggregators.thetasketch('person')),
    },
    post_aggregations={
        'apple_&_orange': postaggregator.ThetaSketchEstimate(
            postaggregator.ThetaSketch('apple') &
            postaggregator.ThetaSketch('orange')
        ),
        # `!=` between two ThetaSketch objects builds the NOT set operation.
        'apple_&_not_orange': postaggregator.ThetaSketchEstimate(
            postaggregator.ThetaSketch('apple') !=
            postaggregator.ThetaSketch('orange')
        )
    }
)

(I found this by digging into the pydruid source code.)
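The reason an operator such as != can produce a set operation is operator overloading. The following is a simplified, self-contained illustration of that pattern and not pydruid's actual source: the `Sketch` class and the `estimate` helper are hypothetical stand-ins, and the dict layout is copied from the JSON query in the question.

```python
import json


class Sketch:
    """Hypothetical stand-in for a theta sketch post-aggregator node."""

    def __init__(self, spec, name):
        self.spec = spec  # JSON-serializable dict describing this node
        self.name = name

    @classmethod
    def field(cls, field_name):
        # Leaf node: refers to an aggregation by its output name.
        return cls({"type": "fieldAccess", "fieldName": field_name}, field_name)

    def _set_op(self, func, other, joiner):
        # Combine two nodes into a thetaSketchSetOp node.
        name = f"{self.name}_{joiner}_{other.name}"
        spec = {
            "type": "thetaSketchSetOp",
            "name": name,
            "func": func,
            "fields": [self.spec, other.spec],
        }
        return Sketch(spec, name)

    def __and__(self, other):  # a & b
        return self._set_op("INTERSECT", other, "AND")

    def __or__(self, other):  # a | b
        return self._set_op("UNION", other, "OR")

    def __ne__(self, other):  # a != b
        return self._set_op("NOT", other, "NOT")


def estimate(name, sketch):
    # Wrap a sketch node so the query returns a number instead of a sketch.
    return {"type": "thetaSketchEstimate", "name": name, "field": sketch.spec}


not_op = Sketch.field("apple") != Sketch.field("orange")
print(json.dumps(estimate("apple_&_not_orange", not_op), indent=2))
```

Operand order matters in this sketch: apple comes first, so the resulting `fields` list matches the NOT post-aggregation in the question's JSON query.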

Regarding "python - How to use the ThetaSketchOp function in pydruid", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56953155/
