SQL - Jaccard similarity

Reposted. Author: 行者123. Updated: 2023-12-04 15:30:32

My table looks like this:

author | group 

daniel | group1,group2,group3,group4,group5,group8,group10
adam | group2,group5,group11,group12
harry | group1,group10,group15,group13,group15,group18
...
...

I would like my output to look like this:

author1 | author2 | intersection | union

daniel | adam | 2 | 9
daniel | harry| 2 | 11
adam | harry| 0 | 10

Thank you.

Best answer

Try the following (for BigQuery Legacy SQL):

SELECT
  a.author AS author1,
  b.author AS author2,
  SUM(a.item=b.item) AS intersection,
  EXACT_COUNT_DISTINCT(a.item) + EXACT_COUNT_DISTINCT(b.item) - intersection AS [union]
FROM FLATTEN((
  SELECT author, SPLIT([group]) AS item FROM YourTable
), item) AS a
CROSS JOIN FLATTEN((
  SELECT author, SPLIT([group]) AS item FROM YourTable
), item) AS b
WHERE a.author < b.author
GROUP BY 1,2

Added a solution for BigQuery Standard SQL:

WITH YourTable AS (
  -- Sample rows standing in for the real table
  SELECT 'daniel' AS author, 'group1,group2,group3,group4,group5,group8,group10' AS grp UNION ALL
  SELECT 'adam' AS author, 'group2,group5,group11,group12' AS grp UNION ALL
  SELECT 'harry' AS author, 'group1,group10,group13,group15,group18' AS grp
),
tempTable AS (
  -- Split the comma-separated string into an ARRAY<STRING>
  SELECT author, SPLIT(grp) AS grp
  FROM YourTable
)
SELECT
  a.author AS author1,
  b.author AS author2,
  -- Number of groups listed for each author
  (SELECT COUNT(1) FROM UNNEST(a.grp)) AS count1,
  (SELECT COUNT(1) FROM UNNEST(b.grp)) AS count2,
  -- Groups present in both arrays
  (SELECT COUNT(1) FROM UNNEST(a.grp) AS agrp JOIN UNNEST(b.grp) AS bgrp ON agrp = bgrp) AS intersection_count,
  -- Distinct groups present in either array
  (SELECT COUNT(1) FROM (SELECT * FROM UNNEST(a.grp) UNION DISTINCT SELECT * FROM UNNEST(b.grp))) AS union_count
FROM tempTable a
JOIN tempTable b
  ON a.author < b.author  -- keep each unordered pair of authors once

I like this version better:
  • simpler, more readable code
  • no CROSS JOIN and no extra GROUP BY needed

  • when/if you try it, make sure to uncheck the Use Legacy SQL checkbox under Show Options
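
As a way to sanity-check the counts outside BigQuery, here is a minimal Python sketch that mirrors the logic of the queries on the sample rows from the Standard SQL answer. The name pair_stats and the final Jaccard ratio (intersection divided by union) are illustrative additions, not part of the original answer, and the sketch assumes each author's groups are treated as a set.

```python
from itertools import combinations

# Sample rows copied from the YourTable CTE in the Standard SQL answer.
rows = {
    "daniel": "group1,group2,group3,group4,group5,group8,group10",
    "adam": "group2,group5,group11,group12",
    "harry": "group1,group10,group13,group15,group18",
}


def pair_stats(rows):
    """Return (author1, author2, intersection, union, jaccard) per author pair.

    Hypothetical helper for illustration only.
    """
    groups = {author: set(text.split(",")) for author, text in rows.items()}
    stats = []
    # sorted() + combinations() mirrors the a.author < b.author condition.
    for a, b in combinations(sorted(groups), 2):
        inter = len(groups[a] & groups[b])
        union = len(groups[a] | groups[b])
        # Inclusion-exclusion, the identity the Legacy SQL query relies on:
        # |A union B| = |A| + |B| - |A intersect B|
        assert union == len(groups[a]) + len(groups[b]) - inter
        jaccard = inter / union if union else 0.0
        stats.append((a, b, inter, union, jaccard))
    return stats


for row in pair_stats(rows):
    print(row)
```

If the real table can contain the same group twice for one author, the set-based counts here will differ from a plain COUNT(1) over the split array, so it is worth deciding up front whether duplicates should count.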

Regarding SQL - Jaccard similarity, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36705160/
