google-bigquery - How to perform trigram operations in Google BigQuery?

Reposted. Author: 行者123. Updated: 2023-12-03 23:23:29

I use the pg_trgm module in PostgreSQL to calculate the similarity between two strings using trigrams. In particular, I use:

similarity(text, text)

which returns a number between 0 and 1 indicating how similar the two arguments are.

How can I perform a similar function (or an equivalent one) in Google BigQuery?

Best answer

Try the query below, at least as a blueprint for further enhancement:

SELECT text1, text2, similarity FROM 
JS(
// input table
(
SELECT * FROM
(SELECT 'mikhail' AS text1, 'mikhail' AS text2),
(SELECT 'mikhail' AS text1, 'mike' AS text2),
(SELECT 'mikhail' AS text1, 'michael' AS text2),
(SELECT 'mikhail' AS text1, 'javier' AS text2),
(SELECT 'mikhail' AS text1, 'thomas' AS text2)
) ,
// input columns
text1, text2,
// output schema
"[{name: 'text1', type:'string'},
{name: 'text2', type:'string'},
{name: 'similarity', type:'float'}]
",
// function
"function(r, emit) {

var _extend = function(dst) {
var sources = Array.prototype.slice.call(arguments, 1);
for (var i=0; i<sources.length; ++i) {
var src = sources[i];
for (var p in src) {
if (src.hasOwnProperty(p)) dst[p] = src[p];
}
}
return dst;
};

var Levenshtein = {
/**
* Calculate levenshtein distance of the two strings.
*
* @param str1 String the first string.
* @param str2 String the second string.
* @return Integer the levenshtein distance (0 and above).
*/
get: function(str1, str2) {
// base cases
if (str1 === str2) return 0;
if (str1.length === 0) return str2.length;
if (str2.length === 0) return str1.length;

// two rows
var prevRow = new Array(str2.length + 1),
curCol, nextCol, i, j, tmp;

// initialise previous row
for (i=0; i<prevRow.length; ++i) {
prevRow[i] = i;
}

// calculate current row distance from previous row
for (i=0; i<str1.length; ++i) {
nextCol = i + 1;

for (j=0; j<str2.length; ++j) {
curCol = nextCol;

// substitution
nextCol = prevRow[j] + ( (str1.charAt(i) === str2.charAt(j)) ? 0 : 1 );
// insertion
tmp = curCol + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// deletion
tmp = prevRow[j + 1] + 1;
if (nextCol > tmp) {
nextCol = tmp;
}

// copy current col value into previous (in preparation for next iteration)
prevRow[j] = curCol;
}

// copy last col value into previous (in preparation for next iteration)
prevRow[j] = nextCol;
}

return nextCol;
}

};

var the_text1, the_text2;

try {
the_text1 = decodeURI(r.text1).toLowerCase();
} catch (ex) {
the_text1 = r.text1.toLowerCase();
}

try {
the_text2 = decodeURI(r.text2).toLowerCase();
} catch (ex) {
the_text2 = r.text2.toLowerCase();
}

emit({text1: the_text1, text2: the_text2,
similarity: 1 - Levenshtein.get(the_text1, the_text2) / the_text1.length});

}"
)
ORDER BY similarity DESC
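
To experiment with the scoring outside BigQuery, the same logic can be pulled out into plain JavaScript. The sketch below is not the author's code: the names levenshtein and editSimilarity are hypothetical, and it assumes the distance should be divided by the longer of the two strings (rather than by text1 alone), so that the score cannot drop below 0 when text2 is much longer than text1.

```javascript
// Two-row Levenshtein distance, the same idea as in the UDF above.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  var prev = [];
  var i, j;
  for (j = 0; j <= b.length; ++j) prev[j] = j;

  for (i = 1; i <= a.length; ++i) {
    var curr = [i]; // turning i chars of a into an empty string costs i deletions
    for (j = 1; j <= b.length; ++j) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j - 1] + cost, // substitution (or match)
        curr[j - 1] + 1,    // insertion
        prev[j] + 1         // deletion
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: normalise by the longer string so the score stays in [0, 1].
function editSimilarity(a, b) {
  var longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings are treated as identical
  return 1 - levenshtein(a, b) / longest;
}
```

With this normalisation, 'mikhail' vs 'mike' has distance 4 and scores 1 - 4/7 (about 0.43), while 'mikhail' vs 'michael' has distance 2 and scores 1 - 2/7 (about 0.71).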

The query is a slight modification of https://storage.googleapis.com/thomaspark-sandbox/udf-examples/pataky.js by @thomaspark.
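
Note that the query scores pairs by edit distance, not by trigrams. For something closer in spirit to pg_trgm, a trigram score could be computed in the same kind of JavaScript function. The sketch below rests on several assumptions: the names trigrams and trigramSimilarity are made up for illustration, only ASCII letters and digits count as word characters, each word is padded with two leading spaces and one trailing space, and the score is taken as shared trigrams divided by distinct trigrams across both strings. It is meant to approximate pg_trgm's measure, not to reproduce PostgreSQL's output exactly.

```javascript
// Collect the distinct trigrams of a string as the keys of a plain object.
// Assumption: words are runs of ASCII letters/digits, padded with two
// leading spaces and one trailing space before trigrams are extracted.
function trigrams(text) {
  var set = {};
  var words = text.toLowerCase().split(/[^a-z0-9]+/);
  for (var w = 0; w < words.length; ++w) {
    if (words[w].length === 0) continue; // skip empty pieces from the split
    var padded = '  ' + words[w] + ' ';
    for (var i = 0; i + 3 <= padded.length; ++i) {
      set[padded.substring(i, i + 3)] = true;
    }
  }
  return set;
}

// Shared trigrams divided by all distinct trigrams (intersection over union).
function trigramSimilarity(a, b) {
  var ta = trigrams(a);
  var tb = trigrams(b);
  var has = Object.prototype.hasOwnProperty;
  var shared = 0;
  var total = 0;
  var t;
  for (t in ta) {
    if (!has.call(ta, t)) continue;
    total++;
    if (has.call(tb, t)) shared++;
  }
  for (t in tb) {
    if (has.call(tb, t) && !has.call(ta, t)) total++;
  }
  return total === 0 ? 0 : shared / total; // no trigrams at all: treat as 0
}
```

With this sketch, 'mikhail' and 'mike' share 3 of 10 distinct trigrams, giving 0.3, and 'mikhail' and 'javier' share none, giving 0. The function could replace the Levenshtein.get call inside the function body passed to JS(...) if a trigram-style score is preferred.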

Regarding "google-bigquery - How to perform trigram operations in Google BigQuery?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34815207/
