
php - Converting a Google search query to a PostgreSQL "tsquery"

Reposted. Author: 可可西里. Updated: 2023-10-31 23:26:42

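How can I convert a Google search query into something I can feed to PostgreSQL's to_tsquery()?

If there is no existing library for this, how should I go about parsing a Google search query in a language like PHP?

For example, I'd like to take the following Google-style search query:

("used cars" OR "new cars") -ford -mitsubishi

and turn it into a to_tsquery()-friendly string:

('used cars' | 'new cars') & !ford & !mitsubishi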

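I can fudge this with regular expressions, but that is the best I can do. Is there a robust lexical-analysis approach to this problem? I would also like to support extended search operators (like Google's site: and intitle:). These apply to different database fields, so they need to be separated from the tsquery string.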
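
For the extended operators, one possible approach is to peel known field:value operators off the query before the remaining text is converted to a tsquery. The following is only a sketch under stated assumptions: the function name extractFieldOperators() and the field whitelist are made up for illustration, and it only suits operators that are AND-ed at the top level (removing an operand of OR would leave a dangling OR behind).

```php
<?php
/**
 * Hypothetical helper: pulls "field:value" operators out of a query.
 * $knownFields is an assumed whitelist; anything else stays in the text.
 * Returns [list of filters, remaining free-text query].
 */
function extractFieldOperators(string $query, array $knownFields): array
{
    $filters = [];
    // An operator must start at the beginning, after whitespace, or after "(".
    $pattern = '~(?<![^\s(])(-?)(\w+):("[^"]*"|[^\s()"]+)~';

    $rest = preg_replace_callback(
        $pattern,
        function (array $m) use (&$filters, $knownFields) {
            [$whole, $neg, $field, $value] = $m;
            if (!in_array(strtolower($field), $knownFields, true)) {
                return $whole; // not an operator we know: leave it as plain text
            }
            $filters[] = [
                'field'   => strtolower($field),
                'value'   => trim($value, '"'),
                'negated' => $neg === '-',
            ];
            return ''; // drop the operator from the free-text part
        },
        $query
    );

    // Collapse the whitespace left behind by removed operators.
    return [$filters, trim(preg_replace('~\s+~', ' ', $rest))];
}

[$filters, $text] = extractFieldOperators(
    '"used cars" site:example.com -intitle:"for parts" ford',
    ['site', 'intitle']
);
```

Here $filters would drive comparisons against the corresponding columns, while $text is what remains to be lexed into a tsquery.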

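Update: I realize that with special operators this becomes a Google-to-SQL-WHERE-clause conversion rather than a Google-to-tsquery conversion. The WHERE clause may still contain one or more tsqueries.

For example, the Google-style query:

((color:blue OR "4x4") OR style:coupe) -color:red used

should produce a SQL WHERE clause like this: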

WHERE to_tsvector(description) @@ to_tsquery('used')
  AND color <> 'red'
  AND ( (color = 'blue' OR to_tsvector(description) @@ to_tsquery('4x4'))
        OR style = 'coupe'
      );

I'm not sure whether the above can be achieved with regular expressions.
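
Arbitrary nesting is the part a single regular expression handles poorly, but a small recursive-descent parser on top of a regex tokenizer copes with it. The sketch below is one possible shape, not a definitive implementation. The class name WhereClauseBuilder, the field whitelist (color, style) and the text column (description) are assumptions taken from the example. It uses PostgreSQL's @@ match operator and swaps in plainto_tsquery() so that raw user text can be bound as a parameter.

```php
<?php
// Sketch only: turns a Google-style query into a parameterised SQL condition.
final class WhereClauseBuilder
{
    /** @var string[] bound values, in placeholder order */
    public $params = [];
    private $tokens = [];
    private $pos = 0;
    private $fields;
    private $textColumn;

    public function __construct(array $fields = ['color', 'style'], string $textColumn = 'description')
    {
        $this->fields     = $fields;      // assumed whitelist of filterable columns
        $this->textColumn = $textColumn;  // assumed full-text column
    }

    public function build(string $query): string
    {
        // Tokens: a parenthesis, or an optional "-" / "field:" prefix plus a phrase or word.
        preg_match_all('~[()]|-?(?:\w+:)?(?:"[^"]*"|[^\s()"]+)~', $query, $m);
        $this->tokens = $m[0];
        $this->pos    = 0;
        $this->params = [];
        $sql = $this->parseAnd(false);
        if ($this->pos < count($this->tokens)) {
            throw new InvalidArgumentException('Unbalanced ")"');
        }
        return $sql;
    }

    // query := orGroup+   (terms written side by side are AND-ed)
    private function parseAnd(bool $nested): string
    {
        $parts = [];
        while ($this->pos < count($this->tokens) && $this->tokens[$this->pos] !== ')') {
            $parts[] = $this->parseOr();
        }
        if ($parts === []) {
            throw new InvalidArgumentException('Empty expression');
        }
        $sql = implode(' AND ', $parts);
        return ($nested && count($parts) > 1) ? "($sql)" : $sql;
    }

    // orGroup := atom ("OR" atom)*   (OR binds tighter than the implicit AND)
    private function parseOr(): string
    {
        $parts = [$this->parseAtom()];
        while (($this->tokens[$this->pos] ?? null) === 'OR') {
            $this->pos++;
            $parts[] = $this->parseAtom();
        }
        return count($parts) > 1 ? '(' . implode(' OR ', $parts) . ')' : $parts[0];
    }

    // atom := "(" query ")" | ["-"] [field ":"] (phrase | word)
    private function parseAtom(): string
    {
        $token = $this->tokens[$this->pos++] ?? '';
        if ($token === '' || $token === ')' || $token === 'OR') {
            throw new InvalidArgumentException('Expected a term');
        }
        if ($token === '(') {
            $sql = $this->parseAnd(true);
            if (($this->tokens[$this->pos++] ?? '') !== ')') {
                throw new InvalidArgumentException('Missing ")"');
            }
            return $sql;
        }
        preg_match('~^(-?)(?:(\w+):)?"?([^"]*)"?$~', $token, $m);
        [, $neg, $field, $value] = $m;
        if ($field !== '' && in_array($field, $this->fields, true)) {
            $this->params[] = $value;
            return $field . ($neg ? ' <> ?' : ' = ?');
        }
        // Unknown "field:" prefixes are treated as ordinary text.
        $this->params[] = $field !== '' ? "$field:$value" : $value;
        $match = "to_tsvector({$this->textColumn}) @@ plainto_tsquery(?)";
        return $neg ? "NOT ($match)" : $match;
    }
}

$builder = new WhereClauseBuilder();
$where   = $builder->build('((color:blue OR "4x4") OR style:coupe) -color:red used');
```

The returned condition contains only ? placeholders, and $builder->params holds the values to bind in the same order, so nothing from the user's query is concatenated into the SQL. The conditions come out in the order the terms were written rather than in the order shown in the hand-written clause above.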

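Best answer

Honestly, I think regular expressions are the way to go with something like this. Just the same, this was a fun exercise. The code below is very much a prototype. In fact, you'll see that I didn't even implement the lexer itself and just faked its output. I'd like to continue, but I don't have more spare time today.

Also, there is definitely a lot more work to do here in terms of supporting other types of search operators and the like.

Basically, the idea is that one type of query is lexed, then parsed into a common format (in this case, a QueryExpression instance), which is then rendered back out as another type of query.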

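<?php

// Surface every notice while prototyping.
ini_set( "display_errors", "on" );
error_reporting( E_ALL );

// Turns a raw query string into a flat list of tokens.
interface ILexer
{
    public function execute( $str );
    public function getTokens();
}

// Turns the lexer's tokens into a QueryExpression tree.
interface IParser
{
    public function __construct( ILexer $lexer );
    public function parse( $input );
    public function addToken( $token );
}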

class GoogleQueryLexer implements ILexer
{
    private $tokenStack = array();

    public function execute( $str )
    {
        // Placeholder loop: a real lexer would walk the characters here.
        $chars = str_split( $str );
        foreach ( $chars as $char )
        {
            // add to $this->tokenStack per your rules
        }

        // Faked token list for the sample input:
        // '("used cars" OR "new cars") -ford -mitsubishi'
        $this->tokenStack = array(
            '('
          , 'used cars'
          , 'or new cars'
          , ')'
          , '-ford'
          , '-mitsubishi'
        );
    }

    public function getTokens()
    {
        return $this->tokenStack;
    }
}

class GoogleQueryParser implements IParser
{
    protected $lexer;
    protected $tokenStack = array();

    public function __construct( ILexer $lexer )
    {
        $this->lexer = $lexer;
    }

    public function addToken( $token )
    {
        $this->tokenStack[] = $token;
    }

    public function parse( $input )
    {
        $this->lexer->execute( $input );
        $tokens = $this->lexer->getTokens();

        $expression = new QueryExpression();

        foreach ( $tokens as $token )
        {
            $expression = $this->processToken( $token, $expression );
        }

        return $expression;
    }

    protected function processToken( $token, QueryExpression $expression )
    {
        switch ( $token )
        {
            case '(':
                return $expression->initiateSubExpression();
            case ')':
                // Step back out to the enclosing expression; ignore a stray ')'.
                $parent = $expression->getParentExpression();
                return $parent !== null ? $parent : $expression;
            default:
                $modifier = $token[0];
                $phrase   = substr( $token, 1 );
                switch ( $modifier )
                {
                    case '-':
                        $expression->addExclusionPhrase( $phrase );
                        break;
                    case '+':
                        $expression->addPhrase( $phrase );
                        break;
                    default:
                        // Tokens such as 'or new cars' carry their operator as a prefix.
                        $space    = strpos( $token, ' ' );
                        $operator = $space === false ? '' : substr( $token, 0, $space );
                        $phrase   = $space === false ? $token : trim( substr( $token, $space ) );
                        switch ( strtolower( $operator ) )
                        {
                            case 'and':
                                $expression->addAndPhrase( $phrase );
                                break;
                            case 'or':
                                $expression->addOrPhrase( $phrase );
                                break;
                            default:
                                $expression->addPhrase( $token );
                        }
                }
        }
        return $expression;
    }
}

class QueryExpression
{
    protected $phrases = array();
    protected $subExpressions = array();
    protected $parent;

    public function __construct( $parent=null )
    {
        $this->parent = $parent;
    }

    public function initiateSubExpression()
    {
        $expression = new self( $this );
        $this->subExpressions[] = $expression;
        return $expression;
    }

    public function getPhrases()
    {
        return $this->phrases;
    }

    public function getSubExpressions()
    {
        return $this->subExpressions;
    }

    public function getParentExpression()
    {
        return $this->parent;
    }

    protected function addQueryPhrase( QueryPhrase $phrase )
    {
        $this->phrases[] = $phrase;
    }

    public function addPhrase( $input )
    {
        $this->addQueryPhrase( new QueryPhrase( $input ) );
    }

    public function addOrPhrase( $input )
    {
        $this->addQueryPhrase( new QueryPhrase( $input, QueryPhrase::MODE_OR ) );
    }

    public function addAndPhrase( $input )
    {
        $this->addQueryPhrase( new QueryPhrase( $input, QueryPhrase::MODE_AND ) );
    }

    public function addExclusionPhrase( $input )
    {
        $this->addQueryPhrase( new QueryPhrase( $input, QueryPhrase::MODE_EXCLUDE ) );
    }
}

class QueryPhrase
{
    const MODE_DEFAULT = 1;
    const MODE_OR      = 2;
    const MODE_AND     = 3;
    const MODE_EXCLUDE = 4;

    protected $phrase;
    protected $mode;

    public function __construct( $input, $mode=self::MODE_DEFAULT )
    {
        $this->phrase = $input;
        $this->mode   = $mode;
    }

    public function getMode()
    {
        return $this->mode;
    }

    public function __toString()
    {
        return $this->phrase;
    }
}

class TsqueryBuilder
{
    protected $expression;
    protected $query;

    public function __construct( QueryExpression $expression )
    {
        $this->expression = $expression;
        // Strip the dangling operator left at either end of the rendered string.
        $this->query = trim( $this->processExpression( $expression ), ' &|' );
    }

    public function getResult()
    {
        return $this->query;
    }

    protected function processExpression( QueryExpression $expression )
    {
        $query          = '';
        $phrases        = $expression->getPhrases();
        $subExpressions = $expression->getSubExpressions();

        foreach ( $phrases as $phrase )
        {
            $format = "'%s' ";
            switch ( $phrase->getMode() )
            {
                case QueryPhrase::MODE_AND :
                    $format = "& '%s' ";
                    break;
                case QueryPhrase::MODE_OR :
                    $format = "| '%s' ";
                    break;
                case QueryPhrase::MODE_EXCLUDE :
                    $format = "& !'%s' ";
                    break;
            }
            // Inside a quoted tsquery lexeme, apostrophes and backslashes are doubled.
            $escaped = str_replace( array( "\\", "'" ), array( "\\\\", "''" ), (string) $phrase );
            $query  .= sprintf( $format, $escaped );
        }

        foreach ( $subExpressions as $subExpression )
        {
            // Trim the nested result too, so a group never starts with an operator.
            $query .= "& (" . trim( $this->processExpression( $subExpression ), ' &|' ) . ")";
        }
        return $query;
    }
}

$parser = new GoogleQueryParser( new GoogleQueryLexer() );

$queryBuilder = new TsqueryBuilder( $parser->parse( '("used cars" OR "new cars") -ford -mitsubishi' ) );

echo $queryBuilder->getResult();
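
The lexer above only fakes its token list. As a rough sketch of what execute() could do instead, the hypothetical function lexGoogleQuery() below produces tokens in the shape the parser expects: bare parentheses, plain phrases, -/+ prefixed phrases, and a lowercase "or " / "and " prefix on the phrase that follows an operator. The function name and the regular expression are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical stand-in for the body of GoogleQueryLexer::execute().
function lexGoogleQuery(string $str): array
{
    // Match parentheses, optionally signed quoted phrases, or bare words.
    preg_match_all('~[()]|[-+]?"[^"]*"|[^\s()"]+~', $str, $matches);

    $tokens  = [];
    $pending = null; // boolean operator waiting for the phrase it applies to
    foreach ($matches[0] as $raw) {
        if ($raw === '(' || $raw === ')') {
            $tokens[] = $raw;
            $pending  = null;
        } elseif ($raw === 'OR' || $raw === 'AND') {
            $pending = strtolower($raw);
        } else {
            // Drop the quotes: '-"new cars"' becomes '-new cars'.
            $phrase   = str_replace('"', '', $raw);
            $tokens[] = $pending === null ? $phrase : $pending . ' ' . $phrase;
            $pending  = null;
        }
    }
    return $tokens;
}

$tokens = lexGoogleQuery('("used cars" OR "new cars") -ford -mitsubishi');
```

For the sample query this yields the same six tokens that the faked lexer hard-codes. One caveat of the prefix convention: a quoted phrase that itself begins with "and " or "or " would be misread by the parser as an operator, so a fuller implementation would probably use typed tokens instead of strings.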

Regarding "php - Converting a Google search query to a PostgreSQL tsquery", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/207817/
