sqlite - Full-text search on a mobile device?

Reposted by: 行者123. Updated: 2023-12-03 16:15:10

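We will soon begin developing a new mobile application. This application will be used for heavy searching of text-based fields. Does the group have any suggestions on which database engine is best suited to this kind of search on a mobile platform?

Specifically, the target is Windows Mobile 6, and we will be using the .NET CF. Some of the text-based fields will be between 35 and 500 characters long. The device will operate in two modes, batch and WiFi. For WiFi, of course, we can submit requests to a full-blown database engine and get the results back. This question is about the "batch" version, which will carry a database of the information on the device's flash memory or removable storage card.

Anyway, I know SQLCE has some basic indexing, but you don't get the really fancy "full-text" style indexes until you move to the full-blown version, which of course isn't available on a mobile platform.

An example of what the data looks like:

"apron carpenter adjustable leather container pocket waist hardware belt" etc.

I haven't started evaluating any other specific options yet, because I figured I would draw on this group's experience to point me toward some specific avenues first.

Any suggestions or tips?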

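Best answer

Just recently I had the same problem. Here is what I did:

I created a class to hold just the id and text of each object (in my case I call them sku (item number) and description). This gives a smaller object that uses less memory, since it is only used for searching. Once matches are found, I still fetch the full objects from the database.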

public class SmallItem
{
    private int _sku;
    public int Sku
    {
        get { return _sku; }
        set { _sku = value; }
    }

    // Size of max description size + 1 for null terminator.
    private char[] _description = new char[36];
    public char[] Description
    {
        get { return _description; }
        set { _description = value; }
    }

    public SmallItem()
    {
    }
}

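Once this class exists, you can create an array of these objects (I actually used a List in my case) and use it for searching throughout the application. Initializing the list takes some time, but you only have to worry about that at startup. Basically, just run a query against your database and grab the data needed to build the list.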
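As a rough sketch of the load step, the helper below turns a description string into the fixed-size, null-terminated, upper-case buffer that a SmallItem.Description would hold. The name SearchBuffer.FromDescription and the decision to truncate over-long text are assumptions made for illustration; the original answer does not show how the buffers were filled.

```csharp
using System;

public static class SearchBuffer
{
    // Hypothetical helper (not from the original answer): copies up to maxLength
    // characters of the description, upper-cased, into a buffer of maxLength + 1.
    public static char[] FromDescription(string description, int maxLength)
    {
        if (description == null)
            description = string.Empty;

        string upper = description.ToUpper();
        int length = Math.Min(upper.Length, maxLength);

        // A new char[] is zero-filled, so every slot after the copied text is '\0'.
        char[] buffer = new char[maxLength + 1];
        upper.CopyTo(0, buffer, 0, length);
        return buffer;
    }
}
```

With a maximum description length of 35, each row read at startup would be stored as something like item.Description = SearchBuffer.FromDescription(text, 35).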

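Once you have the list, you can scan through it quickly, searching for whatever words you want. Because it is a contains search, it also has to find words within words (e.g. drill returns drill, drillbit, drills and so on). To do this, we wrote a home-grown C# contains function that uses unsafe, pointer-based code. It takes a string array of words, so you can search on more than one word. We use it for "AND" searches, where the description must contain every word passed in; "OR" is not supported in this example. As it searches the list, it builds a list of IDs, which is then passed back to the calling function. Once you have the list of IDs, you can run a fast query against the database to return the full objects by their indexed ID number. I should mention that we also cap the number of results returned. The cap can be taken out, but it is handy if someone types something like "e" as their search term, which would return a lot of results.

Here is a sample of the custom contains function: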

public static int[] Contains(string[] descriptionTerms, int maxResults, List<SmallItem> itemList)
{
    // Never return more than maxResults matches.
    int[] matchingSkus = new int[maxResults];
    int matchNumber = 0;

    // With no words to match, nothing matches.
    if (descriptionTerms.Length == 0)
        return new int[0];

    // Convert each search term to an upper-case, null-terminated char array.
    // Descriptions are assumed to be stored in upper case as well.
    char[][] allWordsToMatch = new char[descriptionTerms.Length][];
    for (int w = 0; w < descriptionTerms.Length; w++)
    {
        char[] term = new char[descriptionTerms[w].Length + 1]; // + 1 for the null terminator
        Array.Copy(descriptionTerms[w].ToUpper().ToCharArray(), term, descriptionTerms[w].Length);
        allWordsToMatch[w] = term;
    }

    // Loop through the list of items trying to find matching words.
    foreach (SmallItem item in itemList)
    {
        // If we have reached our maximum allowable matches, stop searching and return the results.
        if (matchNumber == maxResults)
            break;

        // "AND" search: every word must occur somewhere in the description.
        bool matchedAllWords = true;

        for (int w = 0; w < allWordsToMatch.Length && matchedAllWords; w++)
        {
            bool matchedWord = false;

            // Delving into unsafe code for performance (compile with /unsafe).
            unsafe
            {
                // Pointers to the first char of the item description and of the current word.
                fixed (char* pdesc = &item.Description[0])
                fixed (char* pfilter = &allWordsToMatch[w][0])
                {
                    // Try every start position until we hit the description's null terminator.
                    for (int start = 0; *(pdesc + start) != '\0' && !matchedWord; start++)
                    {
                        int offset = 0;

                        /* Advance while we have consecutive character matches. A null in the
                         * description never equals a non-null filter character, so this loop
                         * cannot read past the description's terminator.
                         * */
                        while (*(pfilter + offset) != '\0' && *(pfilter + offset) == *(pdesc + start + offset))
                            offset++;

                        // We matched all the way to the filter's null terminator: the whole word matched.
                        // An empty word (offset == 0) never counts as a match.
                        matchedWord = offset > 0 && *(pfilter + offset) == '\0';
                    }
                }
            }

            // If the current word had no match, there is no sense in looking for the rest of the words.
            matchedAllWords = matchedWord;
        }

        // Every word matched, so add the sku as a match.
        if (matchedAllWords)
            matchingSkus[matchNumber++] = item.Sku;
    }

    // We have our list of matching skus. Resize the array and pass it back.
    Array.Resize(ref matchingSkus, matchNumber);
    return matchingSkus;
}

Once you have the list of matching skus, you can iterate through the array and build a query command that returns only the matching skus.
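One way to build that command is an IN list, sketched below. The table name Items and column name Sku are placeholders, and the class SkuQuery is hypothetical; the original answer does not show its schema or query text.

```csharp
using System;
using System.Text;

public static class SkuQuery
{
    // Builds e.g. "SELECT * FROM Items WHERE Sku IN (101, 205)".
    // "Items" and "Sku" are placeholder names, not taken from the original answer.
    public static string Build(int[] skus)
    {
        if (skus == null || skus.Length == 0)
            return null; // nothing matched, so there is nothing to query

        StringBuilder sql = new StringBuilder("SELECT * FROM Items WHERE Sku IN (");
        for (int i = 0; i < skus.Length; i++)
        {
            if (i > 0)
                sql.Append(", ");
            sql.Append(skus[i]); // integers only, so no quoting or escaping is needed
        }
        sql.Append(")");
        return sql.ToString();
    }
}
```

Because the result cap bounds the number of skus, the generated statement stays short. If the values were strings rather than integers, parameters would be the safer choice.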

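To give an idea of performance, here is what we found when performing the following steps:

1. Search through ~171,000 items.
2. Create a list of all matching items.
3. Query the database, returning only the matching items.
4. Build the full-blown items (similar to the SmallItem class, but with many more fields).
5. Populate a data grid with the full item objects.

On our mobile device, the whole process takes 2-4 seconds (2 seconds if we hit the match limit before searching all items, 4 seconds if we have to scan every item).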

I also tried doing this without unsafe code, using String.IndexOf (and tried String.Contains, which had the same performance as IndexOf). That way was much slower, about 25 seconds.
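For comparison, a managed version of the same "AND" search might look like the sketch below. This is not the author's code: the class ManagedSearch is hypothetical, and the answer does not say which comparison mode was used for the 25-second measurement, so the ordinal, case-insensitive comparison here is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class ManagedSearch
{
    // Returns the indexes of the descriptions that contain every term ("AND" search).
    public static List<int> FindAll(IList<string> descriptions, string[] terms, int maxResults)
    {
        List<int> matches = new List<int>();
        for (int i = 0; i < descriptions.Count && matches.Count < maxResults; i++)
        {
            bool containsAll = true;
            foreach (string term in terms)
            {
                // Substring test, so "drill" also matches "drillbit".
                if (descriptions[i].IndexOf(term, StringComparison.OrdinalIgnoreCase) < 0)
                {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll)
                matches.Add(i);
        }
        return matches;
    }
}
```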

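I also tried using a StreamReader and a file containing lines of [Sku Number]|[Description]. The code was similar to the unsafe code example. A full scan this way took about 15 seconds. Not too bad for speed, but not great. The file and StreamReader approach has one advantage over the way I showed you: the file can be created ahead of time. The way I showed you needs the memory and the initial time to load the list when the application starts. For our 171,000 items, that load takes about 2 minutes. If you can afford to wait for the initial load each time the app starts (which can of course be done on a separate thread), then searching this way is the fastest way (at least that I have found).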
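The file-based variant can be sketched as follows, assuming one sku|description pair per line. The class FileSearch is hypothetical, and this version uses managed String.IndexOf for brevity, whereas the author says their file scan reused code similar to the pointer-based example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSearch
{
    // Scans "sku|description" lines and returns the skus whose description contains every term.
    public static List<int> Scan(TextReader reader, string[] terms, int maxResults)
    {
        List<int> skus = new List<int>();
        string line;
        while (skus.Count < maxResults && (line = reader.ReadLine()) != null)
        {
            int bar = line.IndexOf('|');
            if (bar <= 0)
                continue; // skip lines without a sku or separator

            string description = line.Substring(bar + 1);
            bool containsAll = true;
            foreach (string term in terms)
            {
                if (description.IndexOf(term, StringComparison.OrdinalIgnoreCase) < 0)
                {
                    containsAll = false;
                    break;
                }
            }

            // int.Parse throws if the sku part is not a number.
            if (containsAll)
                skus.Add(int.Parse(line.Substring(0, bar)));
        }
        return skus;
    }
}
```

In the application the reader would be a StreamReader opened on the prepared file. Taking a TextReader keeps the method easy to exercise with a StringReader.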

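Hope this helps.

PS - Thanks to Dolch for helping with some of the unmanaged code.

Regarding "sqlite - Full-text search on a mobile device?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/276489/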
