gpt4 book ai didi

sql - 从SQL中的一列字符串中获取最常用的单词

转载 作者:行者123 更新时间:2023-12-04 13:15:47 25 4
gpt4 key购买 nike

因此,我们在this database中填充了一堆字符串,在这种情况下为帖子标题。
我想做的是:

  • 将字符串分成单词
  • 计算单词在字符串
  • 中出现的次数
  • 给我50个以内的单词
  • data.se查询
  • 中没有此超时

    我尝试使用来自 this SO question的适合data.se的信息,如下所示:
    select word, count(*) from (
    select (case when instr(substr(p.Title, nums.n+1), ' ') then substr(p.Title, nums.n+1)
    else substr(p.Title, nums.n+1, instr(substr(p.Title, nums.n+1), ' ') - 1)
    end) as word
    from (select ' '||Title as string
    from Posts p
    )Posts cross join
    (select 1 as n union all select 2 union all select 10
    ) nums
    where substr(p.Title, nums.n, 1) = ' ' and substr(p.Title, nums.n, 1) <> ' '
    ) w
    group by word
    order by count(*) desc
    不幸的是,这给了我很多错误:

    'substr' is not a recognized built-in function name. Incorrect syntaxnear '|'. Incorrect syntax near 'nums'.


    因此,给定SQL中的一列字符串,每个字符串中的文本量可变,我如何获得最常用的X个单词的列表?

    最佳答案

    正如Blogbeard所说,您提供的查询不适用于SQL Server。这是计算最常用单词的一种方法。这基于由Jeff Moden编写并由SQL Server Central社区成员改进的DelimitedSplitN4K函数。

    ONLINE DEMO

    WITH E1(N) AS (
    SELECT 1 FROM (VALUES
    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ) t(N)
    ),
    E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),
    E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)
    SELECT TOP 50
    x.Item,
    COUNT(*)
    FROM Posts p
    CROSS APPLY (
    SELECT
    ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
    Item = LTRIM(RTRIM(SUBSTRING(p.Title, l.N1, l.L1)))
    FROM (
    SELECT s.N1,
    L1 = ISNULL(NULLIF(CHARINDEX(' ',p.Title,s.N1),0)-s.N1,4000)
    FROM(
    SELECT 1 UNION ALL
    SELECT t.N+1
    FROM(
    SELECT TOP (ISNULL(DATALENGTH(p.Title)/2,0))
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM E4
    ) t(N)
    WHERE SUBSTRING(p.Title ,t.N,1) = ' '
    ) s(N1)
    ) l(N1, L1)
    ) x
    WHERE x.item <> ''
    GROUP BY x.Item
    ORDER BY COUNT(*) DESC

    由于不允许创建函数,所以我已经这样写了。如果您对此感兴趣,请参见下面的函数定义:
    CREATE FUNCTION [dbo].[DelimitedSplitN4K](
    @pString NVARCHAR(4000),
    @pDelimiter NCHAR(1)
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN

    WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b),
    E4(N) AS (SELECT 1 FROM E2 a, E2 b),
    cteTally(N) AS(
    SELECT TOP (ISNULL(DATALENGTH(@pString)/2,0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    ),
    cteStart(N1) AS (
    SELECT 1 UNION ALL
    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
    ),
    cteLen(N1,L1) AS(
    SELECT s.N1,
    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,4000)
    FROM cteStart s
    )
    SELECT
    ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
    Item = SUBSTRING(@pString, l.N1, l.L1)
    FROM cteLen l
    ;

    这是您将如何使用它:
    SELECT TOP 50
    x.Item,
    COUNT(*)
    FROM Posts p
    CROSS APPLY dbo.DelimitedSplitN4K(p.Title, ' ') x
    WHERE LTRIM(RTRIM(x.Item)) <> ''
    GROUP BY x.Item
    ORDER BY COUNT(*) DESC

    结果:
    Item             
    -------- -------
    to 3812411
    in 3331522
    a 2543636
    How 1770915
    the 1534298
    with 1341632
    of 1297468
    and 1166664
    on 970554
    from 964449
    for 886007
    not 835979
    is 704724
    using 703007
    I 633838
    - 632441
    an 548450
    when 449169
    file 409717
    how 358745
    data 335271
    do 323854
    can 310298
    get 305922
    or 266317
    error 263563
    use 258408
    value 254392
    it 251254
    my 238902
    function 235832
    by 231025
    Android 228308
    as 216654
    array 209157
    working 207445
    does 207274
    Is 205613
    multiple 203336
    that 197826
    Why 196979
    into 196591
    after 192056
    string 189053
    PHP 187018
    one 182360
    class 179965
    if 179590
    text 174878
    table 169393

    关于sql - 从SQL中的一列字符串中获取最常用的单词,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37450020/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com