
regex - DFA minimization


I have a question about DFA minimization. I used the well-known technique to convert a regular expression to an NFA and then the goto/closure algorithm to construct a DFA from it. Now the question is: how do I minimize it? I watched the lecture about it here: http://www.youtube.com/watch?v=T9Z66NF5YRk but I still can't wrap my head around it. What is DFA minimization? Is it just merging IDENTICAL states (states that go to the same states on the same characters), or is it something different?

So, I started with the following grammar:

%digit = '0'..'9'
%letter = 'a'..'z' | 'A'..'Z'
%exponent = ("e" | "E") ("+" | "-")? digit+

T_INT = digit+
T_FLOAT = T_INT exponent
T_IDENTIFIER = (letter | "$" | "_") (letter | "$" | "_" | digit)*

and ended up with the following DFA (represented as JSON):
{
"START": [{
"type": "range",
"from": 36,
"to": 36,
"shift": "1"
}, {
"type": "range",
"from": 48,
"to": 57,
"shift": "2"
}, {
"type": "range",
"from": 65,
"to": 90,
"shift": "1"
}, {
"type": "range",
"from": 95,
"to": 95,
"shift": "1"
}, {
"type": "range",
"from": 97,
"to": 122,
"shift": "1"
}],
"1": [{
"type": "range",
"from": 36,
"to": 36,
"shift": "1"
}, {
"type": "range",
"from": 48,
"to": 57,
"shift": "1"
}, {
"type": "range",
"from": 65,
"to": 90,
"shift": "1"
}, {
"type": "range",
"from": 95,
"to": 95,
"shift": "1"
}, {
"type": "range",
"from": 97,
"to": 122,
"shift": "1"
}, {
"shift": ["t_identifier"]
}],
"2": [{
"type": "range",
"from": 48,
"to": 57,
"shift": "2"
}, {
"type": "range",
"from": 69,
"to": 69,
"shift": "3"
}, {
"type": "range",
"from": 101,
"to": 101,
"shift": "3"
}, {
"shift": ["t_int"]
}],
"3": [{
"type": "range",
"from": 43,
"to": 43,
"shift": "5"
}, {
"type": "range",
"from": 45,
"to": 45,
"shift": "5"
}, {
"type": "range",
"from": 48,
"to": 57,
"shift": "4"
}],
"4": [{
"type": "range",
"from": 48,
"to": 57,
"shift": "4"
}, {
"shift": ["t_float"]
}],
"5": [{
"type": "range",
"from": 48,
"to": 57,
"shift": "4"
}]
}

So, how do I minimize it?

Update:

OK, here is my algorithm. Given the following DFA:
{
0: [{
from: 97,
to: 97,
shift: 1
}],
1: [{
from: 97,
to: 97,
shift: 3
}, {
from: 98,
to: 98,
shift: 2
}],
2: [{
from: 98,
to: 98,
shift: 4
}],
3: [{
from: 97,
to: 97,
shift: 3
}, {
from: 98,
to: 98,
shift: 4
}],
4: [{
from: 98,
to: 98,
shift: 4
}]
}

Here is what I did to minimize it:
  • For each state (numbered 0, 1, 2, 3, 4 in my example), build a unique hash that identifies that state (e.g. for state0 it would be: from=97,to=97,shift=1; for state1 it would be: from=97,to=97,shift=3&from=98,to=98,shift=2, and so on...).
  • Compare the resulting hashes; if we find two identical ones, merge those states. In my example the hash of state2 is from=98&shift=4&to=98 and the hash of state4 is also from=98&shift=4&to=98; they are the same, so we can merge them into a new state5, after which the DFA looks like this:
    {
    0: [{
    from: 97,
    to: 97,
    shift: 1
    }],
    1: [{
    from: 97,
    to: 97,
    shift: 3
    }, {
    from: 98,
    to: 98,
    shift: 5
    }],
    3: [{
    from: 97,
    to: 97,
    shift: 3
    }, {
    from: 98,
    to: 98,
    shift: 5
    }],
    5: [{
    from: 98,
    to: 98,
    shift: 5
    }]

    }
  • Keep going like this until nothing changes any more, so the next step (merging states 1 and 3) turns the DFA into the following form:
    {
    0: [{
    from: 97,
    to: 97,
    shift: 6
    }],
    6: [{
    from: 97,
    to: 97,
    shift: 6
    }, {
    from: 98,
    to: 98,
    shift: 5
    }],
    5: [{
    from: 98,
    to: 98,
    shift: 5
    }]

    }
  • There are no more identical states, which means we are done (a code sketch of this merge loop follows the list).
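Below is a minimal sketch of this merge loop, assuming the transition-list representation from the JSON above (a map from state name to a list of {from, to, shift} ranges, with state names stored as strings); the Transition/Dfa types and the function name mergeIdenticalStates are illustrative, not taken from any library:

// A transition range; "shift" is the target state name, or an array such as
// ["t_int"] that marks an accepting transition.
interface Transition { type?: string; from?: number; to?: number; shift: string | string[]; }
type Dfa = Record<string, Transition[]>;

// Canonical signature of a state: all of its transitions, sorted. Two states
// with equal signatures move to the same targets on the same character ranges.
function signature(transitions: Transition[]): string {
  return transitions
    .map(t => `from=${t.from}&shift=${t.shift}&to=${t.to}`)
    .sort()
    .join('|');
}

// Repeatedly merge states with identical signatures, redirecting every
// transition that pointed at a merged state to the surviving representative.
function mergeIdenticalStates(dfa: Dfa): Dfa {
  for (;;) {
    const bySig = new Map<string, string>();   // signature -> representative state
    const replace = new Map<string, string>(); // merged state -> representative
    for (const [state, trans] of Object.entries(dfa)) {
      const sig = signature(trans);
      const rep = bySig.get(sig);
      if (rep === undefined) bySig.set(sig, state);
      else replace.set(state, rep);
    }
    if (replace.size === 0) return dfa;        // fixed point: nothing left to merge
    const next: Dfa = {};
    for (const [state, trans] of Object.entries(dfa)) {
      if (replace.has(state)) continue;        // dropped: merged into its representative
      next[state] = trans.map(t => ({
        ...t,
        shift: typeof t.shift === 'string' ? (replace.get(t.shift) ?? t.shift) : t.shift,
      }));
    }
    dfa = next;
  }
}

The only difference from the steps above is that the surviving state keeps its old name instead of getting a fresh one like "5" or "6"; the resulting DFA is the same up to renaming.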

Second update:

OK, given the following regular expression: 'a' ('ce')* ('d' | 'xa' | 'AFe')+ | 'b' ('ce')* ('d' | 'xa' | 'AFe')+ 'ce'+ I ended up with the following DFA (START -> the start state, ["accept"] -> a transition to the accepting state, so to speak):
    {
    "START": [{
    "type": "range",
    "from": 98,
    "to": 98,
    "shift": "1.2"
    }, {
    "type": "range",
    "from": 97,
    "to": 97,
    "shift": "17.18"
    }],
    "1.2": [{
    "type": "range",
    "from": 120,
    "to": 120,
    "shift": "10"
    }, {
    "type": "range",
    "from": 100,
    "to": 100,
    "shift": "6.7"
    }, {
    "type": "range",
    "from": 65,
    "to": 65,
    "shift": "8"
    }, {
    "type": "range",
    "from": 99,
    "to": 99,
    "shift": "4"
    }],
    "10": [{
    "type": "range",
    "from": 97,
    "to": 97,
    "shift": "6.7"
    }],
    "6.7": [{
    "type": "range",
    "from": 99,
    "to": 99,
    "shift": "15"
    }, {
    "type": "range",
    "from": 120,
    "to": 120,
    "shift": "13"
    }, {
    "type": "range",
    "from": 100,
    "to": 100,
    "shift": "6.7"
    }, {
    "type": "range",
    "from": 65,
    "to": 65,
    "shift": "11"
    }],
    "15": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "14.accept"
    }],
    "14.accept": [{
    "type": "range",
    "from": 99,
    "to": 99,
    "shift": "16"
    }, {
    "shift": ["accept"]
    }],
    "16": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "14.accept"
    }],
    "13": [{
    "type": "range",
    "from": 97,
    "to": 97,
    "shift": "6.7"
    }],
    "11": [{
    "type": "range",
    "from": 70,
    "to": 70,
    "shift": "12"
    }],
    "12": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "6.7"
    }],
    "8": [{
    "type": "range",
    "from": 70,
    "to": 70,
    "shift": "9"
    }],
    "9": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "6.7"
    }],
    "4": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "2.3"
    }],
    "2.3": [{
    "type": "range",
    "from": 120,
    "to": 120,
    "shift": "10"
    }, {
    "type": "range",
    "from": 100,
    "to": 100,
    "shift": "6.7"
    }, {
    "type": "range",
    "from": 65,
    "to": 65,
    "shift": "8"
    }, {
    "type": "range",
    "from": 99,
    "to": 99,
    "shift": "5"
    }],
    "5": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "2.3"
    }],
    "17.18": [{
    "type": "range",
    "from": 120,
    "to": 120,
    "shift": "25"
    }, {
    "type": "range",
    "from": 100,
    "to": 100,
    "shift": "22.accept"
    }, {
    "type": "range",
    "from": 65,
    "to": 65,
    "shift": "23"
    }, {
    "type": "range",
    "from": 99,
    "to": 99,
    "shift": "20"
    }],
    "25": [{
    "type": "range",
    "from": 97,
    "to": 97,
    "shift": "22.accept"
    }],
    "22.accept": [{
    "type": "range",
    "from": 120,
    "to": 120,
    "shift": "28"
    }, {
    "type": "range",
    "from": 100,
    "to": 100,
    "shift": "22.accept"
    }, {
    "type": "range",
    "from": 65,
    "to": 65,
    "shift": "26"
    }, {
    "shift": ["accept"]
    }],
    "28": [{
    "type": "range",
    "from": 97,
    "to": 97,
    "shift": "22.accept"
    }],
    "26": [{
    "type": "range",
    "from": 70,
    "to": 70,
    "shift": "27"
    }],
    "27": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "22.accept"
    }],
    "23": [{
    "type": "range",
    "from": 70,
    "to": 70,
    "shift": "24"
    }],
    "24": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "22.accept"
    }],
    "20": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "18.19"
    }],
    "18.19": [{
    "type": "range",
    "from": 120,
    "to": 120,
    "shift": "25"
    }, {
    "type": "range",
    "from": 100,
    "to": 100,
    "shift": "22.accept"
    }, {
    "type": "range",
    "from": 65,
    "to": 65,
    "shift": "23"
    }, {
    "type": "range",
    "from": 99,
    "to": 99,
    "shift": "21"
    }],
    "21": [{
    "type": "range",
    "from": 101,
    "to": 101,
    "shift": "18.19"
    }]
    }

The story is the same: how do I minimize it? If I follow the classical Hopcroft algorithm, building all those tables, determining the indistinguishable states, merging them together and so on, I end up with a DFA of 15 states (checked with this tool: http://regexvisualizer.apphb.com/ and the regex a(ce)*(d|xa|AFe)+|b(ce)*(d|xa|AFe)+ce+). Here is what the DFA looks like after minimization with Hopcroft's algorithm:

[image: Hopcroft's minimized DFA]

The algorithm I came up with, after "re-thinking" Hopcroft's algorithm, builds a smaller DFA than the one you see above (sorry about the image quality, I had to redraw it step by step to understand why it comes out smaller):

[image: my algorithm]

Here is how it works. The decision about "state equivalence" is based on the result of a hash function for a given state (say "START"): the set of short strings that can be built from the DFA if we start at that state. Given the DFA above and the START state, we can construct the following strings: 98->120, 98->100, 98->65, 98->99, 97->120, 97->100, 97->65, 97->99, so let that be the result of the hash function for the START state. If we run this function for every state in the DFA, we will see that for some states it gives the same result ({"1.2", "6.7", "2.3"}, {"10", "13"}, {"15", "16"}, {"11", "8", "26", "23"}, {"12", "9", "4", "5", "20", "21"}, {"17.18", "18.19"}, {"25", "28"}, {"27", "24"}), so all we need to do is merge the states in each of these groups together.
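For concreteness, this hash can be written as a small function over the same Transition/Dfa types used in the earlier sketch (twoStepHash is an illustrative name; it implements the heuristic described here, not a proven equivalence test):

// Hash a state by the set of two-symbol "range starts" reachable from it,
// e.g. for START above: 98->120, 98->100, 98->65, 98->99, 97->120, ...
function twoStepHash(dfa: Dfa, state: string): string {
  const pairs: string[] = [];
  for (const t1 of dfa[state] ?? []) {
    if (typeof t1.shift !== 'string') continue;          // skip accept markers
    for (const t2 of dfa[t1.shift] ?? []) {
      if (t2.from !== undefined) pairs.push(`${t1.from}->${t2.from}`);
    }
  }
  return [...new Set(pairs)].sort().join(',');
}

States whose twoStepHash values coincide are then merged in the same way as in the first update.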

I can see that I must be wrong somewhere, but I don't understand: what exactly is wrong with the minimized DFA that my algorithm produces?

Best answer

The algorithm you propose does not fully minimize, because it does not detect larger structures that behave identically. To see this, consider the following DFA (drawn with JFLAP):

[image: example DFA with states q0-q3]

Minimization would merge q1 and q2, but the algorithm outlined above cannot do that.

Hopcroft's algorithm, in contrast, would start with the partition:

       {q0, q1, q2}, {q3}

and then split the first set, because q0 has no transition to q3:
       {q0}, {q1, q2}, {q3}

It would not split any further, because q1 and q2 behave identically.
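A rough sketch of the partition refinement described here (the straightforward Moore-style variant rather than Hopcroft's optimized worklist version; the symbol-indexed delta table and the names below are illustrative):

// One entry per state; delta[state][symbol] gives the target state
// (entries may be missing for a partial DFA like the ones above).
type Delta = Record<string, Record<string, string>>;

function refine(states: string[], symbols: string[], delta: Delta,
                accepting: Set<string>): string[][] {
  // Start with two blocks: accepting vs. non-accepting states.
  let partition = [
    states.filter(s => accepting.has(s)),
    states.filter(s => !accepting.has(s)),
  ].filter(block => block.length > 0);

  for (;;) {
    const blockOf = new Map<string, number>();
    partition.forEach((block, i) => block.forEach(s => blockOf.set(s, i)));

    const next: string[][] = [];
    for (const block of partition) {
      // Group the block's states by which block each symbol leads to;
      // states that disagree get separated, i.e. the block splits.
      const groups = new Map<string, string[]>();
      for (const s of block) {
        const key = symbols.map(a => blockOf.get(delta[s]?.[a] ?? '') ?? -1).join(',');
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key)!.push(s);
      }
      next.push(...groups.values());
    }
    if (next.length === partition.length) return partition;  // no block split: done
    partition = next;
  }
}

On the example above this reproduces the trace: {q0, q1, q2}, {q3}, then {q0}, {q1, q2}, {q3}, after which q1 and q2 become a single state of the minimal DFA.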

Regarding regex - DFA minimization, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11132319/
