python - Getting a grammar to read multiple keywords in text


Reposted by: 太空宇宙 · Updated: 2023-11-04 03:56:41

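I still consider myself a pyparsing newbie. I threw together 2 quick grammars, and neither of them does what I want. I'm trying to come up with a grammar that looks really simple, but it turns out (at least for me) not to be so trivial. The language has one basic definition: it is split into keywords and bodies. A body can span multiple lines. A keyword sits at the start of a line, within the first 20 or so characters, and ends with ";" (no quotes). So I threw together a quick demo program to test a couple of grammars. However, when I try them, they always get the first keyword but none after it.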

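I've attached the source code as an example, along with the output I'm getting. Even though this is only test code, I documented it out of habit. In the example below, the two keywords are NOW; and lastly;. Ideally, I would like the keywords not to include the semicolon.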

Any ideas what I should do to make this work?
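For comparison, the format as described can be split without pyparsing at all. The following is a minimal sketch using Python's standard `re` module, a swapped-in technique rather than a pyparsing grammar. It assumes a keyword is a run of ASCII letters/digits at the start of a line (possibly indented) immediately followed by `;`; it does not enforce the "first 20 or so characters" limit, it ignores any text before the first keyword, and `split_sections` is a hypothetical name. It returns keywords without the semicolon:

```python
import re

# Assumption: a keyword is letters/digits at the start of a line (optionally
# indented with spaces or tabs), immediately followed by ';'.
KEYWORD = re.compile(r"^[ \t]*([A-Za-z0-9]+);", re.MULTILINE)


def split_sections(text):
    """Return (keyword, body) pairs, with the ';' stripped from each keyword.

    Each body runs from just after its keyword to the next keyword or the
    end of the text; line breaks inside a body are collapsed to spaces.
    """
    matches = list(KEYWORD.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[m.end():end].split())
        sections.append((m.group(1), body))
    return sections


sample = """NOW; is the time
when least expected.
lastly; another day
and finally
"""
print(split_sections(sample))
# [('NOW', 'is the time when least expected.'), ('lastly', 'another day and finally')]
```

Because the body is simply "everything up to the next keyword", punctuation such as the period in "expected." is kept without any extra rules.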

from pyparsing import *

def testString(text,grammar):
    """
    @summary: perform a test of a grammar
    @type text: text
    @param text: text buffer for input (a message to be parsed)
    @type grammar: MatchFirst or equivalent pyparsing construct
    @param grammar: some grammar defined somewhere else
    @type pgm: text
    @param pgm: typically name of the program, which invoked this function.
    @status: 20130802 CODED
    """
    print 'Input Text is %s' % text
    print 'Grammar is %s' % grammar
    tokens = grammar.parseString(text)
    print 'After parse string: %s' % tokens
    tokens.dump()
    tokens.keys()

    return tokens


def getText(msgIndex):
    """
    @summary: make a text string suitable for parsing
    @returns: returns a text buffer
    @type msgIndex: int
    @param msgIndex: a number corresponding to a text buffer to retrieve
    @status: 20130802 CODED
    """

    msg = [  """NOW; is the time for a few good ones to come to the aid
of new things to come for it is almost time for
a tornado to strike upon a small hill
when least expected.
lastly; another day progresses and
then we find that which we seek
and finally we will
find our happiness perhaps its closer than 1 or 2 years or not so
    """,
         '',
      ]

    return msg[msgIndex]

def getGrammar(grammarIndex):
    """
    @summary: make a grammar given an index
    @type grammarIndex: int
    @param grammarIndex: a number corresponding to the grammar to be retrieved
    @Note: a good run will return 2 keys, NOW; and lastly;, and each key will have an associated body. The body is all
    words and text up to the next keyword or EOF, whichever comes first.
    """
    kw = Combine(Word(alphas + nums) + Literal(';'))('KEY')
    kw.setDebug(True)
    body1 = delimitedList(OneOrMore(Word(alphas + nums)) +~kw)('Body')
    body1.setDebug(True)
    g1 = OneOrMore(Group(kw + body1))

    # ok start defining a new grammar (borrow kw from grammar).

    body2 = SkipTo(~kw, include=False)('BODY')
    body2.setDebug(True)

    g2 = OneOrMore(Group(kw+body2))
    grammar = [g1,
           g2,
          ]
    return grammar[grammarIndex]


if __name__ == '__main__':
    # list indices [ text, grammar ]
    tests = {1: [0,0],
         2: [0,1],
        }
    check = tests.keys()
    check.sort()
    for testno in check:
        print 'STARTING Test %d' % testno
        text = getText(tests[testno][0])
        grammar = getGrammar(tests[testno][1])
        tokens = testString(text, grammar)
        print 'Tokens found %s' % tokens
        print 'ENDING Test %d' % testno

The output looks like this (using Python 2.7 and pyparsing 2.0.1):

    STARTING Test 1
    Input Text is NOW; is the time for a few good ones to come to the aid
    of new things to come for it is almost time for
    a tornado to strike upon a small hill
    when least expected.
    lastly; another day progresses and
    then we find that which we seek
    and finally we will
    find our happiness perhaps its closer than 1 or 2 years or not so

    Grammar is {Group:({Combine:({W:(abcd...) ";"}) {{W:(abcd...)}... ~{Combine:({W:(abcd...) ";"})}} [, {{W:(abcd...)}... ~{Combine:({W:(abcd...) ";"})}}]...})}...
    Match Combine:({W:(abcd...) ";"}) at loc 0(1,1)
    Matched Combine:({W:(abcd...) ";"}) -> ['NOW;']
    Match {{W:(abcd...)}... ~{Combine:({W:(abcd...) ";"})}} [, {{W:(abcd...)}... ~{Combine:({W:(abcd...) ";"})}}]... at loc 4(1,5)
    Match Combine:({W:(abcd...) ";"}) at loc 161(4,20)
    Exception raised:Expected W:(abcd...) (at char 161), (line:4, col:20)
    Matched {{W:(abcd...)}... ~{Combine:({W:(abcd...) ";"})}} [, {{W:(abcd...)}... ~{Combine:({W:(abcd...) ";"})}}]... -> ['is', 'the', 'time', 'for', 'a', 'few', 'good', 'ones', 'to', 'come', 'to', 'the', 'aid', 'of', 'new', 'things', 'to', 'come', 'for', 'it', 'is', 'almost', 'time', 'for', 'a', 'tornado', 'to', 'strike', 'upon', 'a', 'small', 'hill', 'when', 'least', 'expected']
    Match Combine:({W:(abcd...) ";"}) at loc 161(4,20)
    Exception raised:Expected W:(abcd...) (at char 161), (line:4, col:20)
    After parse string: [['NOW;', 'is', 'the', 'time', 'for', 'a', 'few', 'good', 'ones', 'to', 'come', 'to', 'the', 'aid', 'of', 'new', 'things', 'to', 'come', 'for', 'it', 'is', 'almost', 'time', 'for', 'a', 'tornado', 'to', 'strike', 'upon', 'a', 'small', 'hill', 'when', 'least', 'expected']]
    Tokens found [['NOW;', 'is', 'the', 'time', 'for', 'a', 'few', 'good', 'ones', 'to', 'come', 'to', 'the', 'aid', 'of', 'new', 'things', 'to', 'come', 'for', 'it', 'is', 'almost', 'time', 'for', 'a', 'tornado', 'to', 'strike', 'upon', 'a', 'small', 'hill', 'when', 'least', 'expected']]
    ENDING Test 1
    STARTING Test 2
    Input Text is NOW; is the time for a few good ones to come to the aid
    of new things to come for it is almost time for
    a tornado to strike upon a small hill
    when least expected.
    lastly; another day progresses and
    then we find that which we seek
    and finally we will
    find our happiness perhaps its closer than 1 or 2 years or not so

    Grammar is {Group:({Combine:({W:(abcd...) ";"}) SkipTo:(~{Combine:({W:(abcd...) ";"})})})}...
    Match Combine:({W:(abcd...) ";"}) at loc 0(1,1)
    Matched Combine:({W:(abcd...) ";"}) -> ['NOW;']
    Match SkipTo:(~{Combine:({W:(abcd...) ";"})}) at loc 4(1,5)
    Match Combine:({W:(abcd...) ";"}) at loc 4(1,5)
    Exception raised:Expected ";" (at char 7), (line:1, col:8)
    Matched SkipTo:(~{Combine:({W:(abcd...) ";"})}) -> ['']
    Match Combine:({W:(abcd...) ";"}) at loc 5(1,6)
    Exception raised:Expected ";" (at char 7), (line:1, col:8)
    After parse string: [['NOW;', '']]
    Tokens found [['NOW;', '']]
    ENDING Test 2

    Process finished with exit code 0

Best answer

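I'm big on TDD, but here your whole testing and alternative-selection infrastructure really gets in the way of seeing where the grammar is and what is happening with it. If I strip away all the extra machinery, I see that your grammar is just: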

kw = Combine(Word(alphas + nums) + Literal(';'))('KEY')
body1 = delimitedList(OneOrMore(Word(alphas + nums)) +~kw)('Body')
g1 = OneOrMore(Group(kw + body1))

The first problem I see is your definition of body1:

body1 = delimitedList(OneOrMore(Word(alphas + nums)) +~kw)('Body')

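You are on the right track with a negative lookahead, but for it to work in pyparsing you have to put it at the beginning of the expression, not at the end. At the end, it is only checked after OneOrMore has already consumed every word it can, including the word part of the next keyword, so it cannot stop anything. Think of it as "before I match another valid word, I will first rule out that it is a keyword":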

body1 = delimitedList(OneOrMore(~kw + Word(alphas + nums)))('Body')
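To make the ordering concrete, here is a small model that uses the standard `re` module in place of pyparsing. This is an assumption for illustration only: the regexes approximate `Word(alphas + nums)` and the `kw` expression (including pyparsing's default whitespace skipping), and the hypothetical functions `body_lookahead_last` and `body_lookahead_first` mimic `OneOrMore(word) + ~kw` and `OneOrMore(~kw + word)` respectively:

```python
import re

# Regex stand-ins for the pyparsing pieces (an assumption: this models the
# behaviour, it is not pyparsing). The leading \s* mimics pyparsing's default
# whitespace skipping.
WORD = re.compile(r"\s*([A-Za-z0-9]+)")   # Word(alphas + nums)
KW = re.compile(r"\s*([A-Za-z0-9]+);")    # Word(alphas + nums) + ';'

# (Both models return an empty list rather than failing when no word matches.)


def body_lookahead_last(text, pos):
    """Model of OneOrMore(word) + ~kw: consume words, then look for a keyword."""
    words = []
    while True:
        m = WORD.match(text, pos)
        if not m:
            break
        words.append(m.group(1))
        pos = m.end()
    # A trailing ~kw can never fail at this point: the loop only stops where
    # no word starts, so no keyword can start there either.
    return words, pos


def body_lookahead_first(text, pos):
    """Model of OneOrMore(~kw + word): rule out a keyword before each word."""
    words = []
    while not KW.match(text, pos):
        m = WORD.match(text, pos)
        if not m:
            break
        words.append(m.group(1))
        pos = m.end()
    return words, pos


text = "NOW; is the time\nlastly; another day"
print(body_lookahead_last(text, 4))   # (['is', 'the', 'time', 'lastly'], 23)
print(body_lookahead_first(text, 4))  # (['is', 'the', 'time'], 16)
```

Starting right after `NOW;` (position 4), the trailing-lookahead version swallows `lastly` and leaves the `;` behind, so the next `Group(kw + ...)` can never match there; with the lookahead in front, the body stops just before `lastly;`.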

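(By the way, why is this a delimitedList? delimitedList is usually reserved for true lists with delimiters, such as the comma-separated arguments of a program function. All it really does here is accept any commas that might be mixed into the body, and that is better handled directly with a list of punctuation.)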

Here is my test version of your code:

from pyparsing import *

kw = Combine(Word(alphas + nums) + Literal(';'))('KEY')
body1 = OneOrMore(~kw + Word(alphas + nums))('Body')
g1 = OneOrMore(Group(kw + body1))

msg = [  """NOW; is the time for a few good ones to come to the aid
of new things to come for it is almost time for
a tornado to strike upon a small hill
when least expected.
lastly; another day progresses and
then we find that which we seek
and finally we will
find our happiness perhaps its closer than 1 or 2 years or not so
    """,
             '',
          ][0]

result = g1.parseString(msg)
# we expect multiple groups, each containing "KEY" and "Body" names,
# so iterate over groups, and dump the contents of each
for res in result:
    print res.dump()

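I still get the same result you did: only the first keyword matches. So to see where the disconnect happens, I used scanString, which returns not only the matched tokens but also the start and end locations of the match: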

result,start,end = next(g1.scanString(msg))
print len(msg),end

This gives me:

320 161

So I see that matching ends at location 161 in a string whose total length is 320, so I'll add one more print statement:

print msg[end:end+10]

And I get:

.
lastly;

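The trailing period in the body text is the culprit: Word(alphas + nums) cannot match ".", so the first body stops there, no keyword follows, and OneOrMore quietly ends after one group (parseString does not require the whole input to match unless you pass parseAll=True). If I remove the period from the message and try parseString again, I now get: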

['NOW;', 'is', 'the', 'time', 'for', 'a', 'few', 'good', 'ones', 'to', 'come', 'to', 'the', 'aid', 'of', 'new', 'things', 'to', 'come', 'for', 'it', 'is', 'almost', 'time', 'for', 'a', 'tornado', 'to', 'strike', 'upon', 'a', 'small', 'hill', 'when', 'least', 'expected']
- Body: ['is', 'the', 'time', 'for', 'a', 'few', 'good', 'ones', 'to', 'come', 'to', 'the', 'aid', 'of', 'new', 'things', 'to', 'come', 'for', 'it', 'is', 'almost', 'time', 'for', 'a', 'tornado', 'to', 'strike', 'upon', 'a', 'small', 'hill', 'when', 'least', 'expected']
- KEY: NOW;
['lastly;', 'another', 'day', 'progresses', 'and', 'then', 'we', 'find', 'that', 'which', 'we', 'seek', 'and', 'finally', 'we', 'will', 'find', 'our', 'happiness', 'perhaps', 'its', 'closer', 'than', '1', 'or', '2', 'years', 'or', 'not', 'so']
- Body: ['another', 'day', 'progresses', 'and', 'then', 'we', 'find', 'that', 'which', 'we', 'seek', 'and', 'finally', 'we', 'will', 'find', 'our', 'happiness', 'perhaps', 'its', 'closer', 'than', '1', 'or', '2', 'years', 'or', 'not', 'so']
- KEY: lastly;

If you want to handle punctuation, I suggest adding something like:

PUNC = oneOf(". , ? ! : & $")

and adding it to body1:

body1 = OneOrMore(~kw + (Word(alphas + nums) | PUNC))('Body')
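Putting the diagnosis and the fix together, here is a stdlib model of the corrected grammar. It is an assumption for illustration, not pyparsing: regexes stand in for `kw`, `Word(alphas + nums)` and `Word(alphas + nums) | PUNC`, and `parse_groups` is a hypothetical helper whose returned `end` plays the role of scanString's end location. With words only, matching stops on the period; with the punctuation alternative, both groups are found:

```python
import re

# Regex stand-ins for the pyparsing expressions (an assumption for
# illustration, not pyparsing itself). The leading \s* mimics pyparsing's
# default whitespace skipping.
KW = re.compile(r"\s*([A-Za-z0-9]+;)")
WORD_ONLY = re.compile(r"\s*([A-Za-z0-9]+)")               # Word(alphas + nums)
WORD_OR_PUNC = re.compile(r"\s*([A-Za-z0-9]+|[.,?!:&$])")  # Word(...) | PUNC


def parse_groups(text, token):
    """Model OneOrMore(Group(kw + OneOrMore(~kw + token))) from position 0.

    Returns (groups, end); like scanString's end value, `end` shows where
    matching stopped.
    """
    groups, pos = [], 0
    while True:
        k = KW.match(text, pos)
        if not k:
            break
        body, p = [], k.end()
        while not KW.match(text, p):  # ~kw checked before each token
            t = token.match(text, p)
            if not t:
                break
            body.append(t.group(1))
            p = t.end()
        if not body:  # OneOrMore requires at least one body token
            break
        groups.append([k.group(1)] + body)
        pos = p
    return groups, pos


msg = "NOW; is the time\nwhen least expected.\nlastly; another day\n"

groups, end = parse_groups(msg, WORD_ONLY)
print(len(groups), repr(msg[end:end + 10]))  # 1 '.\nlastly; '

groups, end = parse_groups(msg, WORD_OR_PUNC)
print([g[0] for g in groups])  # ['NOW;', 'lastly;']
```

Note that with PUNC the period becomes its own token in the first body, just as `oneOf` would return it as a separate token.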

For the question "python - Getting a grammar to read multiple keywords in text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18039236/
