
java - Error when parsing EDIFACT files with ANTLR 2 if an attribute value contains a keyword

Reposted. Author: 行者123. Updated: 2023-12-02 01:58:13

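I have the thankless task of fixing a bug in an old ANTLR 2 parser that is used to parse EDIFACT files. Unfortunately, I am not very familiar with ANTLR 2 or with parsers in general, and I cannot get it to work.

The EDIFACT file looks like this: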

ABC+Name+Surname+zip+city+street+country+1961219++0037141008'
XYZ+Company+++XYZ+zip+street'
LMN+20081010+1100'

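There are several different segments, each starting with a keyword, e.g. XYZ or ABC. The keyword is followed by the attribute values, all separated by '+'. Attribute values may be empty. Every segment ends with a ' character.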
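To make the segment layout concrete, here is a minimal sketch in plain Java that splits raw text into segments and each segment into its keyword and attribute values. The class and method names are made up for illustration, and the sketch deliberately ignores the '?' escape character that the lexer further down handles, so it is not a replacement for the real parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original parser.
final class SegmentSketch {

    /** Splits raw text at each ' terminator; blank pieces such as line breaks are dropped. */
    static List<String> segments(String raw) {
        List<String> result = new ArrayList<>();
        for (String piece : raw.split("'")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Splits one segment at '+'; the limit -1 keeps empty values, including trailing ones. */
    static List<String> elements(String segment) {
        return Arrays.asList(segment.split("\\+", -1));
    }

    public static void main(String[] args) {
        String raw = "XYZ+Company+++XYZ+zip+street'\nLMN+20081010+1100'";
        for (String segment : segments(raw)) {
            List<String> parts = elements(segment);
            // The first element is the segment keyword, the rest are attribute values.
            System.out.println(parts.get(0) + " -> " + parts.subList(1, parts.size()));
        }
    }
}
```

Note that the second XYZ in the first segment is an ordinary attribute value here, which is exactly the case the grammar has trouble with.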

The problem is that as soon as a data attribute contains a keyword, the parser throws an error:

unexpected token: XYZ

XYZ+Company+++XYZ+zip+street'
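One likely cause, stated here as an assumption rather than something the post confirms: in ANTLR 2, string literals used in parser rules (such as "XYZ") end up in the lexer's literals table, and by default the text of each token is looked up in that table. An attribute value whose text is XYZ would then receive the keyword's token type instead of ANUM, and the rule rejects it. The sketch below imitates that lookup in plain Java; the class, the constants, and their numeric values are hypothetical and the ANTLR runtime is not used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of a literals-table lookup; not ANTLR code.
final class LiteralsTableSketch {
    // Made-up token type numbers; real values come from the generated token types.
    static final int ANUM = 4;
    static final int LITERAL_XYZ = 5;
    static final int LITERAL_ABC = 6;

    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("XYZ", LITERAL_XYZ);
        LITERALS.put("ABC", LITERAL_ABC);
    }

    /** Returns the keyword type when the text is a known literal, otherwise the lexer's own type. */
    static int testLiterals(String text, int lexedType) {
        return LITERALS.getOrDefault(text, lexedType);
    }

    public static void main(String[] args) {
        for (String value : "XYZ+Company+++XYZ+zip+street".split("\\+", -1)) {
            if (value.isEmpty()) {
                continue; // empty attribute values produce no token
            }
            int type = testLiterals(value, ANUM);
            System.out.println(value + " -> " + (type == ANUM ? "ANUM" : "keyword"));
        }
    }
}
```

Under this assumption the lookup does not know whether the text stands at the start of a segment or in an attribute position, so both occurrences of XYZ are classified as the keyword.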

Here is an excerpt from the grammar file:

// $ANTLR 2.7.6


xyz: "XYZ" ELT_SEP!
(xyz1_1a:ANUM|xyz1_1b:NUM) {lq(90,xyz1_1a,xyz1_1b,"XYZ1-1"+LQ90)}? ELT_SEP!
(xyz1_2a:ANUM|xyz1_2b:NUM)? {lq_(90,xyz1_2a,xyz1_2b,"XYZ1-2"+LQ90)}? ELT_SEP!
(xyz1_3a:ANUM|xyz1_3b:NUM)? {lq_(90,xyz1_3a,xyz1_3b,"XYZ1-3"+LQ90)}? ELT_SEP!
(xyz2a:ANUM|xyz2b:NUM)? {lq_(3,xyz2a,xyz2b,"XYZ2"+LQ3)}? ELT_SEP!
(xyz3a:ANUM|xyz3b:NUM)? {lq_(6,xyz3a,xyz3b,"XYZ3"+LQ6)}? ELT_SEP!
(xyz4a:ANUM|xyz4b:NUM) {lq(30,xyz4a,xyz4b,"XYZ4"+LQ30)}?
(ELT_SEP! (xyz5a:ANUM|xyz5b:NUM)?)? {lq_(46,xyz5a,xyz5b,"XYZ5"+LQ46)}? SEG_TERM!
{
if (skipNachricht()) return;
Xyz xyz = new Xyz();
xyz.xyz1_1 = getText(nn(xyz1_1a, xyz1_1b));
xyz.xyz1_2 = getText(nn(xyz1_2a, xyz1_2b));
xyz.xyz1_3 = getText(nn(xyz1_3a, xyz1_3b));
xyz.xyz2 = getText(nn(xyz2a, xyz2b));
xyz.xyz3 = getText(nn(xyz3a, xyz3b));
xyz.xyz4 = getText(nn(xyz4a, xyz4b));
xyz.xyz5 = getText(nn(xyz5a, xyz5b));
handleXyz(xyz);
}
;



/*
* Lexer
*/
class EdifactLexer extends Lexer;

options {
k=2;
filter=true;
charVocabulary = '\3'..'\377'; // Latin
}

DEZ_SEP: ','
{
//System.out.println("Found dez_sep: " + getText());
}
;

ELT_SEP: '+'
{
//System.out.println("Found elt_sep: " + getText());
}
;

SEG_TERM: '\''
{
// System.out.println("Found seg_term: " + getText());
}
;

NUM: (('0'..'9')+ (',' ('0'..'9')+)? ('+' | '\''))
=> ('0'..'9')+ (',' ('0'..'9')+)?
{
//System.out.println("num_: " + getText());
}
|
((ESCAPED | ~('?' | '+' | '\'' | ',' | '\r' | '\n'))+ )
=> ( ESCAPED | ~('?' | '+' | '\'' | ',' | '\r' | '\n'))+
{
$setType(ANUM);
//System.out.println("anum: " + getText());
}
|
(WRONGLY_ESCAPED) => WRONGLY_ESCAPED
{$setType(WRONGLY_ESCAPED); }
;

protected
WRONGLY_ESCAPED: '?' ~('?' | ':' | '+' | '\'' | ',')
{
//System.out.println("Found wrong_escaped: " + getText());
}
;

protected
ESCAPED: '?'
( ',' {$setText(","); }
| '?' {$setText("?"); }
| '\'' {$setText("'"); }
| ':' {$setText(":"); }
| '+' {$setText("+"); }
)
{
//System.out.println("Found escaped: " + getText());
}
;

NEWLINE : ( "\r\n" // DOS
| '\r' // MAC
| '\n' // Unix
)
{ newline();
$setType(Token.SKIP);
}
;

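Any help is greatly appreciated :).

Best answer

This may not be the best solution, but I finally found a way to solve my problem. In case anyone runs into something similar, here is what I did:

I wrote a method that changes the token type to ANUM if the type of the current token matches one of my keywords: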

void ckt() throws TokenStreamException, SemanticException {
    if (mKeywordList.contains(LT(1).getType())) {
        LT(1).setType(ANUM);
    }
}
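The idea behind ckt() can be tried out without ANTLR. The following is a minimal sketch under stated assumptions: Tok stands in for an ANTLR token, KEYWORD_TYPES plays the role of mKeywordList (whose definition the answer does not show), and the numeric token types are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the retagging done by ckt(); not ANTLR code.
final class RetagSketch {
    // Made-up token type numbers; real values come from the generated token types.
    static final int ANUM = 4;
    static final int LITERAL_XYZ = 5;
    static final int LITERAL_ABC = 6;

    // Plays the role of mKeywordList in the answer.
    static final Set<Integer> KEYWORD_TYPES =
            new HashSet<>(Arrays.asList(LITERAL_XYZ, LITERAL_ABC));

    /** Simplified token: a mutable type plus the matched text. */
    static final class Tok {
        int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Retags a keyword token as ANUM; the text and all other token types stay untouched. */
    static void retagIfKeyword(Tok next) {
        if (KEYWORD_TYPES.contains(next.type)) {
            next.type = ANUM;
        }
    }

    public static void main(String[] args) {
        Tok attribute = new Tok(LITERAL_XYZ, "XYZ"); // keyword text in an attribute position
        retagIfKeyword(attribute);
        System.out.println(attribute.text + " is ANUM: " + (attribute.type == ANUM));
    }
}
```

Because the retagging is unconditional for keyword types, it only makes sense at positions where an attribute value is expected. Calling it before the leading keyword of a segment would turn that keyword into ANUM as well.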

I call that method in my parser rule before each attempt to match an ANUM token:

xyz: "XYZ"       ELT_SEP! 
{ckt();}(xyz1_1a:ANUM|xyz1_1b:NUM) {lq(90,xyz1_1a,xyz1_1b,"XYZ1-1"+LQ90)}? ELT_SEP!
{ckt();}(xyz1_2a:ANUM|xyz1_2b:NUM)? {lq_(90,xyz1_2a,xyz1_2b,"XYZ1-2"+LQ90)}? ELT_SEP!
{ckt();}(xyz1_3a:ANUM|xyz1_3b:NUM)? {lq_(90,xyz1_3a,xyz1_3b,"XYZ1-3"+LQ90)}? ELT_SEP!
{ckt();}(xyz2a:ANUM|xyz2b:NUM)? {lq_(3,xyz2a,xyz2b,"XYZ2"+LQ3)}? ELT_SEP!
{ckt();}(xyz3a:ANUM|xyz3b:NUM)? {lq_(6,xyz3a,xyz3b,"XYZ3"+LQ6)}? ELT_SEP!
{ckt();}(xyz4a:ANUM|xyz4b:NUM) {lq(30,xyz4a,xyz4b,"XYZ4"+LQ30)}?
(ELT_SEP! {ckt();}(xyz5a:ANUM|xyz5b:NUM)?)? {lq_(46,xyz5a,xyz5b,"XYZ5"+LQ46)}? SEG_TERM!
{
if (skipNachricht()) return;
Xyz xyz = new Xyz();
xyz.xyz1_1 = getText(nn(xyz1_1a, xyz1_1b));
xyz.xyz1_2 = getText(nn(xyz1_2a, xyz1_2b));
xyz.xyz1_3 = getText(nn(xyz1_3a, xyz1_3b));
xyz.xyz2 = getText(nn(xyz2a, xyz2b));
xyz.xyz3 = getText(nn(xyz3a, xyz3b));
xyz.xyz4 = getText(nn(xyz4a, xyz4b));
xyz.xyz5 = getText(nn(xyz5a, xyz5b));
handleXyz(xyz);
}
;

Regarding "java - Error when parsing EDIFACT files with ANTLR 2 if an attribute value contains a keyword", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57388351/
