
javascript - How to parse complex BibTeX entries using JavaScript and RegEx

Reposted. Author: 行者123. Updated: 2023-11-30 12:09:20

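I am trying to parse a BibTeX file in JavaScript using regular expressions, but I can't seem to find a suitable solution. In the example below, bj is an array holding the fields (sub-items) of the bibliography entry. I had to write a very long regular expression to account for values that can span multiple lines, values with missing curly braces ({}), and a syntactically incorrect trailing comma (for example, the last field should not end with a comma, but some TeX editors don't warn about this).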

This is what I am using to test the regular expression:

@inproceedings{Carrel2005,
title = {{Algorithm} for near-optimal autonomous resource management},
author = {Carrel, Ândrew and Palmer, Phil},
notes = nonote ,
booktitle = {8th International Symposium on Artificial {Intelligence,
Robotics}, and Automation in Space},
year = {2005}
blahblah = error,
}

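As you can see, some values are split across two lines and can contain curly braces inside. The regular expression I have been trying to improve is the following: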

var txt = "@inproceedings{Carrel2005, \n" +
" title = {{Algorithm} for near-optimal autonomous resource management}, \n" +
" author = {Carrel, Ândrew and Palmer, Phil}, \n" +
" notes = nonote ,\n" +
" booktitle = {8th International Symposium on Artificial Intelligence, \n" +
" Robotics and Automation in Space}, \n" +
" year = {2005} \n" +
" blahblah = error,\n}";

bj = txt.match(/\w*[\t ]*=[\t ]*(\{[\u0020-\u0080\u00A1-\u00FF\u0300-\u036F\t\r\n]*?}|[a-zA-Z0-9]+)[\t ]*(,(?!\s*}))?/g);

Explanation:

\w*               A word for the field name.
[\t ]*=[\t ]* Any number of spaces or tabs after and before the equal sign.
( Start of group 1.
\{ Option 11: starts by an opening curly brace.
[ Start of character class AAA.
unicode-set Letters (basic Latin plus some extensions)
\t\r\n ... or whitespace.
]*? End of character class AAA (with LAZY repetition)
} A literal closing curly brace.
| End of option 11, start of option 12:
[a-zA-Z0-9]+ One or more characters (no underscore or whitespace allowed).
) End of option 12 and group 1.
[\t ]* Any number of tabs or spaces.
( Start of group 2:
, A literal comma
(?!\s*}) ...if it is not followed by whitespace and closing curly braces.
)? End of group 2. ? denotes it is optional.
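
As a small illustration of group 2, the sketch below isolates the comma-plus-lookahead part of the expression and tries it on two strings. The variable names are only for illustration and the test strings are made up from the sample entry.

```javascript
// Group 2 of the regex on its own: a comma that is NOT followed by
// optional whitespace and a closing curly brace.
var trailingComma = /,(?!\s*})/;

// A comma followed by another field is accepted.
var betweenFields = trailingComma.test('year = {2005},\n  month = jan');

// A comma right before the entry's closing brace is rejected by the lookahead.
var beforeClosingBrace = trailingComma.test('blahblah = error,\n}');

console.log(betweenFields);      // true
console.log(beforeClosingBrace); // false
```

So the erroneous final comma is simply left out of the match rather than reported as an error.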

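I am unable to match fields that start with more than one curly brace (e.g. {{Algorithm} for near...), nor can I correctly match fields in which the sequence }, occurs inside the value.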
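
The failure can be reproduced in isolation. The sketch below runs the same regular expression against the title field alone; the names fieldRe and sample are only for illustration. Because the character class is repeated lazily and is followed by a closing brace, the match ends at the first } it meets, which here is the one closing {Algorithm}.

```javascript
// The field regex from the question, unchanged.
var fieldRe = /\w*[\t ]*=[\t ]*(\{[\u0020-\u0080\u00A1-\u00FF\u0300-\u036F\t\r\n]*?}|[a-zA-Z0-9]+)[\t ]*(,(?!\s*}))?/g;

// Only the title field of the sample entry.
var sample = ' title = {{Algorithm} for near-optimal autonomous resource management},';

var matches = sample.match(fieldRe);

// The value is cut off after the inner closing brace.
console.log(matches.length);    // 1
console.log(matches[0].trim()); // title = {{Algorithm}
```

The rest of the title is never captured, because the expression has no way of knowing that one more brace is still open.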

Best answer

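As I mentioned in the comments, a JavaScript regular expression cannot match braces nested to arbitrary depth, because that requires some kind of state to keep track of how many open braces you have seen. You need a parser, which adds that state. It would look like this: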

var txt = "@inproceedings{Carrel2005, \n" +
" title = {{Algorithm} for near-optimal autonomous resource management}, \n" +
" author = {Carrel, Ândrew and Palmer, Phil}, \n" +
" notes = nonote ,\n" +
" booktitle = {8th International Symposium on Artificial Intelligence, \n" +
" Robotics and Automation in Space}, \n" +
" year = {2005} \n" +
" blahblah = error,\n}";


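function parseBibTexLine (text) {
  // Match the field name and the equals sign, e.g. "title = ".
  var m = text.match(/^\s*(\S+)\s*=\s*/);
  if (!m) {
    console.log('line: "' + text + '"');
    throw new Error('Unrecognised line format');
  }
  var name = m[1];
  var search = text.slice(m[0].length);
  // Scan for the characters that can open, close or terminate a value.
  var re = /[\n\r,{}]/g;
  var braceCount = 0; // current brace nesting depth
  var length = m[0].length;
  do {
    m = re.exec(search);
    if (!m) {
      // No delimiter left in the text: the value never ends.
      throw new Error('Unterminated field value');
    }
    if (m[0] === '{') {
      braceCount++;
    } else if (m[0] === '}') {
      if (braceCount === 0) {
        throw new Error('Unexpected closing brace: "}"');
      }
      braceCount--;
    }
  } while (braceCount > 0); // stop once the brace depth is back to 0
  return {
    field: name,
    value: search.slice(0, re.lastIndex),
    // Characters consumed: the "name = " prefix, the value, plus one
    // more character (usually the separator that follows the value).
    length: length + re.lastIndex + m[0].length
  };
}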

function parseBibTex (text) {
  // Header, e.g. "@inproceedings{Carrel2005,": entry type and citation key.
  var m = text.match(/^\s*@([^{]+){([^,\n]+)[,\n]/);
  if (!m) {
    throw new Error('Unrecognised header format');
  }
  var result = {
    typeName: m[1].trim(),
    citationKey: m[2].trim()
  };
  text = text.slice(m[0].length).trim();
  // Consume one field at a time until the entry's closing brace.
  while (text[0] !== '}') {
    var pair = parseBibTexLine(text);
    result[pair.field] = pair.value;
    text = text.slice(pair.length).trim();
  }
  return result;
}

console.log(parseBibTex(txt));

I certainly haven't tested this in depth, but when run against your input, I get:

{ typeName: 'inproceedings',
citationKey: 'Carrel2005',
title: '{{Algorithm} for near-optimal autonomous resource management}',
author: '{Carrel, Ândrew and Palmer, Phil}',
notes: 'nonote ,',
booktitle: '{8th International Symposium on Artificial Intelligence, \n Robotics and Automation in Space}',
year: '{2005}',
blahblah: 'error,' }
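
The values above are raw: braced values keep their outer braces, and unbraced ones keep the trailing comma. If a cleaned-up string is wanted, a post-processing step along these lines might do. This is only a sketch: cleanBibTexValue is a hypothetical helper that is not part of the answer, and it assumes that a value starting with { ends with its matching }, which holds for the raw values printed above.

```javascript
// Hypothetical helper (not part of the original answer): tidy one raw
// field value such as those printed above.
function cleanBibTexValue(raw) {
  // Drop surrounding whitespace and a trailing comma left on unbraced values.
  var value = raw.trim().replace(/\s*,\s*$/, '');
  // Remove one pair of outer braces, assuming the first "{" and the
  // last "}" belong together.
  if (value[0] === '{' && value[value.length - 1] === '}') {
    value = value.slice(1, -1);
  }
  // Collapse line breaks and runs of whitespace from multi-line values.
  return value.replace(/\s+/g, ' ').trim();
}

console.log(cleanBibTexValue('nonote ,'));
// nonote
console.log(cleanBibTexValue('{{Algorithm} for near-optimal autonomous resource management}'));
// {Algorithm} for near-optimal autonomous resource management
```

Inner braces such as {Algorithm} are left alone on purpose, since BibTeX uses them to protect capitalisation.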

Regarding javascript - How to parse complex BibTeX entries using JavaScript and RegEx, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34221996/
