
c++ - TinyXML2 C++ - Extracting specific data from old/malformed XML files

Reposted. Author: 太空宇宙. Updated: 2023-11-04 12:52:07

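I'm trying to search through some fairly old XML blocks (documents from 1999), but I'm having trouble getting TinyXML2 to behave as expected. I can grab certain pieces, but I run into problems when one element is nested inside another. Take this example: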

<SUBJECT><TITLE>Mathematics</TITLE></SUBJECT>
<AREA><TITLE>Arithmetic</TITLE></AREA>
<SECTION><TITLE>Whole Numbers</TITLE></SECTION>
<TOPIC GRADELEVEL="4"><TITLE>Introduction to Numbers</TITLE></TOPIC>
<DESCRIPTION><TITLE>Description</TITLE></DESCRIPTION>
<FIELDSPACE>
<PARA>To represent each conceivable number by means of a separate
little picture or number symbol is impossible. Therefore the civilizations of
the past all developed a certain pattern whereby they could write down numbers,
by making use of a small number of symbols. </PARA>
</FIELDSPACE>
<FIELDSPACE>
<PARA>Today, we use the Hindu-Arabic system, which first of all is
decimal, because we make use of only 10 different symbols, namely,</PARA>
<LITERALLAYOUT> 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.</LITERALLAYOUT>
</FIELDSPACE>
<FIELDSPACE>
<PARA>Secondly, a place value applies. This means that if only 1
digit is written down then it is that number, such as a 3, a 6, or an 8.</PARA>
</FIELDSPACE>
<FIELDSPACE>
<PARA>Thirdly, only the addition principle is built into our number
symbols.</PARA>
<PARA>In other words,</PARA>
<LITERALLAYOUT> 135 means 100 + 300 + 5</LITERALLAYOUT>
<LITERALLAYOUT> 6.3 means 6 + three tenths = 6 + <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq.png" />
</EQUATION></LITERALLAYOUT>
<LITERALLAYOUT> and two and a quarter = <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq2.png" />
</EQUATION></LITERALLAYOUT>
<PARA>means</PARA>
<LITERALLAYOUT> two plus a quarter = <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq3.png" />
</EQUATION></LITERALLAYOUT>
</FIELDSPACE>

Here is what I wrote:

XMLDocument doc;
Resource::resource_t *f = Resource::Open("IntroductionNumbers.xml"); // File load

if (!f)
    return;

doc.Parse((const char*)f->buffer, f->size);
Resource::Close(f);

XMLElement *pElem;
pElem = doc.FirstChildElement();

if (!pElem)
    return;

for (pElem = pElem->FirstChildElement(); pElem; pElem = pElem->NextSiblingElement())
{
    if (!strcmp(pElem->Value(), "SUBJECT"))
    {
        // Print what's in pElem->FirstChildElement("TITLE")->GetText()
        // This works fine.
    }
    else if (!strcmp(pElem->Value(), "AREA"))
    {
        // Print what's in pElem->FirstChildElement("TITLE")->GetText()
        // This works fine.
    }
    ...
    ...
    ...
    else if (!strcmp(pElem->Value(), "TOPIC"))
    {
        char *temp;
        temp = msprintf("%s - Section %s", pElem->FirstChildElement("TITLE")->GetText(), pElem->FirstAttribute()->Value());
        // Print what's in temp
        // This still works!
    }
    else if (!strcmp(pElem->Value(), "FIELDSPACE"))
    {
        // I can print PARA or FIELDSPACE, but I can't seem to read LITERALLAYOUT, EQUATION, or INLINEGRAPHIC.
    }
}

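I need generic code rather than something specific to this one file: there are hundreds of XML files, and I need to write code that can parse all of them. How do I get at the information inside LITERALLAYOUT/EQUATION/INLINEGRAPHIC?

Thanks in advance!

Best answer

Just building on the previous answer. This is what you have: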

<LITERALLAYOUT>xxxxxxxxx
<EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq.png" />
</EQUATION>
</LITERALLAYOUT>

Two things are going on here. When you reach LITERALLAYOUT, you can call GetText, which returns xxxxxxxxx.

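But you also have a choice. If you want the code to be generic, you have to iterate over all the child elements of your LITERALLAYOUT pointer. If you don't want to do that, you have to fetch the first child by name, for example: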

XMLElement *pLITERALLAYOUT = xxxx; // You get this pointer.

XMLElement *pEQUATION = pLITERALLAYOUT->FirstChildElement("EQUATION");
if (pEQUATION != nullptr)
{
    // Now get the INLINEGRAPHIC element
    XMLElement *pINLINEGRAPHIC = pEQUATION->FirstChildElement("INLINEGRAPHIC");

    if (pINLINEGRAPHIC != nullptr)
    {
        const char *FILEREF;
        FILEREF = pINLINEGRAPHIC->Attribute("FILEREF");
    }
}

See? You have to know the right way to navigate the XML file.

Regarding c++ - TinyXML2 C++ - Extracting specific data from old/malformed XML files, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48585291/
