
c# - How to generate closing nodes in a file that is not a valid XML file?

Reposted. Author: 太空狗. Updated: 2023-10-29 20:35:53

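How can I add closing nodes for a given node (<sec>) at certain positions in a text file that is not a valid XML file? I know this is a bit confusing, but a sample input text and its desired output are shown below.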

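Basically, the program should generate </sec> nodes before the next <sec> node. How many </sec> tags it adds at that position depends on the id attribute of the <sec> nodes, which holds numbers separated by a dot (.), as follows: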

If one <sec> node is followed by the next <sec> node as in, say, <sec id="4.5"> and then <sec id="5">, then 2 </sec> tags should be added before <sec id="5">.

If one <sec> node is followed by the next <sec> node as in, say, <sec id="3.2.1.2"> and then <sec id="3.4">, then 3 </sec> tags should be added before the <sec id="3.4"> node.

Obviously I can't use any XML parsing method to do this, so what other way is there? I'm clueless right now. Can anyone help?

Sample input

<?xml version="1.0" encoding="utf-8"?>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Tuberculosis is associated with high mortality rate although according to the clinical trials that have been documented</p>
<sec id="sec1.2">
<title>Related Work</title>
<p>The main contributions in this study are:
<list list-type="ordered">
<list-item><label>I.</label><p>Introducing SURF features descriptors for TB detection which for our knowledge has not been used in this problem before.</p></list-item>
<list-item><label>II.</label><p>Providing an extensive study of the effect of grid size on the accuracy of the SURF.</p></list-item>
</list></p>
</sec>
<sec id="sec1.3">
<title>Dataset</title>
<p>The dataset used in this work is a standard computerized images database for tuberculosis gathered and organized by National Library of Medicine in collaboration with the Department of Health and Human Services, Montgomery County, Maryland; USA <xref ref-type="bibr" rid="ref15">[15]</xref>. The set contains 138 x-rays, 80 for normal cases and 58 with TB infections. The images are annotated with clinical readings comes in text notes with the database describing age, gender, and diagnoses. The images comes in 12 bits gray levels, PNG format, and size of 4020*4892. The set contains x-ray images information gathered under Montgomery County&#x0027;s Tuberculosis screening program.</p>
<sec id="sec1.3.5">
<sec id="sec1.3.5.2">
<title>Methodologies</title>
<sec id="sec2">
<p>The majority of TB and death cases are in developing countries.</p>
<sec id="sec2.5">
<p>The disordered physiological manifestations associated with TB is diverse and leads to a complex pathological changes in the organs like the lungs.</p>
<sec id="sec2.5.3">
<sec id="sec2.5.3.1">
<p>The complexity and diversity in the pulmonary manifestations are reported to be caused by age.</p>
<sec id="sec2.5.3.1.1">
</sec>
</sec>
</body>

Desired output

<?xml version="1.0" encoding="utf-8"?>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Tuberculosis is associated with high mortality rate although according to the clinical trials that have been documented</p>
<sec id="sec1.2">
<title>Related Work</title>
<p>The main contributions in this study are:
<list list-type="ordered">
<list-item><label>I.</label><p>Introducing SURF features descriptors for TB detection which for our knowledge has not been used in this problem before.</p></list-item>
<list-item><label>II.</label><p>Providing an extensive study of the effect of grid size on the accuracy of the SURF.</p></list-item>
</list></p>
</sec>
<sec id="sec1.3">
<title>Dataset</title>
<p>The dataset used in this work is a standard computerized images database for tuberculosis gathered and organized by National Library of Medicine in collaboration with the Department of Health and Human Services, Montgomery County, Maryland; USA <xref ref-type="bibr" rid="ref15">[15]</xref>. The set contains 138 x-rays, 80 for normal cases and 58 with TB infections. The images are annotated with clinical readings comes in text notes with the database describing age, gender, and diagnoses. The images comes in 12 bits gray levels, PNG format, and size of 4020*4892. The set contains x-ray images information gathered under Montgomery County&#x0027;s Tuberculosis screening program.</p>
<sec id="sec1.3.5">
<sec id="sec1.3.5.2">
<title>Methodologies</title>
</sec>
</sec>
</sec>
</sec>
<sec id="sec2">
<p>The majority of TB and death cases are in developing countries.</p>
<sec id="sec2.5">
<p>The disordered physiological manifestations associated with TB is diverse and leads to a complex pathological changes in the organs like the lungs.</p>
<sec id="sec2.5.3">
<sec id="sec2.5.3.1">
<p>The complexity and diversity in the pulmonary manifestations are reported to be caused by age.</p>
<sec id="sec2.5.3.1.1">
</sec>
</sec>
</sec>
</sec>
</sec>
</body>

Best answer

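To accomplish this task, I defined an additional method that returns how many closing </sec> tags should be inserted, based on the difference between the two IDs: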

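// Requires: using System; using System.Linq;
public static int HowManyClosingTags(string startTagId, string endTagId)
{
    // if IDs are the same, then we don't need any closing tags
    if (startTagId == endTagId)
        return 0;
    // if the following ID is a subsection of the previous tag's section, then we don't need
    // any closing tags; the trailing '.' keeps "1" from being treated as a parent of "10"
    if (endTagId.StartsWith(startTagId + ".", StringComparison.Ordinal))
        return 0;

    // skip the common prefix, staying inside the bounds of both strings
    int i = 0;
    while (i < startTagId.Length && i < endTagId.Length && startTagId[i] == endTagId[i])
        i++;

    // one closing tag for each remaining level of the previous ID
    return startTagId.Substring(i).Count(ch => ch == '.') + 1;
}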
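The counting rule can also be expressed per ID segment instead of per character. The following is only a sketch under the assumption that the dot-separated parts of an id describe the nesting path; the names SectionIds and ClosingTagsBetween are hypothetical and not part of the original answer.

```csharp
using System;

public static class SectionIds
{
    // Hypothetical helper: number of </sec> tags needed before a section with
    // id nextId when the most recently opened section has id prevId.
    public static int ClosingTagsBetween(string prevId, string nextId)
    {
        string[] prev = prevId.Split('.');
        string[] next = nextId.Split('.');

        // Length of the shared ancestor path, compared segment by segment.
        int common = 0;
        while (common < prev.Length && common < next.Length
               && prev[common] == next[common])
        {
            common++;
        }

        // Every level of the previous path below the shared ancestors is closed.
        return prev.Length - common;
    }

    public static void Main()
    {
        Console.WriteLine(ClosingTagsBetween("4.5", "5"));       // 2
        Console.WriteLine(ClosingTagsBetween("3.2.1.2", "3.4")); // 3
        Console.WriteLine(ClosingTagsBetween("1", "1.2"));       // 0
        Console.WriteLine(ClosingTagsBetween("1", "10"));        // 1
    }
}
```

Comparing whole segments means that an id such as 1 is never mistaken for a parent of 10, which a plain string-prefix test would do. The first two calls reproduce the two cases described in the question.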

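I work with strings because the input is invalid XML and cannot be loaded as a document (the XmlDocument.Load() method throws an exception for invalid XML). So I perform basic string operations (I hope the code is understandable; I have also included as many comments as possible to make it clear). Here is the code: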

// Requires: using System; using System.Linq; using System.Xml;
static void Main(string[] args)
{
    string invalidXml = "your invalid XML";
    // every section tag starts with this prefix; the ID is the text after it, up to the closing quote
    const string secPrefix = "<sec id=\"sec";
    int closeTagPos = -1;
    int openTagPos = -1;
    string openTagId = "";
    string closeTagId = "";
    int howManyClosingTagsAlready;
    int lastPos;
    int howManyTagsToInsert;
    while (true)
    {
        // get the indexes of the current opening tag and of the next one
        // (treated as the "closing" position); break if none is found
        if ((openTagPos = invalidXml.IndexOf(secPrefix, openTagPos + 1)) == -1)
            break;
        if ((closeTagPos = invalidXml.IndexOf(secPrefix, openTagPos + 1)) == -1)
            break;
        // get the IDs of both tags
        openTagId = invalidXml.Substring(
            openTagPos + secPrefix.Length,
            invalidXml.IndexOf('"', openTagPos + secPrefix.Length) - openTagPos - secPrefix.Length
        );
        closeTagId = invalidXml.Substring(
            closeTagPos + secPrefix.Length,
            invalidXml.IndexOf('"', closeTagPos + secPrefix.Length) - closeTagPos - secPrefix.Length
        );
        // count how many tags were already closed between the two positions
        howManyClosingTagsAlready = 0;
        lastPos = invalidXml.IndexOf("</sec>", openTagPos);
        while (lastPos > -1 && lastPos < closeTagPos)
        {
            howManyClosingTagsAlready++;
            lastPos = invalidXml.IndexOf("</sec>", lastPos + 1);
        }

        howManyTagsToInsert = HowManyClosingTags(openTagId, closeTagId) - howManyClosingTagsAlready;
        for (int i = 0; i < howManyTagsToInsert; i++)
        {
            // insert closing tags
            invalidXml = invalidXml.Insert(closeTagPos, "</sec>");
        }
    }
    // now we have to close our last "unclosed" tag; in this case
    // </body> is treated as the closing tag, the logic stays the same
    // (openTagPos is -1 only when the text contains no <sec> tag at all)
    if (openTagPos != -1)
    {
        openTagId = invalidXml.Substring(
            openTagPos + secPrefix.Length,
            invalidXml.IndexOf('"', openTagPos + secPrefix.Length) - openTagPos - secPrefix.Length
        );
        closeTagPos = invalidXml.IndexOf("</body>");
        howManyClosingTagsAlready = 0;
        lastPos = invalidXml.IndexOf("</sec>", openTagPos);
        while (lastPos > -1 && lastPos < closeTagPos)
        {
            howManyClosingTagsAlready++;
            lastPos = invalidXml.IndexOf("</sec>", lastPos + 1);
        }

        // the last section and all of its ancestors must be closed
        howManyTagsToInsert = openTagId.Count(ch => ch == '.') + 1 - howManyClosingTagsAlready;

        for (int i = 0; i < howManyTagsToInsert; i++)
        {
            // insert closing tags
            invalidXml = invalidXml.Insert(closeTagPos, "</sec>");
        }
    }

    // the repaired text should now load as XML
    XmlDocument xml = new XmlDocument();
    xml.LoadXml(invalidXml);
}
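As an alternative to repeated IndexOf and Insert calls, the same repair can be sketched as a single pass that keeps a stack of open section ids. This is a different technique from the answer above, not the author's method. It assumes that opening tags look exactly like <sec id="...">, that a child id is its parent id plus a dot and a further number, and that each existing </sec> closes the most recently opened section. SecCloser and CloseSections are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

public static class SecCloser
{
    // Matches an opening <sec id="...">, an existing </sec>, or the final </body>.
    private static readonly Regex TagPattern =
        new Regex("<sec id=\"(?<id>[^\"]*)\">|</sec>|</body>");

    public static string CloseSections(string text)
    {
        var open = new Stack<string>(); // ids of the sections that are currently open
        return TagPattern.Replace(text, m =>
        {
            var sb = new StringBuilder();
            if (m.Groups["id"].Success)
            {
                string id = m.Groups["id"].Value;
                // Close every open section that is not an ancestor of the new one.
                while (open.Count > 0 &&
                       !id.StartsWith(open.Peek() + ".", StringComparison.Ordinal))
                {
                    open.Pop();
                    sb.Append("</sec>\n");
                }
                open.Push(id);
            }
            else if (m.Value == "</sec>")
            {
                if (open.Count == 0) return ""; // stray closing tag: drop it
                open.Pop();
            }
            else // </body>: close whatever is still open
            {
                while (open.Count > 0)
                {
                    open.Pop();
                    sb.Append("</sec>\n");
                }
            }
            return sb.Append(m.Value).ToString();
        });
    }

    public static void Main()
    {
        string broken =
            "<body>\n<sec id=\"sec1\">\n<sec id=\"sec1.2\">\n</sec>\n" +
            "<sec id=\"sec1.3\">\n<sec id=\"sec1.3.5\">\n<sec id=\"sec2\">\n</body>";
        string repaired = CloseSections(broken);
        Console.WriteLine(repaired);

        // LoadXml throws XmlException if the result is still not well-formed.
        new XmlDocument().LoadXml(repaired);
    }
}
```

Because the text is rewritten in one pass, positions never have to be recomputed after an insertion. The trade-off is that the sketch relies on the exact tag format stated above, so differently formatted tags (extra attributes, single quotes, extra whitespace) would need a broader pattern.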

Regarding "c# - How to generate closing nodes in a file that is not a valid XML file?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49466168/
