
.net - How do I remove all instances of a character from a file in C#?

Reposted. Author: 行者123. Last updated: 2023-12-04 21:56:09

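I am processing XML files from a third party. These files sometimes contain invalid characters, which causes XmlTextReader.Read() to throw an exception.

I currently handle this with the following function: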

XmlTextReader GetCharSafeXMLTextReader(string fileName)
{
    try
    {
        // Copy the file into memory line by line, dropping 0x04 and 0x14.
        MemoryStream ms = new MemoryStream();
        StreamReader sr = new StreamReader(fileName);
        StreamWriter sw = new StreamWriter(ms);
        string temp;
        while ((temp = sr.ReadLine()) != null)
            sw.WriteLine(temp.Replace(((char)4).ToString(), "").Replace(((char)0x14).ToString(), ""));

        sw.Flush();
        sr.Close();
        ms.Seek(0, SeekOrigin.Begin);
        return new XmlTextReader(ms);
    }
    catch (Exception exp)
    {
        // Pass exp itself as the inner exception so the original stack trace is kept.
        throw new Exception("Error parsing file: " + fileName + " " + exp.Message, exp);
    }
}

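My gut tells me there should be a better/faster way to do this. (Yes, getting the third party to fix their XML would be great, but that isn't happening right now.)

Edit: here is the final solution, based on cfeduke's answer: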


public class SanitizedStreamReader : StreamReader
{
    public SanitizedStreamReader(string filename) : base(filename) { }
    /* other ctors as needed */

    // XmlTextReader appears to use only one of these, but it is unclear
    // from the documentation which methods call each other, so the safest
    // bet is to override all of the Read* methods and Peek.
    public override string ReadLine()
    {
        return Sanitize(base.ReadLine());
    }

    public override int Read()
    {
        int temp = base.Read();
        while (temp == 0x4 || temp == 0x14)
            temp = base.Read();
        return temp;
    }

    public override int Peek()
    {
        int temp = base.Peek();
        while (temp == 0x4 || temp == 0x14)
        {
            base.Read(); // consume the unwanted character
            temp = base.Peek();
        }
        return temp;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int kept = 0;
        // Retry when a chunk held only unwanted characters:
        // returning 0 would wrongly signal end of stream.
        while (kept == 0)
        {
            int read = base.Read(buffer, index, count);
            if (read <= 0)
                break;
            // Compact only the characters just read, starting at index.
            for (int x = 0; x < read; x++)
            {
                char c = buffer[index + x];
                if (c != 0x4 && c != 0x14)
                    buffer[index + kept++] = c;
            }
        }
        return kept; // number of characters left after removal
    }

    private static string Sanitize(string unclean)
    {
        // Return null unchanged: ReadLine uses null to signal end of stream.
        if (String.IsNullOrEmpty(unclean))
            return unclean;
        return unclean.Replace(((char)4).ToString(), "").Replace(((char)0x14).ToString(), "");
    }
}

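Best answer

Sanitizing data is important. Edge cases, such as invalid characters in "XML", do happen. Your solution is correct. If you want a solution that fits the .NET Framework's stream model, restructure your code into its own StreamReader subclass: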

public class SanitizedStreamReader : StreamReader {
    public SanitizedStreamReader(string filename) : base(filename) { }
    /* other ctors as needed */

    // it is unclear from the documentation which methods call each other
    // so best bet is to override all of the Read* methods and Peek
    public override string ReadLine() {
        return Sanitize(base.ReadLine());
    }

    // TODO override Read*, Peek with a similar logic as this.ReadLine()
    // remember Read(Char[], Int32, Int32) to modify the return value by
    // the number of removed characters

    private static string Sanitize(string unclean) {
        // Return null unchanged: ReadLine uses null to signal end of stream.
        if (String.IsNullOrEmpty(unclean))
            return unclean;
        return unclean.Replace(((char)4).ToString(), "").Replace(((char)0x14).ToString(), "");
    }
}
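
The TODO about Read(Char[], Int32, Int32) is the subtle part: the unwanted characters have to be squeezed out of the buffer in place, and the return value has to shrink by the same amount. A minimal sketch of just that compaction step follows, written as a standalone helper so it can be checked in isolation. The names BufferSanitizer and Compact are hypothetical and are not part of the answer.

```csharp
using System;

public static class BufferSanitizer
{
    // Hypothetical helper: removes 0x04 and 0x14 from
    // buffer[index .. index + count) in place and returns
    // how many characters remain in that range.
    public static int Compact(char[] buffer, int index, int count)
    {
        int kept = 0;
        for (int i = 0; i < count; i++)
        {
            char c = buffer[index + i];
            if (c != (char)0x04 && c != (char)0x14)
            {
                // kept never exceeds i, so this only writes to
                // positions that have already been examined.
                buffer[index + kept] = c;
                kept++;
            }
        }
        return kept;
    }
}
```

Inside an override, the idea would be to call base.Read(buffer, index, count) first and pass the count it returns to Compact. One caveat: if the base call returned characters but Compact leaves none, the override should read again rather than return 0, because 0 tells the caller the stream has ended.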

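With this new SanitizedStreamReader you can chain it into the processing pipeline wherever you need it, instead of relying on one magic method that cleans things up and hands you an XmlTextReader: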

return new XmlTextReader(new SanitizedStreamReader("filename.xml"));

Admittedly, this may be more work than necessary, but the approach buys you flexibility.
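
To illustrate the chaining idea without touching the file system, here is a rough sketch that wraps any TextReader instead of a file name, so it can be exercised with a StringReader. FilteringTextReader and Demo.ReadRootText are hypothetical names introduced only for this example; this is a variation on the approach above, not the answer's class.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper that drops 0x04 and 0x14 from any TextReader.
public sealed class FilteringTextReader : TextReader
{
    private readonly TextReader inner;

    public FilteringTextReader(TextReader inner) { this.inner = inner; }

    private static bool IsBad(int c) { return c == 0x04 || c == 0x14; }

    public override int Peek()
    {
        // Consume unwanted characters so Peek never reports one.
        int c = inner.Peek();
        while (IsBad(c)) { inner.Read(); c = inner.Peek(); }
        return c;
    }

    public override int Read()
    {
        int c = inner.Read();
        while (IsBad(c)) c = inner.Read();
        return c;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int kept = 0;
        // Retry if a chunk held only unwanted characters,
        // because returning 0 would signal end of input.
        while (kept == 0)
        {
            int read = inner.Read(buffer, index, count);
            if (read <= 0) break;
            for (int i = 0; i < read; i++)
            {
                char ch = buffer[index + i];
                if (!IsBad(ch)) buffer[index + kept++] = ch;
            }
        }
        return kept;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    // Parses the text content of the root element through the filter.
    public static string ReadRootText(string xml)
    {
        using (var reader = new XmlTextReader(new FilteringTextReader(new StringReader(xml))))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```

Calling Demo.ReadRootText("<root>a\u0004b\u0014</root>") should parse cleanly and give back the text with the two control characters removed. Because the wrapper accepts a TextReader, the same filter could sit in front of a StreamReader over a file or a network stream.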

Regarding ".net - How do I remove all instances of a character from a file in C#?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14242112/
