
c# - XmlDocument fails to load an XHTML string because of the error "Reference to undeclared entity 'nbsp'"

Reposted. Author: 数据小太阳. Updated: 2023-10-29 02:22:22

I use the following code to convert an HTTP response stream into an XmlDocument.

HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream responseStream = response.GetResponseStream();
StreamReader responseReader = new StreamReader(responseStream);
String responseString = responseReader.ReadToEnd();
Console.WriteLine(responseString);
Int32 htmlTagIndex = responseString.IndexOf("<html",
    StringComparison.OrdinalIgnoreCase);
XmlDocument responseXhtml = new XmlDocument();
responseString = responseString.Substring(htmlTagIndex); // MARK 1
responseString = responseString.Replace("&nbsp;", " "); // MARK 2
responseXhtml.LoadXml(responseString);
return responseXhtml;

The MARK 1 line skips the DOCTYPE declaration line.
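
One caveat with MARK 1: `IndexOf` returns -1 when no `<html` tag is found, and `Substring(-1)` then throws an `ArgumentOutOfRangeException`. A minimal sketch of a guarded version is shown here. The names `DoctypeSkipper` and `SkipToHtmlTag`, and the choice to return the input unchanged when the tag is missing, are assumptions made for illustration and are not part of the original code.

```csharp
using System;

static class DoctypeSkipper
{
    // Hypothetical helper: drops everything before the first "<html" tag,
    // which removes a leading DOCTYPE declaration.
    // Assumption: when no "<html" tag exists, the input is returned unchanged.
    public static string SkipToHtmlTag(string response)
    {
        int htmlTagIndex = response.IndexOf("<html", StringComparison.OrdinalIgnoreCase);
        return htmlTagIndex < 0 ? response : response.Substring(htmlTagIndex);
    }
}
```

A caller would write `responseString = DoctypeSkipper.SkipToHtmlTag(responseString);` in place of the bare `Substring` call. Throwing a descriptive exception instead of returning the input may suit some callers better.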

The MARK 2 line is there to avoid the error Reference to undeclared entity 'nbsp'.
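
The error arises because XML itself predefines only five entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`). `&nbsp;` is declared by the XHTML DTD, which is no longer referenced once the DOCTYPE has been cut off. As a hedged sketch of a variation on MARK 2, the entity can be replaced with the numeric character reference `&#160;`, which needs no DTD and keeps the non-breaking space instead of turning it into an ordinary space. The class name, method name, and sample markup below are made up for illustration.

```csharp
using System;
using System.Xml;

static class EntityFix
{
    // Hypothetical helper: loads XHTML that has no DOCTYPE by rewriting
    // "&nbsp;" (including the trailing semicolon) to "&#160;" first.
    public static XmlDocument LoadXhtmlWithoutDtd(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml.Replace("&nbsp;", "&#160;"));
        return doc;
    }

    static void Main()
    {
        string sample = "<html><body><p>a&nbsp;b</p></body></html>";

        try
        {
            // Without the replacement, the parser rejects the undeclared entity.
            new XmlDocument().LoadXml(sample);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Unpatched input failed: " + ex.Message);
        }

        XmlDocument fixedDoc = LoadXhtmlWithoutDtd(sample);
        // The paragraph text now contains U+00A0 between "a" and "b".
        Console.WriteLine(fixedDoc.SelectSingleNode("//p").InnerText == "a\u00A0b");
    }
}
```

This only handles `&nbsp;`. Any other named HTML entity in the response, such as `&copy;`, would trigger the same error, which is one reason a dedicated HTML parser is attractive.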

Is there a better way to do this? The code above does too much string manipulation.

Thanks!

Best answer

I would use HtmlAgilityPack directly to parse the HTML. It can be used even if you have to convert the HTML to XML.

// Needs System.IO, System.Net, System.Xml.Linq and the HtmlAgilityPack package.
using (WebClient wc = new WebClient())
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(wc.DownloadString("http://www.google.com"));
    doc.OptionOutputAsXml = true; // write the document as XML when saving

    StringWriter writer = new StringWriter();
    doc.Save(writer);

    var xDoc = XDocument.Load(new StringReader(writer.ToString()));
}

Regarding c# - XmlDocument fails to load an XHTML string because of the error "Reference to undeclared entity 'nbsp'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12822680/
