c# - Is there a way to remove all unnecessary MS Word formatting from FCKEditor?

Reposted. Author: 数据小太阳. Updated: 2023-10-29 04:55:05

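I have installed FCKEditor, and when content is pasted from MS Word it adds a lot of unnecessary formatting. I want to keep certain things, such as bold, italics, bullets, and so on. I searched the web and found solutions that strip everything, including the bold and italics I want to keep. Is there a way to strip only the unnecessary Word formatting?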

Best answer

In case anyone wants a C# version of the accepted answer:

using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    public static string CleanHtml(string html)
    {
        // Cleans all manner of evils from the rich text editors in IE, Firefox, Word, and Excel.
        // Only returns acceptable HTML, and converts line breaks to <br />.
        // Acceptable HTML includes HTML-encoded entities.
        html = html.Replace("&nbsp;", " ").Trim();

        // Does this have HTML tags?
        if (html.Contains("<"))
        {
            // Make all tags lowercase (note: this also lowercases attribute values such as href)
            html = Regex.Replace(html, "<[^>]+>", m => m.Value.ToLowerInvariant());

            // Filter out anything except allowed tags. Allowed tags are kept with all of their
            // attributes; every other tag is removed, but its text content stays.
            // http://stackoverflow.com/questions/307013/how-do-i-filter-all-html-tags-except-a-certain-whitelist
            string acceptableTags = "i|b|u|sup|sub|ol|ul|li|br|h2|h3|h4|h5|span|div|p|a|img|blockquote";
            // The (?![a-zA-Z0-9:]) check stops "b" from also allowing <body> or "i" from allowing <iframe>;
            // ':' in the tag-name class lets Word's namespaced tags such as <o:p> be removed.
            string whiteListPattern = "</?(?(?=(?:" + acceptableTags + @")(?![a-zA-Z0-9:]))notag|[a-zA-Z0-9:]+)(?:\s[a-zA-Z0-9\-]+=?(?:([""']?).*?\1?)?)*\s*/?>";
            html = Regex.Replace(html, whiteListPattern, "", RegexOptions.Compiled);

            // Make all BR/br tags look the same, and trim whitespace before/after them
            html = Regex.Replace(html, @"\s*<br[^>]*>\s*", "<br />", RegexOptions.Compiled);
        }

        // No CRs
        html = html.Replace("\r", "");
        // Convert remaining LFs to line breaks
        html = html.Replace("\n", "<br />");
        // Trim BRs at the end of the string, and spaces on either side
        return Regex.Replace(html, "(<br />)+$", "", RegexOptions.Compiled).Trim();
    }
}
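
The whitelist in `CleanHtml` leaves allowed tags such as `<p>` and `<span>` with all of their attributes, so Word's `class="MsoNormal"` and inline `style` values come through. The `<!--[if ...]>` and `<![endif]-->` delimiters of Word's conditional comments also survive, because they are not ordinary tags. One option is a Word-specific pass before `CleanHtml`. The sketch below is an illustration under assumptions, not part of the original answer: the class and method names (`WordHtmlPrePass.StripWordMarkup`) are hypothetical, and the patterns are regex heuristics for typical Word clipboard HTML rather than a real HTML parser, so unusual markup (for example a `>` inside an attribute value) can slip through.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of FCKEditor or the answer above): a regex-based
// pre-pass for typical Word clipboard HTML. Heuristic only, not an HTML parser.
public static class WordHtmlPrePass
{
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    public static string StripWordMarkup(string html)
    {
        // Comments, including Word's <!--[if gte mso 9]><xml>...</xml><![endif]--> blocks
        html = Regex.Replace(html, @"<!--.*?-->", "", Opts);

        // Downlevel conditionals such as <![if !supportLists]> and <![endif]>; the text between them stays
        html = Regex.Replace(html, @"<!\[(?:if[^\]]*|endif)\]>", "", Opts);

        // Whole <style>, <script> and <xml> elements together with their contents
        html = Regex.Replace(html, @"<(style|script|xml)\b[^>]*>.*?</\1\s*>", "", Opts);

        // Namespaced Office tags such as <o:p>, </o:p> or <st1:place>; their text content stays
        html = Regex.Replace(html, @"</?[a-z0-9]+:[a-z0-9]+[^>]*>", "", Opts);

        // class, style and lang attributes (quoted or unquoted values), only inside tags
        html = Regex.Replace(html, @"<[^>]+>", tag => Regex.Replace(
            tag.Value, @"\s+(?:class|style|lang)\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)", "", Opts));

        return html;
    }
}
```

In use, the result would then go through `CleanHtml`, for example `CleanHtml(WordHtmlPrePass.StripWordMarkup(pastedHtml))`, where `pastedHtml` stands for whatever HTML FCKEditor hands you. Doing the Word-specific cleanup first means the tags the whitelist keeps, such as `<p>`, `<b>` and `<i>`, arrive without Word's `Mso...` classes and inline styles, while bold and italics are untouched.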

Regarding "c# - Is there a way to remove all unnecessary MS Word formatting from FCKEditor", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1349837/
