
php - strip_tags: disallow certain tags

Reposted. Author: 太空狗. Updated: 2023-10-29 14:56:11

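According to the strip_tags documentation, the second parameter takes the allowed tags. In my case, however, I want the reverse. Say I want to accept every tag that strip_tags normally (by default) accepts, and strip only the <script> tag. Is there any way to do this?

I'm not asking anyone to write the code for me, but pointers to possible approaches (if this is possible at all) would be greatly appreciated.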

Best answer

EDIT

To use HTML Purifier's HTML.ForbiddenElements configuration directive, it looks like you would do something like this:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:

array('script','style','applet')

Or:

array('<script>','<style>','<applet>')

Or... something else?

I think it's the first form, without delimiters; HTML.AllowedElements uses a configuration string format that has something in common with TinyMCE's valid elements syntax:

tinyMCE.init({
...
valid_elements : "a[href|target=_blank],strong/b,div[align],br",
...
});

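So my guess is that it's just the element name, and that no attributes should be supplied (since you're forbidding the element... although there is an HTML.ForbiddenAttributes as well). But that's a guess.

I'll also add this note from the HTML.ForbiddenAttributes documentation: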

Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.

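Blacklisting is not as "robust" as whitelisting, but you may have your reasons. Just be careful.

Without testing, I'm not sure what to tell you. I'll keep looking for an answer, but I'll probably go to bed first. It's late. :)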


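Although I think you really should use HTML Purifier and its HTML.ForbiddenElements configuration directive, if you really, really want to use strip_tags(), I think a reasonable alternative is to derive a whitelist from the blacklist. In other words, remove what you don't want and then use what's left.

For example: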

function blacklistElements($blacklisted = '', &$errors = array()) {
    // Nothing to blacklist: report an error and return an empty whitelist.
    if ((string)$blacklisted == '') {
        $errors[] = 'Empty string.';
        return array();
    }

    // Known HTML5 elements, each in the <tag> form that strip_tags() expects.
    $html5 = array(
        "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
        "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
        "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
        "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
        "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
        "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
        "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
        "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
        "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
        "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
        "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
        "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
        "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
        "<title>","<head>","<html>"
    );

    // Normalize the input: lowercase it, keep only letters and spaces,
    // then wrap every space-separated name in angle brackets.
    $list = trim(strtolower($blacklisted));
    $list = preg_replace('/[^a-z ]/i', '', $list);
    $list = '<' . str_replace(' ', '> <', $list) . '>';
    $list = array_map('trim', explode(' ', $list));

    // Whatever is not blacklisted is the whitelist.
    return array_diff($html5, $list);
}

Then run it:

$blacklisted = '<html> <bogus> <EM> em li ol';
$errors = array();
// Pass $errors by reference so the function can report problems.
$whitelist = blacklistElements($blacklisted, $errors);

if (count($errors)) {
    echo "There were errors.\n";
    print_r($errors);
    echo "\n";
} else {
    // Do strip_tags() ...
}

http://codepad.org/LV8ckRjd

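So if you pass in what you don't want to allow, it gives you back an array of the HTML5 elements minus those. You can then feed that to strip_tags() after joining it into a string: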

$stripped = strip_tags($html, implode('', $whitelist));
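To see the whole idea end to end, here is a minimal, self-contained sketch. The helper name allowedTagsWithout() and the shortened element list are assumptions made for illustration only; they are not part of the answer's function above. It also shows a point worth keeping in mind: strip_tags() removes the <script> tags themselves but leaves the text between them.

```php
<?php
// Hypothetical helper (illustration only): build the allowed-tags string
// for strip_tags() by removing blacklisted names from a known list.
function allowedTagsWithout(array $known, array $blacklist): string
{
    $blacklist = array_map('strtolower', $blacklist);
    $allowed = array_diff($known, $blacklist);
    if (count($allowed) === 0) {
        return '';
    }
    // strip_tags() expects the form "<p><em>" with no whitespace.
    return '<' . implode('><', $allowed) . '>';
}

// Shortened element list, assumed for the example.
$known = array('p', 'em', 'strong', 'a', 'script');
$allowed = allowedTagsWithout($known, array('script'));

$html = '<p>Hello <em>world</em><script>alert(1)</script></p>';
$stripped = strip_tags($html, $allowed);

echo $allowed, "\n";  // <p><em><strong><a>
echo $stripped, "\n"; // <p>Hello <em>world</em>alert(1)</p>
```

Because the script body survives as plain text, this approach only neutralizes the tag; it does not remove the script's source from the output.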

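Caveat Emptor

Now, I've kind of hacked this together, and I know there are some issues I haven't thought through yet. For example, from the strip_tags() man page, for the $allowable_tags parameter: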

Note:

This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.

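It's late and, for some reason, I can't quite work out what this means for this approach, so I'll have to think about it tomorrow. I also compiled the list of HTML elements in the function's $html5 array from this MDN documentation page. Sharp-eyed readers might notice that all of the tags are in this form: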

<tagName>

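I'm not sure how this will affect the outcome, or whether I need to account for variations such as the short-tag form <tagName/> and some of the, ahem, odder variations. And, of course, there are more tags out there.

So it's probably not production ready. But you get the idea.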
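One way to settle these open questions is to probe strip_tags() directly on the PHP version you run. The sketch below is an assumption-laden test harness, not part of the answer: the sample inputs and labels are made up for illustration. Matching is case-insensitive and attributes are ignored when comparing against the allowed list, as the quoted note implies; how the self-closing forms are handled may depend on the PHP version, so the script prints the results rather than assuming them.

```php
<?php
// Allow only <em> and <br>, written in the plain <tag> form.
$allowed = '<em><br>';

// Hypothetical sample inputs covering the variations in question.
$samples = array(
    'uppercase'    => '<EM>x</EM>',
    'attributes'   => '<em class="c">x</em>',
    'self-closing' => 'a<br/>b',
    'spaced close' => 'a<br />b',
);

$results = array();
foreach ($samples as $label => $input) {
    $results[$label] = strip_tags($input, $allowed);
    // Print input and output side by side for inspection.
    printf("%-13s %s => %s\n", $label, $input, $results[$label]);
}
```

If the self-closing rows come back with the tag removed, the whitelist approach would need to pre-normalize such tags before calling strip_tags().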

For "php - strip_tags: disallow certain tags", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12362426/
