php - Replace all relative URLs with absolute URLs

Reposted. Author: 行者123. Updated: 2023-12-04 14:41:27

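I've seen a few answers (like this one), but I have a somewhat more complex scenario that I'm not sure how to account for.

I essentially have complete HTML documents, and I need to replace every relative URL with an absolute URL.

Elements from the potential HTML look like the following, though other cases may occur as well: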

<img src="/relative/url/img.jpg" />
<form action="/">
<form action="/contact-us/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//example.com/protocol-relative-img.jpg" />

The desired output is:
// "//example.com/" is ideal, but "http(s)://example.com/" are acceptable

<img src="//example.com/relative/url/img.jpg" />
<form action="//example.com/">
<form action="//example.com/contact-us/">
<a href='//example.com/relative/url/'>Note the Single Quote</a>
<img src="//example.com/protocol-relative-img.jpg" /> <!-- Unmodified -->

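I don't want to replace protocol-relative URLs, since they already function as absolute URLs. I've come up with some code that works, but I'm wondering whether it can be cleaned up a bit, because it is repetitive.

I have to account for both single-quoted and double-quoted attribute values for src, href, and action (am I missing any attributes that can hold relative URLs?) while also avoiding protocol-relative URLs.

Here is what I have so far: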
// Make URL replacement protocol relative to not break insecure/secure links
$url = str_replace( array( 'http://', 'https://' ), '//', $url );

// Temporarily Modify Protocol-Relative URLS
$str = str_replace( 'src="//', 'src="::TEMP_REPLACE::', $str );
$str = str_replace( "src='//", "src='::TEMP_REPLACE::", $str );
$str = str_replace( 'href="//', 'href="::TEMP_REPLACE::', $str );
$str = str_replace( "href='//", "href='::TEMP_REPLACE::", $str );
$str = str_replace( 'action="//', 'action="::TEMP_REPLACE::', $str );
$str = str_replace( "action='//", "action='::TEMP_REPLACE::", $str );

// Replace all other Relative URLS
$str = str_replace( 'src="/', 'src="'. $url .'/', $str );
$str = str_replace( "src='/", "src='". $url ."/", $str );
$str = str_replace( 'href="/', 'href="'. $url .'/', $str );
$str = str_replace( "href='/", "href='". $url ."/", $str );
$str = str_replace( 'action="/', 'action="'. $url .'/', $str );
$str = str_replace( "action='/", "action='". $url ."/", $str );

// Change Protocol Relative URLs back
$str = str_replace( 'src="::TEMP_REPLACE::', 'src="//', $str );
$str = str_replace( "src='::TEMP_REPLACE::", "src='//", $str );
$str = str_replace( 'href="::TEMP_REPLACE::', 'href="//', $str );
$str = str_replace( "href='::TEMP_REPLACE::", "href='//", $str );
$str = str_replace( 'action="::TEMP_REPLACE::', 'action="//', $str );
$str = str_replace( "action='::TEMP_REPLACE::", "action='//", $str );

I mean, it works, but it's ugly, and I suspect there is a better way to do this.
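As a rough illustration of how the repetition above could be folded into a loop, here is a minimal sketch that keeps the same string-replacement idea. The function name `absolutizeRootRelative` and the sample markup are hypothetical. It relies on `strtr()` trying the longest key first and never rescanning replaced text, so the temporary placeholder step is no longer needed. It shares the limitations of the original snippet, since it is still plain string matching.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function absolutizeRootRelative(string $str, string $url): string
{
    // Keep the base protocol-relative, as the original snippet does.
    $url = str_replace(['http://', 'https://'], '//', $url);

    $map = [];
    foreach (['src', 'href', 'action'] as $attr) {
        foreach (['"', "'"] as $quote) {
            $prefix = $attr . '=' . $quote;
            // strtr() tries the longest key first, so protocol-relative URLs
            // map to themselves and are never reached by the shorter "/" rule.
            $map[$prefix . '//'] = $prefix . '//';
            $map[$prefix . '/']  = $prefix . $url . '/';
        }
    }

    return strtr($str, $map);
}

// Assumed sample input, not taken from the question.
$html = '<img src="/a.jpg"><a href=\'/b/\'>b</a><img src="//cdn.example.com/c.jpg">';
echo absolutizeRootRelative($html, 'https://example.com');
```

Adding another attribute then only means extending the inner list rather than writing six more lines.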

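Best answer

New answer

If your real HTML document is valid (and has a parent/containing tag), then the most appropriate and reliable technique is to use a proper DOM parser.

Here is how DOMDocument and XPath can be used to elegantly target and replace the tag attributes you specified:

Code 1 - Nested XPath queries: (Demo)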

$domain = '//example.com';
$tagsAndAttributes = [
    'img' => 'src',
    'form' => 'action',
    'a' => 'href'
];

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($tagsAndAttributes as $tag => $attr) {
    // Select only values that start with a single "/", so protocol-relative
    // URLs, absolute URLs, and tags without the attribute are left alone.
    foreach ($xpath->query("//{$tag}[starts-with(@{$attr}, '/') and not(starts-with(@{$attr}, '//'))]") as $node) {
        $node->setAttribute($attr, $domain . $node->getAttribute($attr));
    }
}
echo $dom->saveHTML();

Code 2 - Single XPath query with a condition block: (Demo)
$domain = '//example.com';
// Each target selects only values that start with a single "/".
$targets = [
    "//img[starts-with(@src, '/') and not(starts-with(@src, '//'))]",
    "//form[starts-with(@action, '/') and not(starts-with(@action, '//'))]",
    "//a[starts-with(@href, '/') and not(starts-with(@href, '//'))]"
];

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Join the targets into one union query, then decide per node which attribute to update.
foreach ($xpath->query(implode('|', $targets)) as $node) {
    if ($src = $node->getAttribute('src')) {
        $node->setAttribute('src', $domain . $src);
    } elseif ($action = $node->getAttribute('action')) {
        $node->setAttribute('action', $domain . $action);
    } else {
        $node->setAttribute('href', $domain . $node->getAttribute('href'));
    }
}
echo $dom->saveHTML();
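The question also asks whether other attributes can hold relative URLs. One possible variation, shown here only as a sketch, is to query by attribute on any element instead of listing tags. The function name `absolutizeAttributes`, its default attribute list, and the sample markup are assumptions for illustration; extend the list to whatever attributes your documents actually use.

```php
<?php
// Hypothetical helper; name, signature, and default attribute list are assumptions.
function absolutizeAttributes(string $html, string $domain, array $attrs = ['src', 'href', 'action']): string
{
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($attrs as $attr) {
        // Any element whose attribute starts with a single "/" (not "//").
        $query = "//*[starts-with(@{$attr}, '/') and not(starts-with(@{$attr}, '//'))]";
        foreach ($xpath->query($query) as $node) {
            $node->setAttribute($attr, $domain . $node->getAttribute($attr));
        }
    }

    return $dom->saveHTML();
}

// Assumed sample input with a single containing tag.
$html = '<div><img src="/a.jpg"><iframe src="/frame/"></iframe>'
      . '<a href="https://other.test/x">x</a><a href="//cdn.test/y">y</a></div>';
$result = absolutizeAttributes($html, '//example.com');
echo $result;
```

Note that `saveHTML()` re-serializes the markup, so quoting style and self-closing slashes in the output may differ from the input.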

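Old answer: (...regex is not "DOM-aware" and is vulnerable to unexpected breakage)

If I understand correctly, you have a base value in mind and only want to apply it to the relative paths.

Pattern Demo

Code: (Demo)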
$html=<<<HTML
<img src="/relative/url/img.jpg" />
<form action="/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
HTML;

$base='https://example.com';

echo preg_replace('~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~',"$base$0",$html);

Output:
<img src="https://example.com/relative/url/img.jpg" />
<form action="https://example.com/">
<a href='https://example.com/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />

Pattern breakdown:
~                      # Pattern delimiter
(?:src|action|href)    # Match: src or action or href
=                      # Match an equals sign
[\'"]                  # Match a single or double quote
\K                     # Restart the full-string match (discard previously matched characters)
/                      # Match a slash
(?!/)                  # Negative lookahead (zero-length assertion): the next character must not be a slash
[^\'"]*                # Match zero or more characters that are not single or double quotes
~                      # Pattern delimiter
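To make the "not DOM-aware" caveat concrete, the following sketch runs the same pattern over a few hand-picked inputs. The sample strings and the `$cases` array are assumptions for illustration. The last two cases show where a regex can misfire: `href=` also matches inside `data-href=`, and whitespace around the equals sign is not allowed by the pattern.

```php
<?php
$pattern = '~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~';
$base = 'https://example.com';

// Assumed sample inputs, chosen to exercise each branch of the pattern.
$cases = [
    '<a href="/x">',             // root-relative: rewritten
    '<img src="//cdn.test/i">',  // protocol-relative: left alone by the lookahead
    '<a href="http://a.test/">', // absolute: left alone, no leading slash
    '<div data-href="/x">',      // rewritten too, since "href=" matches inside "data-href="
    '<a href = "/x">',           // missed, since the pattern allows no spaces around "="
];

$results = [];
foreach ($cases as $case) {
    // '$0' is the full match, which after \K is only the path itself.
    $results[] = preg_replace($pattern, $base . '$0', $case);
}
print_r($results);
```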

Regarding php - Replace all relative URLs with absolute URLs, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48836281/
