
JavaScript regular expression to extract anchor text and URL from anchor tags

Reposted. Author: 可可西里. Updated: 2023-11-01 01:35:11

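I have a block of text in a JavaScript variable named "input_content", and that text contains multiple anchor tags/links. I would like to match all of the anchor tags and extract the anchor text and URL, and put them into an array like (or similar to) this: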

Array
(
    [0] => Array
        (
            [0] => <a href="http://yahoo.com">Yahoo</a>
            [1] => http://yahoo.com
            [2] => Yahoo
        )
    [1] => Array
        (
            [0] => <a href="http://google.com">Google</a>
            [1] => http://google.com
            [2] => Google
        )
)

I've had a crack at it ( http://pastie.org/339755 ), but I'm stumped beyond this point. Thanks for the help!

Best answer

var matches = [];

input_content.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function () {
    matches.push(Array.prototype.slice.call(arguments, 1, 4));
});

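This assumes that your anchors will always be in the form <a href="...">...</a>; that is, it won't work if there are any other attributes (for example, target). The regular expression can be improved to accommodate this.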
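As one possible way to relax that restriction, here is a minimal sketch that tolerates extra attributes before and after href. The helper name extractAnchors and the sample string are made up for illustration and are not part of the original answer. It still assumes a double-quoted href, no ">" inside attribute values, and plain text (no nested tags) inside the anchor.

```javascript
// Hypothetical helper, not from the original answer.
function extractAnchors(html) {
    var matches = [];
    // <a\s+              -> opening tag followed by whitespace
    // (?:[^>]*?\s)?      -> optionally skip attributes that come before href
    // href="([^"]+)"     -> capture the URL
    // [^>]*>             -> skip any attributes after href, up to the closing >
    // ([^<]+)<\/a>       -> capture the anchor text, then the closing tag
    var pattern = /<a\s+(?:[^>]*?\s)?href="([^"]+)"[^>]*>([^<]+)<\/a>/g;
    html.replace(pattern, function (whole, href, text) {
        matches.push([whole, href, text]);
    });
    return matches;
}

var sample = '<a target="_blank" href="http://yahoo.com" class="x">Yahoo</a> and ' +
             '<a href="http://google.com">Google</a>';
var result = extractAnchors(sample);
```

Because this pattern has no leading [^<]* and no outer group, the first callback argument is already the entire anchor, so named parameters can stand in for the arguments object. For arbitrary real-world HTML, a DOM parser is usually more robust than any regular expression.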

Breaking down the regular expression:

/ -> start regular expression
  [^<]* -> skip all characters until the first <
  ( -> start capturing first token
    <a href=" -> capture first bit of anchor
    ( -> start capturing second token
        [^"]+ -> capture all characters until a "
    ) -> end capturing second token
    "> -> capture more of the anchor
    ( -> start capturing third token
        [^<]+ -> capture all characters until a <
    ) -> end capturing third token
    <\/a> -> capture last bit of anchor
  ) -> end capturing first token
/g -> end regular expression, add global flag to match all anchors in string

Each call to our anonymous function will receive three tokens as the second, third and fourth arguments, namely arguments[1], arguments[2], arguments[3]:

  • arguments[1] is the entire anchor
  • arguments[2] is the href part
  • arguments[3] is the text inside

We'll use a hack to push these three arguments as a new array into our main matches array. The arguments built-in variable is not a true JavaScript Array, so we'll have to apply the slice Array method to it to extract the items we want:

Array.prototype.slice.call(arguments, 1, 4)

This will extract the items from arguments starting at index 1 and ending at index 4 (not included).
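To see the slice call in isolation, here is a small sketch. The function pickCaptures and the values passed to it are invented for illustration; the arguments only mimic the shape of what a replace callback receives (full match, three captures, offset, input string).

```javascript
// Hypothetical helper, used only to show how slice treats `arguments`.
function pickCaptures() {
    // Copies the items at indexes 1, 2 and 3 into a real Array;
    // index 0 (the full match) and everything from index 4 on are left out.
    return Array.prototype.slice.call(arguments, 1, 4);
}

var picked = pickCaptures("full match", "anchor", "href", "text", 42, "input");
// picked is ["anchor", "href", "text"], and Array.isArray(picked) is true
```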

var input_content = "blah \
<a href=\"http://yahoo.com\">Yahoo</a> \
blah \
<a href=\"http://google.com\">Google</a> \
blah";

var matches = [];

input_content.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function () {
    matches.push(Array.prototype.slice.call(arguments, 1, 4));
});

alert(matches.join("\n"));

This gives:

<a href="http://yahoo.com">Yahoo</a>,http://yahoo.com,Yahoo
<a href="http://google.com">Google</a>,http://google.com,Google
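In environments that support String.prototype.matchAll (ES2020 and later), the same array can be built without calling replace purely for its side effect. This is an alternative technique, not the one used in the answer, and it keeps the answer's assumption that href is the only attribute.

```javascript
var input_content = 'blah <a href="http://yahoo.com">Yahoo</a> ' +
                    'blah <a href="http://google.com">Google</a> blah';

// Same anchor pattern as above, without the leading [^<]* and the outer
// group: m[0] is already the whole anchor. matchAll requires the g flag.
var pattern = /<a href="([^"]+)">([^<]+)<\/a>/g;

// Each match m is array-like: m[0] full anchor, m[1] href, m[2] text.
var matches = Array.from(input_content.matchAll(pattern), function (m) {
    return [m[0], m[1], m[2]];
});
```

Here matches has the same shape as in the answer: one [anchor, href, text] array per link.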

This Q&A on extracting anchor text and URLs from anchor tags with a JavaScript regular expression is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/369147/
