
java - Extracting images that have no img tag with jsoup

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:17:46

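I need to extract an image that sits inside a div, where the src is not listed in an img tag. I also can't use getElementById(), because the id differs from page to page. In this case, can I use some regex to extract the image from the document? Any help is appreciated.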

HTML snippet:

<div
class="rendition-bg rendition-bg--alignment desktop-center-center mobile-center-center"
data-src="/content/dam/Image.jpg.transform/default-mobile/image.jpg"
data-mobile-rendition="/content/dam/Image.jpg.transform/default-mobile/image.jpg"
data-tablet-rendition="/content/dam/Image.jpg.transform/default-mobile/image.jpg"
data-desktop-rendition="/content/dam/Image.jpg.transform/default-desktop/image.jpg"
style="background-image: url(&quot;/content/dam/Image.jpg.transform/default-mobile/image.jpg&quot;);">
</div>

Best answer

This is far from an elegant or simple solution, but hopefully the following gives you something to start from:

    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Attribute;
    import org.jsoup.nodes.Element;

    public class ImageAttributeExtractor {

        public static void main(String[] args) {
            String snippet =
                "<div class=\"rendition-bg rendition-bg--alignment desktop-center-center "
                + "mobile-center-center\" "
                + "data-src=\"/content/dam/Image.jpg.transform/default-mobile/image.jpg\" "
                + "data-mobile-rendition=\"/content/dam/Image.jpg.transform/default-mobile/image.jpg\" "
                + "data-tablet-rendition=\"/content/dam/Image.jpg.transform/default-mobile/image.jpg\" "
                + "data-desktop-rendition=\"/content/dam/Image.jpg.transform/default-desktop/image.jpg\" "
                + "style=\"background-image: url(&quot;/content/dam/Image.jpg.transform/default-"
                + "mobile/image.jpg&quot;);\"></div>";

            List<String> imgAttrs =
                Jsoup.parse(snippet)
                    .getElementsByTag("div")
                    .stream()
                    // get the attributes of each div
                    .map(Element::attributes)
                    // flatten all attributes into a single stream
                    .flatMap(attrs -> attrs.asList().stream())
                    // keep only attributes whose value mentions a .jpg
                    .filter(attribute -> attribute.getValue() != null && attribute.getValue().contains(".jpg"))
                    // map to values
                    .map(Attribute::getValue)
                    // replace every ".transform" with a space
                    .map(attrValue -> attrValue.replace(".transform", " "))
                    // for a style value, keep only the URL inside background-image
                    .map(ImageAttributeExtractor::getUrlFromBackgroundImage)
                    // split each value on the spaces introduced above
                    .flatMap(attrValue -> Stream.of(attrValue.split(" ")))
                    .collect(Collectors.toList());

            imgAttrs.forEach(System.out::println);
        }

        // Returns the URL part of a background-image declaration, or the input unchanged
        // when it is not one. The optional opening quote sits outside the capturing group
        // so it does not leak into the result.
        private static String getUrlFromBackgroundImage(final String backgroundImage) {
            Pattern pattern = Pattern.compile(
                "background-image:[ ]?url\\(['\"]?((?:.*?\\.(?:png|jpg|jpeg|gif)\\s?)*)");
            Matcher matcher = pattern.matcher(backgroundImage);
            return matcher.find() ? matcher.group(1) : backgroundImage;
        }
    }

With the snippet above, imgAttrs should contain one pair of entries for each of the five attributes that hold a .jpg path (data-src, the three rendition attributes, and style):

/content/dam/Image.jpg
/default-mobile/image.jpg
/content/dam/Image.jpg
/default-mobile/image.jpg
/content/dam/Image.jpg
/default-mobile/image.jpg
/content/dam/Image.jpg
/default-desktop/image.jpg
/content/dam/Image.jpg
/default-mobile/image.jpg

I'm not sure whether this is what you need.
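
The question also asks whether a plain regex could do the job. As a rough sketch only, the version below uses nothing but java.util.regex and scans the raw HTML for root-relative paths that end in an image extension. The class name RegexImageExtractor and the method extractImagePaths are hypothetical, and the pattern assumes paths start with "/", contain no spaces or query strings, and are closed by a quote, "&quot;" or ")". Regex matching on HTML is brittle, so treat it as a fallback rather than a replacement for a parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or the original answer.
class RegexImageExtractor {

    // A root-relative path ending in an image extension, followed by a closing
    // delimiter: a quote, the "&" that starts "&quot;", or ")".
    private static final Pattern IMAGE_PATH = Pattern.compile(
            "(/[^\\s\"'&()]+\\.(?:png|jpe?g|gif))(?=[\"'&)])");

    // Returns the distinct image paths in the order they first appear.
    static List<String> extractImagePaths(String html) {
        Set<String> paths = new LinkedHashSet<>();
        Matcher matcher = IMAGE_PATH.matcher(html);
        while (matcher.find()) {
            paths.add(matcher.group(1));
        }
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        String html =
            "<div data-src=\"/content/dam/Image.jpg.transform/default-mobile/image.jpg\" "
            + "data-desktop-rendition=\"/content/dam/Image.jpg.transform/default-desktop/image.jpg\" "
            + "style=\"background-image: url(&quot;/content/dam/Image.jpg.transform/default-"
            + "mobile/image.jpg&quot;);\"></div>";

        // Prints the mobile path once, then the desktop path.
        extractImagePaths(html).forEach(System.out::println);
    }
}
```

Unlike the stream pipeline above, this keeps each path whole instead of splitting it at ".transform", and the LinkedHashSet drops repeated paths while preserving first-seen order.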

For "java - Extracting images that have no img tag with jsoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51484836/
