
javascript - Extract only the text content from a web page

Reposted. Author: 太空宇宙. Updated: 2023-11-04 16:14:44

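I need to extract all the text content from a web page. I used "document.body.textContent", but the result also includes the JavaScript content. How can I make sure I get only the readable text?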

function myFunction() {
  // textContent returns the text of every descendant node, including <script> contents.
  var str = document.body.textContent;
  alert(str);
}

<html>
<head>
<title>Test Page for Text extraction</title>
</head>

<body>
<p>I hope this works</p>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<p>Test on this content to change the 5th word to a link</p>
<button onclick="myFunction()">Try it</button>
</body>
</html>

Best answer

Before reading body.textContent, remove the elements whose content you do not want to read, such as script elements.

function myFunction() {
  // Remove every <script> element inside <body> so its code is not part of textContent.
  var bodyScripts = document.querySelectorAll("body script");
  for (var i = 0; i < bodyScripts.length; i++) {
    bodyScripts[i].remove();
  }
  var str = document.body.textContent;
  // Show the extracted text; assigning textContent avoids re-parsing it as HTML.
  var pre = document.createElement("pre");
  pre.textContent = str;
  document.body.innerHTML = "";
  document.body.appendChild(pre);
}

<html>
<head>
<title>Test Page for Text extraction</title>
</head>

<body>
<p>I hope this works</p>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<p>Test on this content to change the 5th word to a link</p>
<button onclick="myFunction()">Try it</button>
</body>
</html>
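
The answer above deletes the script elements from the live page, which may not be acceptable if the page still needs them. A non-destructive variant is sketched below. It is not part of the original answer: the function name collectReadableText and the SKIPPED_TAGS list are hypothetical, and the sketch assumes that script, style and noscript are the only elements whose text should be ignored. It walks the tree and concatenates text nodes, leaving the document untouched.

```javascript
// Assumed list of tags whose text is not meant for readers; extend as needed.
var SKIPPED_TAGS = ["SCRIPT", "STYLE", "NOSCRIPT"];

// Recursively concatenate the text nodes under `node`, skipping SKIPPED_TAGS.
// Uses only nodeType, nodeName, nodeValue and childNodes, and never modifies the tree.
function collectReadableText(node) {
  if (node.nodeType === 3) {
    // 3 is Node.TEXT_NODE
    return node.nodeValue;
  }
  if (node.nodeType !== 1) {
    // Not an element (for example a comment): contributes no text.
    return "";
  }
  if (SKIPPED_TAGS.indexOf(node.nodeName.toUpperCase()) !== -1) {
    return "";
  }
  var text = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    text += collectReadableText(node.childNodes[i]);
  }
  return text;
}

// In a browser, once the body has loaded:
//   alert(collectReadableText(document.body));
```

If rendering-aware output is acceptable, document.body.innerText is another option, since it leaves out script and style content as well as hidden elements, at the cost of triggering layout.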

This post on javascript - extracting only the text content from a web page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/32825924/
