php - Fetching HTML from a web page with cURL and stripping the HTML with preg_replace

Reposted. Author: 可可西里. Updated: 2023-11-01 00:35:09

I want to fetch The Pirate Bay's statistics. On TPB, the statistics are in the following div:

<div id="stats">5.695.184 registered users Last updated 14:46:05.<br />35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br />    </div>

Here is my code:

<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL,"http://thepiratebay.se");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
curl_setopt($ch,CURLOPT_COOKIE,"language=nl_NL; c[thepiratebay.se][/][language]=nl_NL");
$data=curl_exec($ch);
$data = preg_replace('/(.*?)(<div id="stats">)(.*?)(<\/div>)(.*?)/','$2',$data);
echo $data;
curl_close($ch);
exit;
?>

As you can see, I use the following preg_replace pattern to strip the HTML:

$data = preg_replace('/(.*?)(<div id="stats">)(.*?)(<\/div>)(.*?)/','$2',$data);

But this doesn't work. I get the entire TPB page instead of just the statistics. Does anyone have an answer?

Thanks in advance.
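
For reference, preg_replace only rewrites the part of the string that the pattern matches and leaves everything else in place, so the rest of the page survives whether or not the pattern matches. Without the s modifier, "." also stops at newlines, and '$2' refers to the opening tag rather than the div's content. If a regex is still wanted, capturing with preg_match is closer to the intent. The following is a minimal sketch, not a robust HTML parser: the helper name extract_stats_with_regex and the surrounding sample markup are made up for illustration, and it assumes the div contains no nested div.

```php
<?php
// Sample markup built around the div from the question; a real page is much larger.
$html = "<html><body>\n<p>other content</p>\n"
      . '<div id="stats">5.695.184 registered users Last updated 14:46:05.<br />'
      . '35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br />    </div>'
      . "\n<p>footer</p>\n</body></html>";

// Hypothetical helper: returns the inner HTML of the stats div, or null if it is absent.
function extract_stats_with_regex(string $html): ?string
{
    // The s modifier lets "." match newlines; group 1 holds the div's content.
    if (preg_match('/<div id="stats">(.*?)<\/div>/s', $html, $m)) {
        return trim($m[1]);
    }
    return null;
}

$stats = extract_stats_with_regex($html);
echo $stats, "\n";
```

Only the captured group is returned here, so nothing from the rest of the page leaks into the result.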

Best answer

Forget about using regular expressions for screen scraping; use DOMDocument instead. See how simple it is:

<?php 
function curl_get($url){
    $useragent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_COOKIE, "language=nl_NL; c[thepiratebay.se][/][language]=nl_NL");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

function get_pb_stats(){
    $html = curl_get("http://thepiratebay.se");
    // Create a new DOM document
    $xml = new DOMDocument();

    // Load the HTML contents into the DOM (@ suppresses warnings about malformed markup)
    @$xml->loadHTML($html);

    // nodeValue returns the div's text only, so the <br /> tags are dropped
    $return = trim($xml->getElementById('stats')->nodeValue);
    // Regex to re-add the break tag after the timestamp, e.g. after "15:04:05."
    $return = preg_replace('/\d{2}[:]\d{2}[:]\d{2}[.]/', '${0}<br />', $return);
    return $return;
}

echo get_pb_stats();

/*
5.695.213 geregistreerde gebruikers Laatste update 15:04:05.<br />35.505.322 peers (25.948.185 seeders + 9.557.137 leechers) in 4.546.560 torrents.
*/
?>
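
The parsing step can also be tried without any network access. The sketch below is a variation on the answer, not the answer's own code: it assumes the DOM extension is available, the helper name parse_stats is hypothetical, and it swaps the @ operator for libxml_use_internal_errors() and adds a null check for the case where the div is missing or the download failed.

```php
<?php
// Hypothetical helper: returns the text of the element with id="stats", or null.
function parse_stats(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse, e.g. the download returned an empty body
    }

    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById('stats');
    if ($node === null) {
        return null; // the page has no element with id="stats"
    }
    return trim($node->textContent);
}

// The div from the question, used as offline test input.
$sample = '<div id="stats">5.695.184 registered users Last updated 14:46:05.<br />'
        . '35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br />    </div>';

echo parse_stats($sample), "\n";
```

As in the answer, the text content does not include the break tags, so the two sentences run together until a replacement such as the preg_replace call above re-inserts them.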

Regarding "php - Fetching HTML from a web page with cURL and stripping the HTML with preg_replace", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10449200/
