
PHP cURL: extracting link labels

Reposted. Author: 行者123. Updated: 2023-12-04 06:50:51

I have code that extracts links, but I also need the link labels. I need to store the links in one array and the link labels in another.

For example, if the site bbc.com contains <a href="bbc.com/sports.html">sports</a>, I need $linklabel[0] = sports and $link[0] = bbc.com/sports.html.

My code is below, but it fails with this error: Fatal error: Call to undefined method DOMXPath::find() in C:\wamp\www\test\d.php on line 14

<?php
$url='http://edition.cnn.com/?fbid=4OofUbASN5k';

$var = fread_url($url);// function calling to get the page from curl
$search = array('@<script[^>]*?>.*?</script>@si'); // Strip out javascript
$var = preg_replace($search, "\n", html_entity_decode($var)); // Strip out javascript

$linklabel = array();
$link = array();
$dom = new DOMDocument($var);
@$dom->loadHTML($var);
$xpath = new DOMXPath($dom);// Grab the DOM nodes

foreach($xpath->find('a') as $element)
{
    array_push($linklabel, $element->innerText);
    print $linklabel;
    array_push($link, $element->href);
    print $link.'<br>';
}


function fread_url($url)
{
    if(function_exists("curl_init")){
        $ch = curl_init();
        $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; ".
                      "Windows NT 5.0)";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt( $ch, CURLOPT_HTTPGET, 1 );
        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
        curl_setopt( $ch, CURLOPT_FOLLOWLOCATION , 1 );
        curl_setopt( $ch, CURLOPT_FOLLOWLOCATION , 1 );
        curl_setopt( $ch, CURLOPT_URL, $url );

        curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
        $html = curl_exec($ch);
        //print $html;//printing the web page.
        curl_close($ch);
    }
    else{
        $hfile = fopen($url,"r");
        if($hfile){
            while(!feof($hfile)){
                $html.=fgets($hfile,1024);
            }
        }
    }
    return $html;
}

?>

Best answer

This is easy to do with Simple HTML DOM.

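// Assumes simple_html_dom.php has been downloaded next to this script.
include 'simple_html_dom.php';

$html = file_get_html('http://www.google.com/');

$linklabel = array();
$link = array();

// find('a') returns every <a> element on the page.
foreach ($html->find('a') as $element)
{
    // Simple HTML DOM spells this property in lowercase (innertext);
    // plaintext gives the label with any nested tags stripped.
    array_push($linklabel, $element->innertext);
    array_push($link, $element->href);
}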
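
The error in the question arises because find() is a Simple HTML DOM method; PHP's built-in DOMXPath class does not have one. If adding a third-party library is not an option, the same result can probably be reached with the DOM extension alone by calling DOMXPath::query() with an XPath expression. The sketch below is one way to do that, not the original answer's method. The helper name extractLinks() is hypothetical, and the code assumes PHP 7.1 or later with the DOM extension enabled.

```php
<?php
// Sketch: collect link URLs and labels with PHP's built-in DOM extension.
// extractLinks() is a hypothetical helper name, not part of the original post.

function extractLinks(string $html): array
{
    $links = [];
    $labels = [];

    // DOMDocument's constructor takes a version and an encoding, not markup,
    // so the HTML is passed to loadHTML() instead.
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; buffer parser warnings instead of
    // silencing them with @, then discard them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // '//a[@href]' selects every <a> element that carries an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
        // textContent is the visible text with nested tags stripped.
        $labels[] = trim($anchor->textContent);
    }

    return [$links, $labels];
}

// Usage with the example from the question.
[$link, $linklabel] = extractLinks('<p><a href="bbc.com/sports.html">sports</a></p>');
print_r($link);
print_r($linklabel);
```

With the sample markup from the question, $link[0] holds bbc.com/sports.html and $linklabel[0] holds sports. To run it against a live page, pass the string returned by the question's fread_url() to extractLinks() in place of the literal HTML.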

For more on extracting link labels with PHP cURL, see the similar question on Stack Overflow: https://stackoverflow.com/questions/3162445/
