
php - Is there a way to use file_get_html on a series of web pages in a single script?

Reposted. Author: 行者123. Updated: 2023-11-29 20:49:17

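I have a MySQL table full of links, and I want to parse each link and write the output to a new file. Unfortunately, whenever I use include("getContent.php") inside the while loop, I get the error "Cannot redeclare file_get_html()".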

My main script is:

<?php
$db = 'newsfeed';
$zeta = 0;
$beta = 0;
// connect to RDS instance MySQL Database Newsfeed
include_once('/var/www/dbfunctions/mysqli_connectdb.php');

// set content source table
$sourcetable = 'feedsources';
$mastertable = 'mastertable';

// set date to remove results older than
date_default_timezone_set("UTC");
$datenow = date_timestamp_get(date_create());
$offset = "86400";
$deldate = $datenow - $offset;

//begin cycling through content data

//delete all "old" entries from the mastertable

//get number of source items present
$itemquery = "SELECT id,name FROM $sourcetable";
$itemresult = mysqli_query($conn, $itemquery);

while ($row = mysqli_fetch_assoc($itemresult)) {
    $sourceid = $row['id'];
    $sourcename = $row['name'];

    // cycle through the data tables
    $dataquery = "SELECT * FROM $sourcetable WHERE id = $sourceid;";

    $dataresult = mysqli_query($conn, $dataquery);

    while ($row = mysqli_fetch_assoc($dataresult)) {
        $table = $row['datatable'];
    }

    // copy all data from the targeted table into the master table

    //loop through the targeted table and copy to mysql
    $getdata = "SELECT * FROM ".$table.";";
    $datareturn = mysqli_query($conn, $getdata);

    while ($row = mysqli_fetch_assoc($datareturn)) {
        $date = $row['datecreated'];
        $title = addslashes($row['title']);
        $url = addslashes($row['url']);
        $tags = addslashes($row['tags']);
        $titleid = $row['id'];

        //get content and place in html file in /var/www/html/nuzr/content/

        include("getcontent.php");
        echo $filename;
        //check whether the item already exists in the table
        $checkquery = "select id from ".$mastertable." where title = '".$title."';";

        $checkcheck = mysqli_query($conn, $checkquery);

        if(mysqli_num_rows($checkcheck) > 0){
            echo "CHECKFAILED";
        }else{

            $copy = "INSERT INTO ".$mastertable." VALUES ('NULL','$table','$sourcename','$date','$title','$url','$tags','$filename');";
            mysqli_query($conn, $copy);

            echo "Beta is ".$beta;
            $beta = $beta + 1;
        }
    }
    // clean the master table
    $delquery = 'DELETE FROM '.$mastertable.' WHERE datecreated < '.$deldate.';';

    mysqli_query($conn, $delquery);

}

function clear()
{
    $this->dom = null;
    $this->parent = null;
    $this->parent = null;
    $this->children = null;
}

?>

The getcontent.php script is:

<?php
//Check Start
//echo "Program Starts";

// Include the library
include('/var/www/tools/dom/simple_html_dom.php');

$source = $url;
$content = array();
$header1 = array();
$header2 = array();
$i = 0; $y = 0;

// Retrieve the DOM from a given URL
$html = file_get_html($source);

//grab headers in case initial title is a header
foreach($html->find('h1') as $e){

    $header1[$i] = $e->outertext;

    //echo $e->outertext;

    $i = $i + 1;
}

$i = 0;

foreach($html->find('h2') as $e){

    $header2[$i] = $e->outertext;

    //echo $e->outertext;

    $i = $i + 1;
}

//reset counter
$i = 0;

// Find all paragraph tags and print their content into a text file
foreach($html->find('p') as $e){

    $content[$i] = $e->outertext;

    //echo $e->outertext;

    $i = $i + 1;
}

//create the content storage file
$filename = "/var/www/html/nuzr/content/".$table.$titleid.".html";
echo "The filename is".$filename;
$file = fopen($filename,"a");

// write header and link to original article
$titleblurb = "<b>Original article courtesy of <a href='".$url."'>".$sourcename."</a></b>";
fwrite($file, $titleblurb);

// set site specific parameters based on header / footer size
if($sourcename == "The Globe and Mail"){

    //Set indexing parameters
    $z = $i - 13; $y = 2;

    //Add Header content
    $text = $header1[0];
    fwrite($file, $text);
    $text = $header2[1];
    fwrite($file, $text);

}elseif($sourcename == "CNN Money"){

    //Set indexing parameters
    $z = $i - 3; $y = 1;

    //Add header content
    $text = $header1[0];
    fwrite($file, $text);
    $text = $header2[1];
    fwrite($file, $text);

}elseif($sourcename == "CNN Markets"){

    //Set indexing parameters
    $z = $i - 3; $y = 1;

    //Add header content
    $text = $header1[0];
    fwrite($file, $text);
    //$text = $header2[1];
    //fwrite($file, $text);

}elseif($sourcename == "BBC Business"){

    //Set indexing parameters
    $z = $i - 9; $y = 1;

    //Add header content
    $text = $header1[0];
    fwrite($file, $text);
    //$text = $header2[1];
    //fwrite($file, $text);

}elseif($sourcename == "BBC Politics"){

    //Set indexing parameters
    $z = $i - 0; $y = 1;

    //Add header content
    $text = $header1[0];
    fwrite($file, $text);
    //$text = $header2[1];
    //fwrite($file, $text);

}else{
    echo $sourcename;
}

do{

    $text = $content[$y];
    fwrite($file, $text);
    $y = $y +1;

}while($y<$z);

echo "Zeta is".$zeta;
$zeta = $zeta +1;

//close the content file
fclose($file);

//echo "File end.";

$html->clear();
unset($html);





?>

Apologies for the somewhat messy program. I added a lot of counters and other debugging bits while troubleshooting.

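Any suggestions would be appreciated. At the moment the program hits a fatal error and will not run. I have seen cases where people with a similar problem were advised to use include_once() instead of include(), but that does not work here, because it means only one of the target URLs gets parsed.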
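
The include_once() behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a temporary one-line script as a hypothetical stand-in for getcontent.php (the file name and the `$processed` array are illustrative and not part of the original code). It shows that include_once runs a script body only once per request, while include runs it on every iteration.

```php
<?php
// Hypothetical stand-in for getcontent.php: a script whose body does per-URL work.
$script = sys_get_temp_dir() . '/demo_getcontent_' . uniqid() . '.php';
file_put_contents($script, '<?php $processed[] = $url;' . "\n");

$urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'];

// include_once: the script body runs for the first URL only.
$processed = [];
foreach ($urls as $url) {
    include_once $script;
}
$withIncludeOnce = $processed;

// include: the body runs on every iteration. This is safe here only because
// the stand-in script declares no functions or classes.
$processed = [];
foreach ($urls as $url) {
    include $script;
}
$withInclude = $processed;

unlink($script);
echo count($withIncludeOnce), ' vs ', count($withInclude), "\n";
```

The sketch prints `1 vs 3`. The per-URL logic has to run on every pass, so only the declarations (the library file) are candidates for being loaded once.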

Best answer

You can put this function in a separate file, or wrap its definition in a guard:

if (!function_exists('file_get_html')) {
    function file_get_html() {}
}
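
As a rough illustration of the guard, here is a minimal sketch that assumes a temporary file as a hypothetical stand-in for a library file; `demo_parse()` is an illustrative name and is not part of simple_html_dom. The file is pulled in with a plain include on every loop iteration, and the function_exists() check keeps the second pass from redeclaring the function.

```php
<?php
// Hypothetical stand-in for a library file; demo_parse() is an illustrative
// name, not a simple_html_dom function.
$lib = sys_get_temp_dir() . '/demo_lib_' . uniqid() . '.php';
$code = <<<'LIB'
<?php
if (!function_exists('demo_parse')) {
    function demo_parse($html)
    {
        return strip_tags($html);
    }
}
LIB;
file_put_contents($lib, $code);

$titles = [];
foreach (['<h1>First</h1>', '<h1>Second</h1>'] as $html) {
    include $lib;                  // plain include, executed on every iteration
    $titles[] = demo_parse($html); // the guard skips the definition on later passes
}
unlink($lib);

print_r($titles);
```

Note that simple_html_dom.php declares more than one function and also classes, so guarding a single function there may not be enough. Changing only the library line in getcontent.php to include_once (while still using include for getcontent.php itself) is likely the simpler route.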

Regarding "php - Is there a way to use file_get_html on a series of web pages in a single script?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38190130/
