php - Memory leak in PHP with three for loops

Reposted. Author: 搜寻专家. Last updated: 2023-10-31 20:50:05

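My script is a spider that checks whether a page is a "links page" or an "information page". If the page is a "links page", it continues recursively (as a tree, if you like) until it finds an "information page".

I tried to make the script recursive, which was easy, but I kept getting this error: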

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 39 bytes) in /srv/www/loko/simple_html_dom.php on line 1316

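I was told that I would have to use the for-loop approach, because the script does not free the memory whether or not I call unset(), and since I only need to go three levels deep, that made sense. But after I changed the script, the error came back. Perhaps I can free the memory now?

Something needs to die here, please help me destroy something!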

set_time_limit(0);
ini_set('memory_limit', '256M');
require("simple_html_dom.php");
$thelink = "http://www.somelink.com";
$html1 = file_get_html($thelink);
$ret1 = $html1->find('#idTabResults2');

// first inception level, we know page has only links
if (!$ret1){
    $es1 = $html1->find('table.litab a');
    //unset($html1);
    $countlinks1 = 0;
    foreach ($es1 as $aa1) {
        $links1[$countlinks1] = $aa1->href;
        $countlinks1++;
    }
    //unset($es1);

    //for every link in array do the same
    for ($i = 0; $i < $countlinks1; $i++) {
        $html2 = file_get_html($links1[$i]);
        $ret2 = $html2->find('#idTabResults2');
        // if got information then send to DB
        if ($ret2){
            pullInfo($html2);
            //unset($html2);
        } else {
            // continue inception
            $es2 = $html2->find('table.litab a');
            $html2 = null;

            $countlinks2 = 0;
            foreach ($es2 as $aa2) {
                $links2[$countlinks2] = $aa2->href;
                $countlinks2++;
            }
            //unset($es2);

            for ($j = 0; $j < $countlinks2; $j++) {
                $html3 = file_get_html($links2[$j]);
                $ret3 = $html3->find('#idTabResults2');
                // if got information then send to DB
                if ($ret3){
                    pullInfo($html3);

                } else {
                    // inception level three
                    $es3 = $html3->find('table.litab a');
                    $html3 = null;
                    $countlinks3 = 0;
                    foreach ($es3 as $aa3) {
                        $links3[$countlinks3] = $aa3->href;
                        $countlinks3++;
                    }
                    for ($k = 0; $k < $countlinks3; $k++) {
                        echo memory_get_usage() ;
                        echo "\n";
                        $html4 = file_get_html($links3[$k]);
                        $ret4 = $html4->find('#idTabResults2');
                        // if got information then send to DB
                        if ($ret4){
                            pullInfo($html4);

                        }
                        unset($html4);
                    }
                    unset($html3);
                }

            }
        }
    }
}



function pullInfo($html)
{

    $tds = $html->find('td');
    $count =0;
    foreach ($tds as $td) {
        $count++;
        if ($count==1){
            $name = html_entity_decode($td->innertext);
        }
        if ($count==2){
            $address = addslashes(html_entity_decode($td->innertext));
        }
        if ($count==3){
            $number = addslashes(preg_replace('/(\d+) - (\d+)/i', '$2$1', $td->innertext));
        }

    }
    unset($tds, $td);

    $name = mysql_real_escape_string($name);
    $address = mysql_real_escape_string($address);
    $number = mysql_real_escape_string($number);
    $inAlready=mysql_query("SELECT * FROM people WHERE phone=$number");
    while($e=mysql_fetch_assoc($inAlready))
        $output[]=$e;
    if (json_encode($output) != "null"){
        //print(json_encode($output));
    } else {

        mysql_query("INSERT INTO people (name, area, phone)
            VALUES ('$name', '$address', '$number')");
    }
}

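Here is a graph of the memory usage growing: [image]

Best answer

I modified the code slightly to free the memory that I think can be freed. I added a comment above each modification. The added comments start with "#" so that they are easier to find. It is unrelated to this question, but worth mentioning, that your database insert code is vulnerable to SQL injection.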

<?php
require("simple_html_dom.php");
$thelink = "http://www.somelink.co.uk";

# do not keep raw contents of the file in memory
#$data1 = file_get_contents($thelink);
#$html1 = str_get_html($data1);
$html1 = str_get_html(file_get_contents($thelink));

$ret1 = $html1->find('#idResults2');

// first inception level, we know page has only links
if (!$ret1){
    $es1 = $html1->find('table.litab a');

    # free $html1, not used anymore
    unset($html1);

    $countlinks1 = 0;
    foreach ($es1 as $aa1) {
        $links1[$countlinks1] = $aa1->href;
        $countlinks1++;
        // echo (addslashes($aa->href));
    }

    # free memory used by the $es1 value, not used anymore
    unset($es1);

    //for every link in array do the same

    # "<" rather than "<=": index $countlinks1 is one past the last link
    for ($i = 0; $i < $countlinks1; $i++) {
        # do not keep raw contents of the file in memory
        #$data2 = file_get_contents($links1[$i]);
        #$html2 = str_get_html($data2);
        $html2 = str_get_html(file_get_contents($links1[$i]));

        $ret2 = $html2->find('#idResults2');

        // if got information then send to DB
        if ($ret2){
            pullInfo($html2);
        } else {
            // continue inception

            $es2 = $html2->find('table.litab a');

            # free memory used by $html2, not used anymore.
            # we would unset it at the end of the loop.
            $html2 = null;

            $countlinks2 = 0;
            foreach ($es2 as $aa2) {
                $links2[$countlinks2] = $aa2->href;
                $countlinks2++;
            }

            # free memory used by $es2
            unset($es2);

            # "<" rather than "<=": index $countlinks2 is one past the last link
            for ($j = 0; $j < $countlinks2; $j++) {
                # do not keep raw contents of the file in memory
                #$data3 = file_get_contents($links2[$j]);
                #$html3 = str_get_html($data3);
                $html3 = str_get_html(file_get_contents($links2[$j]));
                $ret3 = $html3->find('#idResults2');
                // if got information then send to DB
                if ($ret3){
                    pullInfo($html3);
                }

                # free memory used by $html3, or on the last iteration the memory would not get freed
                unset($html3);
            }
        }

        # free memory used by $html2, or on the last iteration the memory would not get freed
        unset($html2);
    }
}



function pullInfo($html)
{
    $tds = $html->find('td');
    $count =0;
    foreach ($tds as $td) {
        $count++;
        if ($count==1){
            $name = addslashes($td->innertext);
        }
        if ($count==2){
            $address = addslashes($td->innertext);
        }
        if ($count==3){
            $number = addslashes(preg_replace('/(\d+) - (\d+)/i', '$2$1', $td->innertext));
        }

    }

    # check for available data:
    if ($count) {
        # free $tds and $td
        unset($tds, $td);

        mysql_query("INSERT INTO people (name, area, phone)
            VALUES ('$name', '$address', '$number')");
    }

}
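One possible reason that unset() alone does not release the memory, as the question describes, is that parsed DOM objects can point at each other in cycles (parent to child and child back to parent). simple_html_dom provides a clear() method for breaking those links before unset(). The sketch below does not use the library: Node, buildTree() and the variable names are hypothetical stand-ins, used only to show the effect under that assumption (PHP 7.4 or later, for typed properties and WeakReference).

```php
<?php
// Hypothetical stand-in for a parsed DOM node; this is NOT simple_html_dom.
final class Node
{
    public ?Node $parent = null;
    /** @var Node[] */
    public array $children = [];

    public function add(Node $child): void
    {
        $child->parent = $this;     // child points back at its parent...
        $this->children[] = $child; // ...and the parent points at the child: a cycle
    }

    // Break the cycles so that plain reference counting can free the tree.
    public function clear(): void
    {
        foreach ($this->children as $child) {
            $child->clear();
        }
        $this->children = [];
        $this->parent = null;
    }
}

function buildTree(): Node
{
    $root = new Node();
    $root->add(new Node());
    $root->add(new Node());
    return $root;
}

gc_enable();

// Case 1: unset() only. The children still reference the root,
// so the root survives until the cycle collector runs.
$tree = buildTree();
$watch = WeakReference::create($tree);
unset($tree);
$aliveAfterUnset = $watch->get() !== null;
gc_collect_cycles();
$aliveAfterGc = $watch->get() !== null;

// Case 2: clear() first, then unset(). No cycle is left,
// so the root is destroyed as soon as the variable goes away.
$tree = buildTree();
$watch = WeakReference::create($tree);
$tree->clear();
unset($tree);
$aliveAfterClear = $watch->get() !== null;

var_dump($aliveAfterUnset, $aliveAfterGc, $aliveAfterClear);
```

If the same applies to the spider, calling $html->clear() immediately before each unset($html) in the loops above would be the equivalent step; whether that is enough depends on the library version and on the pages being parsed.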

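Update:

You can track your memory usage to see how much memory each part of the code uses. This can be done by calling memory_get_usage() and saving the result to a file. For example, put the code below at the end of each loop, or before creating objects or calling heavy methods: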

file_put_contents('memory.log', 'memory used in line ' . __LINE__ . ' is: ' . memory_get_usage() . PHP_EOL, FILE_APPEND);

This lets you trace the memory usage of each part of your code.
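As a minimal sketch of that idea, the one-liner can be wrapped in a small helper so that every checkpoint is labelled and the peak is recorded too. The function name logMemory, the labels and the log location in the system temp directory are assumptions made for illustration; only memory_get_usage(), memory_get_peak_usage() and file_put_contents() are PHP built-ins.

```php
<?php
// Append one labelled memory reading to a log file and return the byte count.
// logMemory is a hypothetical helper name, not a PHP built-in.
function logMemory(string $label, string $logFile): int
{
    $bytes = memory_get_usage();
    $line = sprintf('%s: %d bytes (peak %d)', $label, $bytes, memory_get_peak_usage());
    file_put_contents($logFile, $line . PHP_EOL, FILE_APPEND);
    return $bytes;
}

// Illustrative log location; any writable path works.
$logFile = sys_get_temp_dir() . '/memory.log';

$size   = 1024 * 1024;
$before = logMemory('before allocation', $logFile);
$buffer = str_repeat('x', $size);              // allocate roughly 1 MB
$after  = logMemory('after allocation', $logFile);
unset($buffer);                                // a plain string is freed immediately
$freed  = logMemory('after unset', $logFile);
```

Comparing consecutive readings in the log shows which step of a loop keeps memory that the next step does not give back.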

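Finally, keep in mind that all this tracing and optimization may not be enough, because your application may genuinely need more than 32 MB of memory. I developed a system that analyzes several data sources, detects spammers, and then blocks their SMTP connections. Because the number of connected users sometimes exceeded 30,000, even after a lot of code optimization I had to raise the PHP memory limit on the server to 768 MB, which is not a common thing to do.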
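To judge how close a script is to its limit, the configured memory_limit can be converted to bytes and compared with current usage. This is a rough sketch: iniSizeToBytes and memoryHeadroom are hypothetical helper names, and the parsing only covers the plain K, M and G shorthand suffixes. Note that 32M works out to the 33554432 bytes shown in the error message in the question.

```php
<?php
// Convert a php.ini shorthand size such as "32M" to bytes.
// Returns -1 for "-1", which PHP uses to mean "no limit".
// Hypothetical helper: handles only the K, M and G suffixes.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value;                 // leading digits, e.g. 32 for "32M"
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;                 // plain byte count
    }
}

// Bytes left before the limit is reached, or null when there is no limit.
function memoryHeadroom(): ?int
{
    $limit = iniSizeToBytes((string) ini_get('memory_limit'));
    if ($limit < 0) {
        return null;
    }
    return $limit - memory_get_usage(true);
}

$headroom = memoryHeadroom();
echo $headroom === null
    ? 'no memory limit configured' . PHP_EOL
    : 'headroom: ' . $headroom . ' bytes' . PHP_EOL;
```

If the headroom shrinks steadily across iterations, something is being retained; if it is simply too small from the start, raising the limit is the more realistic fix.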

Regarding "php - Memory leak in PHP with three for loops", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/8632861/