
PHP OpenDir vs. array_rand

Reposted. Author: 搜寻专家. Updated: 2023-10-31 20:59:34

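I have a directory of images that can hold anywhere from 100 to several thousand images. I need to pull 81 random images from that directory as a sample to work with (in an array).

I'm currently using the following to grab an image: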

$locations = 'compressed/';
$images = glob($locations . '*', GLOB_BRACE);
$selected = $images[array_rand($images)];

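The problem with this approach is that the same image can come up twice (although that is rare with a large sample).

I have also seen that you can use opendir and then shuffle the array. Can someone tell me which is more efficient? I would assume that shuffling and then taking the first 81 elements is better, but that it gets slower for larger counts (since shuffling a large array takes longer).

Any advice on the time complexity of my current setup compared with using opendir (or some other method I may not know about)?

Thanks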

Best answer

This is a very good question, and I wish more like it were asked.


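$start = microtime(true);

// Lazily yield the name of every file below $path, including subdirectories.
function recursiveDirectoryIterator($path) {
    foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path)) as $file) {
        if (!$file->isDir()) {
            // getFilename() already includes the extension; appending
            // getExtension() as well would double it ("PreDec.php" -> "PreDec.phpphp").
            yield $file->getFilename();
        }
    }
}

// Collect the generator's output so files can be addressed by numeric index.
$instance = recursiveDirectoryIterator('../vendor');
$files = [];
foreach ($instance as $value) {
    $files[] = $value;
}

$total_files = count($files);
$random_array = [];
$total_randoms = 81;
for (;;) {
    if (count($random_array) == $total_randoms) {
        break;
    }
    // random_int() includes both bounds, so the highest valid index is count - 1.
    $rand = random_int(0, $total_files - 1);
    // Using the index as the array key rejects duplicate picks.
    if (!isset($random_array[$rand])) {
        $random_array[$rand] = $files[$rand];
    }
}

echo "Mem peak usage: " . (memory_get_peak_usage(true)/1024/1024) . " MiB" . '<br>';
echo "Total number of files: " . $total_files . '<br>';
echo "Completed in: ", microtime(true) - $start, " seconds" . '<br>';
echo '<pre>';
print_r($random_array);
die;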

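Output (from a run in which the extension was appended to each file name a second time, hence entries such as PreDec.phpphp)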

Mem peak usage: 2 MiB
Total number of files: 12972
Completed in: 0.74663186073303 seconds
Array
(
[6118] => PreDec.phpphp
[4560] => LabelMaker.phpphp
[10360] => RecursiveDirectoryIterator.phpphp
[4124] => Enum.phpphp
[2671] => ImportCommand.phpphp
[1250] => WebDriverTest.phpphp
[10518] => AutoExpireFlashBagTest.phpphp
[6805] => zsdtPackTask.phpphp
[4288] => HTML.Trusted.txttxt
[6462] => border-disable.phptphpt
[4980] => main.ymlyml
[505] => StepTested.phpphp
[5219] => xhprof.ini.j2j2
[12959] => RequestInterface.phpphp
[1423] => xd5.phpphp
[4285] => HTML.TidyAdd.txttxt
[4930] => .travis.ymlyml
[12013] => Defined.phpphp
[8779] => Markdown.phpphp
[5979] => pt.phpphp
[278] => AbstractAdapter.phpphp
[2155] => SemVerTest.phpphp
[523] => ServicesResolverFactory.phpphp
[11686] => AbstractDumper.phpphp
[7320] => Functions.phpphp
[7763] => mocked_clone.tpl.distdist
[11541] => test_landscape.gifgif
[3557] => RegionSelectorSpec.phpphp
[2600] => RoutingAccessSniff.phpphp
[9496] => LoaderTest.phpphp
[4958] => setup-RedHat.ymlyml
[3477] => api.featurefeature
[7975] => WtfCommand.phpphp
[9001] => ElseIfDeclarationSniff.phpphp
[11696] => VarDumperTestTrait.phpphp
[11211] => empty.ymlyml
[10925] => ObjectRouteLoader.phpphp
[10936] => MatcherDumperInterface.phpphp
[2685] => ConnectCommand.phpphp
[9066] => EmptyStyleDefinitionSniff.phpphp
[3536] => BehatTestExtensionInstallStorage.phpphp
[4720] => ansible-args.mdmd
[326] => ZipOutputParser.phpphp
[9565] => BufferedOutput.phpphp
[712] => CliExtension.phpphp
[3436] => .travis.ymlyml
[4471] => HTMLPurifier.kses.phpphp
[2764] => RouteSubscriberCommand.phpphp
[10633] => RoutableFragmentRenderer.phpphp
[6906] => Reference.phpphp
[11663] => DoctrineCaster.phpphp
[8042] => GitHubChecker.phpphp
[1466] => ImageDriverInterface.phpphp
[2652] => DrupalCommand.phpphp
[7265] => classUsesNamespacedFunction.phpphp
[12129] => ExtensionInterface.phpphp
[12184] => ConditionalExpression.phpphp
[12128] => EscaperExtension.phpphp
[6678] => JsHintTask.phpphp
[5351] => main.ymlyml
[2104] => _bootstrap.phpphp
[143] => deploy_branch
[1360] => x8f.phpphp
[4713] => composer-dependency.mdmd
[7495] => ExceptionInAssertPostConditionsTest.phpphp
[4508] => info.txttxt
[8369] => 6.1.3-curl-adapter.phpphp
[3093] => create-data.ymlyml
[1882] => .gitkeepgitkeep
[3747] => example.makemake
[507] => EventDispatchingBackgroundTester.phpphp
[3336] => shell.ymlyml
[397] => AnnotationReader.phpphp
[4005] => xhUnitTest.phpphp
[5168] => test.ymlyml
[10909] => MissingMandatoryParametersException.phpphp
[8686] => FacetSetTest.phpphp
[2321] => FileCache.phpphp
[10538] => StreamedResponseTest.phpphp
[12572] => in.testtest
[7031] => StringContainsToken.phpphp
)

Code breakdown.

I used RecursiveDirectoryIterator together with a generator to keep memory usage down.
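If the goal is to avoid holding the full file list in memory at all, one option is reservoir sampling, which is a different technique from the one used in this answer. The sketch below is only an illustration: reservoirSample and fakeFiles are hypothetical names, and fakeFiles stands in for a generator that walks a real directory.

```php
<?php
// Hypothetical helper: keep a uniform random sample of $k items from any
// iterable (such as a generator) without storing the whole sequence.
function reservoirSample(iterable $items, int $k): array
{
    $reservoir = [];
    $seen = 0; // number of items consumed so far
    foreach ($items as $item) {
        if ($seen < $k) {
            // Fill the reservoir with the first $k items.
            $reservoir[] = $item;
        } else {
            // Replace a random slot with probability $k / ($seen + 1).
            $j = random_int(0, $seen);
            if ($j < $k) {
                $reservoir[$j] = $item;
            }
        }
        $seen++;
    }
    return $reservoir;
}

// Stand-in generator (assumption); a directory-walking generator could be passed instead.
function fakeFiles(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        yield "image_$i.jpg";
    }
}

$sample = reservoirSample(fakeFiles(5000), 81);
echo count($sample), " files sampled\n";
```

Memory use stays proportional to the sample size rather than to the directory size, at the cost of one random_int() call per file once the reservoir is full. If fewer than $k items exist, every item is returned.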

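Next, instead of shuffling a huge array, I chose a different approach: generate 81 random, non-repeating numbers in the range from 0 to the highest index of the file array. Once you have the random numbers, you can pull the matching entries with array_intersect_key, which is quite fast (the loop above stores each file under its index as it goes, which has the same effect).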
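For comparison, PHP can also produce the distinct keys itself: array_rand() accepts a count as its second argument and returns that many different keys, which pairs naturally with array_intersect_key. This is a minimal sketch rather than the code used for the timings above; pickDistinct and the generated file names are hypothetical, and it assumes the requested count is between 1 and the number of files.

```php
<?php
// Hypothetical file list standing in for the scanned directory.
$files = [];
for ($i = 0; $i < 1000; $i++) {
    $files[] = "image_$i.jpg";
}

// Hypothetical helper: pick $k distinct entries, keeping their original keys.
// Assumes 1 <= $k <= count($files).
function pickDistinct(array $files, int $k): array
{
    // With a count of 2 or more, array_rand() returns an array of distinct keys;
    // the cast covers $k == 1, where it returns a single key instead.
    $keys = (array) array_rand($files, $k);
    // array_flip() turns the chosen keys into array keys for array_intersect_key().
    return array_intersect_key($files, array_flip($keys));
}

$sample = pickDistinct($files, 81);
echo count($sample), " distinct files\n";
```

The keys come back in the same order as in the source array, so call shuffle() on the result if the order of the sample matters. array_rand() is not intended for cryptographic use, which should not matter for picking sample images.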

Note one logic trap that I have not accounted for:

  • If the total number of files is less than 81, the for loop will run forever.
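One way to close that trap is to cap the target at the number of files available. The following is a sketch under that assumption, not part of the original answer; pickRandomFiles is a hypothetical name, and it expects $files to be a plain list indexed from 0.

```php
<?php
// Hypothetical helper: the index-picking loop with the target capped.
// Assumes $files is a list with consecutive integer keys starting at 0.
function pickRandomFiles(array $files, int $wanted): array
{
    $total = count($files);
    // Cap the target so the loop can always finish, even for tiny directories.
    $target = min($wanted, $total);
    $picked = [];
    while (count($picked) < $target) {
        $rand = random_int(0, $total - 1);
        // Using the index as the key rejects duplicate picks.
        if (!isset($picked[$rand])) {
            $picked[$rand] = $files[$rand];
        }
    }
    return $picked;
}

echo count(pickRandomFiles(['a.jpg', 'b.jpg', 'c.jpg'], 81)), "\n"; // prints 3
```

With an empty directory the target is 0, so the loop body never runs and random_int() is never called with an invalid range.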


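Final note: I am absolutely sure that someone smarter than me can come up with something better, but this works for now.

Also, since I am on PHP 7.x, I have the benefit of opcache, so performance is better for me; your results may vary.

Note that when the number of files is small (close to 81), the for loop runs longer, because collisions become more likely as a larger share of the indices has already been picked.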
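To put a rough number on that, the expected number of draws can be computed directly: once i indices are taken, a draw lands on a new one with probability (n - i) / n, so it takes n / (n - i) draws on average. The sketch below sums those terms; expectedDraws is a hypothetical helper and assumes the sample size does not exceed the number of files.

```php
<?php
// Expected number of random_int() draws needed to collect $k distinct
// indices out of $n when duplicates are rejected and redrawn.
// Assumes 0 <= $k <= $n (otherwise a division by zero would occur).
function expectedDraws(int $n, int $k): float
{
    $sum = 0.0;
    for ($i = 0; $i < $k; $i++) {
        // With $i indices already taken, a draw is new with probability ($n - $i) / $n.
        $sum += $n / ($n - $i);
    }
    return $sum;
}

printf("%.1f\n", expectedDraws(12972, 81)); // the directory size from the run above
printf("%.1f\n", expectedDraws(100, 81));   // a small directory
```

For the 12,972 files above this comes to barely more than 81 draws, while for a 100-file directory it is roughly twice as many, which is still negligible in absolute terms.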

Regarding PHP OpenDir vs. array_rand, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46649055/
