
Doctrine batch processing iteration: high memory usage

Reposted · Author: 行者123 · Updated: 2023-12-03 12:05:01

I've been looking into batch processing with iterators in Doctrine (http://docs.doctrine-project.org/en/2.0.x/reference/batch-processing.html). I have a database of 20,000 images that I'd like to iterate over.

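I understand that using an iterator is supposed to stop Doctrine from loading every row into memory. However, the memory usage of the two examples below is almost exactly the same. I'm measuring memory usage before and after with (memory_get_usage() / 1024).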

$query = $this->em->createQuery('SELECT i FROM Acme\Entities\Image i');
$iterable = $query->iterate();

while (($image = $iterable->next()) !== false) {
    // Do something here!
}

Memory usage with the iterator:
Memory usage before: 2823.36328125 KB
Memory usage after: 50965.3125 KB

The second example loads the entire result set into memory using the findAll method:
$images = $this->em->getRepository('Acme\Entities\Image')->findAll();
Memory usage with findAll:
Memory usage before: 2822.828125 KB
Memory usage after: 51329.03125 KB
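The before/after comparison in the question can be reproduced without a database. The PHP sketch below is a hedged illustration, not Doctrine code: `FakeImage`, `usageKb()` and `retainedKb()` are hypothetical names, and each fake object carries an assumed ~1 KB payload. It uses the same `memory_get_usage() / 1024` measurement and shows that memory stays high only when something keeps a reference to every object.

```php
<?php
declare(strict_types=1);

// Same measurement as in the question: bytes / 1024 = KB.
function usageKb(): float
{
    return memory_get_usage() / 1024;
}

// Runs $work and reports how many KB are still in use while its result is alive.
function retainedKb(callable $work): float
{
    $before = usageKb();
    $result = $work();   // keep whatever $work returns reachable
    $after = usageKb();
    unset($result);
    return $after - $before;
}

// Hypothetical stand-in for a hydrated row with an assumed ~1 KB payload.
final class FakeImage
{
    public function __construct(public int $id, public string $blob) {}
}

$n = 20000;

// Keeps every object reachable, like findAll() or an ever-growing identity map.
$kept = retainedKb(function () use ($n): array {
    $all = [];
    for ($i = 0; $i < $n; $i++) {
        $all[] = new FakeImage($i, str_repeat('x', 1024));
    }
    return $all;
});

// Processes one object at a time and keeps nothing afterwards.
$streamed = retainedKb(function () use ($n): int {
    $sum = 0;
    for ($i = 0; $i < $n; $i++) {
        $image = new FakeImage($i, str_repeat('x', 1024));
        $sum += strlen($image->blob);
    }
    return $sum;
});

printf("kept: %.0f KB, streamed: %.0f KB\n", $kept, $streamed);
```

Exact KB figures depend on the PHP version and allocator, so only the gap between the two numbers is meaningful.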

Best answer

Batch processing with Doctrine is trickier than it seems, even with the help of iterate() and IterableResult.

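As you'd expect, the biggest benefit of IterableResult is that it does not load all the elements into memory. Its second benefit is that it does not hold references to the entities it loads, so IterableResult itself does not prevent the GC from freeing your entities' memory.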
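A toy analogue of these two properties, using a plain PHP generator instead of Doctrine: rows are produced lazily, one per step, and the iterator keeps no references of its own, so each object can be freed once the loop stops using it. The `Image` class and `iterateImages()` function are hypothetical; `WeakReference` (PHP 7.4+) is used only to observe whether an object is still alive, and the sketch assumes PHP 8.0+ for constructor property promotion.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a hydrated entity.
final class Image
{
    public function __construct(public int $id) {}
}

// Toy analogue of an iterable result: a generator that "hydrates" one row
// per step and does not keep the objects it has already produced.
function iterateImages(int $rowCount, int &$hydrated): Generator
{
    for ($id = 0; $id < $rowCount; $id++) {
        $hydrated++;
        yield [new Image($id)]; // row array with the entity at index [0]
    }
}

$hydrated = 0;
$iterable = iterateImages(1000, $hydrated);
$hydratedBeforeLoop = $hydrated; // generator bodies do not run until iterated

$weakRefs = [];
foreach ($iterable as $row) {
    $weakRefs[] = WeakReference::create($row[0]);
}
unset($row, $iterable); // drop the last row and the finished generator

// Nothing else referenced the images, so PHP has freed all of them.
$alive = count(array_filter($weakRefs, fn (WeakReference $r): bool => $r->get() !== null));

echo $hydratedBeforeLoop, ' ', $hydrated, ' ', $alive, PHP_EOL; // 0 1000 0
```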

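However, there is another object, Doctrine's EntityManager (more specifically its UnitOfWork), that holds references to every object you have queried, explicitly or implicitly (through EAGER associations).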

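In short, whenever you get an entity back, whether from findAll(), findOneBy(), a DQL query or even an IterableResult, Doctrine keeps a reference to it. The references are simply stored in an associative array; in pseudocode: $identityMap['Acme\Entities\Image'][0] = $image0;
So on every iteration of the loop, your previous images (although no longer in the scope of the loop or of the IterableResult) are still present in this identityMap. The GC cannot clean them up, and your memory consumption ends up the same as when you call findAll().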
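To make the identity-map argument concrete, here is a minimal PHP sketch under stated assumptions: `ToyIdentityMap` is a hypothetical stand-in that only mirrors the pseudocode above and is not Doctrine's UnitOfWork. Each "hydrated" image is registered in the map; even after the loop variable lets go of it, the map keeps every object alive, which `WeakReference` confirms.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a hydrated entity (not a Doctrine class).
final class Image
{
    public function __construct(public int $id) {}
}

// Toy model of the pseudocode: $identityMap[className][id] = entity.
final class ToyIdentityMap
{
    /** @var array<string, array<int, object>> */
    public array $map = [];

    public function register(Image $entity): void
    {
        $this->map[get_class($entity)][$entity->id] = $entity;
    }
}

$identityMap = new ToyIdentityMap();
$weakRefs = [];

for ($id = 0; $id < 1000; $id++) {
    $image = new Image($id);        // "hydrate" one row
    $identityMap->register($image); // the manager keeps its own reference
    $weakRefs[] = WeakReference::create($image);
    // ... process $image ...
}
unset($image); // the loop variable no longer points at any image

// Every image is still reachable through the map, so none could be freed.
$alive = count(array_filter($weakRefs, fn (WeakReference $r): bool => $r->get() !== null));
echo $alive, PHP_EOL; // 1000
```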

Now let's walk through the code to see what actually happens:

$query = $this->em->createQuery('SELECT i FROM Acme\Entities\Image i');
// Here Doctrine only creates the Query object; there is no DB access yet.

$iterable = $query->iterate();
// Unlike findAll(), no entities are hydrated by this call;
// you simply get back an iterator (IterableResult) wrapping the query result.

while (($image_row = $iterable->next()) !== false) {
    // Each call to next() fetches the next row of the result
    // and hydrates it into an Image object.
    // ----> A REFERENCE TO THE OBJECT IS SAVED INSIDE $identityMap <----
    // The row is then returned to you by next() as an array.

    // To access the actual Image object, take element [0] of that array.
    $image = $image_row[0];

    // Do something here!
    write_image_data_to_file($image, 'myimage.data.bin');

    // On the next iteration $image (and $image_row) are overwritten,
    // so as far as this loop is concerned the old Image is ready for GC.
    // However, because a reference to that specific Image object is still
    // held by the EntityManager (inside $identityMap), GC will NOT free it.
}
// So by the end of the loop you will have consumed as much memory
// as you would have by using `findAll()`.

So the first solution is to tell Doctrine's EntityManager to detach the object from $identityMap. I also changed the while loop to a foreach to make it more readable.
foreach ($iterable as $image_row) {
    $image = $image_row[0];

    // do something with the image
    write_image_data_to_file($image);

    $entity_manager->detach($image);
    // This line tells Doctrine to remove its _reference_to_the_object_
    // from the identity map, so the object becomes eligible for GC.
}
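The same toy model, extended with a hypothetical `detach()`, illustrates why removing the map's reference is enough: once the manager forgets an object and the loop variable moves on, PHP frees it. `ToyEntityManager` and its methods are illustrative names under that assumption, not Doctrine's implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a hydrated entity.
final class Image
{
    public function __construct(public int $id) {}
}

// Toy entity manager with only the operations discussed here (NOT Doctrine's code).
final class ToyEntityManager
{
    /** @var array<string, array<int, object>> */
    private array $identityMap = [];

    public function hydrate(int $id): Image
    {
        $image = new Image($id);
        $this->identityMap[Image::class][$id] = $image; // manager keeps a reference
        return $image;
    }

    public function detach(Image $image): void
    {
        unset($this->identityMap[Image::class][$image->id]); // forget it
    }

    public function managedCount(): int
    {
        return count($this->identityMap[Image::class] ?? []);
    }
}

$em = new ToyEntityManager();
$weakRefs = [];

foreach (range(0, 999) as $id) {
    $image = $em->hydrate($id);
    $weakRefs[] = WeakReference::create($image);
    // ... write_image_data_to_file($image) would go here ...
    $em->detach($image); // only $image references the object now
}
unset($image);

$alive = count(array_filter($weakRefs, fn (WeakReference $r): bool => $r->get() !== null));
echo $em->managedCount(), ' ', $alive, PHP_EOL; // 0 0
```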

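However, the example above has a flaw, even though it is featured in Doctrine's documentation on batch processing. It works well as long as your Image entity does not EAGER-load any of its associations. But suppose you eagerly load an association, e.g.: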
use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 */
class Image {

    /**
     * @ORM\Column(type="integer")
     * @ORM\Id
     */
    private $id;

    /**
     * @ORM\Column(type="string")
     */
    private $imageName;

    /**
     * This association will be loaded automatically (EAGERly) by Doctrine
     * every time you query an Image entity from the DB, whether by findXXX(), DQL or iterate().
     *
     * @ORM\ManyToOne(targetEntity="Acme\Entity\User", fetch="EAGER")
     */
    private $owner;

    // getters/setters left out for clarity
}

then, if we use the same snippet as above:
foreach ($iterable as $image_row) {
    $image = $image_row[0];
    // Here, because of EAGER loading, the owner entity is already in memory
    // and can be accessed via $image->getOwner().

    // do something with the image
    write_image_data_to_file($image);

    $entity_manager->detach($image);
    // Here we detach the Image entity, but the `$owner` `User` entity is still
    // referenced in Doctrine's `$identityMap`, so we are still leaking memory.
}
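A sketch of this leak in the same toy style, assuming (hypothetically) that every image has its own owner and that `detach()` does not cascade to associations: detaching the `Image` frees it, but each eagerly loaded `User` stays registered and therefore alive. All class and method names are illustrative, not Doctrine's.

```php
<?php
declare(strict_types=1);

final class User
{
    public function __construct(public int $id) {}
}

final class Image
{
    // $owner is set at hydration time, mimicking fetch="EAGER".
    public function __construct(public int $id, public User $owner) {}
}

// Toy identity map keyed by class name and id; detach() removes only
// the object it is given (no cascade to associations).
final class ToyEntityManager
{
    /** @var array<string, array<int, object>> */
    private array $identityMap = [];

    public function hydrateImage(int $id, int $ownerId): Image
    {
        // The eagerly loaded owner is registered too.
        $owner = $this->identityMap[User::class][$ownerId] ??= new User($ownerId);
        $image = new Image($id, $owner);
        $this->identityMap[Image::class][$id] = $image;
        return $image;
    }

    public function detach(object $entity): void
    {
        unset($this->identityMap[get_class($entity)][$entity->id]);
    }

    public function managedCount(string $class): int
    {
        return count($this->identityMap[$class] ?? []);
    }
}

$em = new ToyEntityManager();
$imageRefs = [];
$userRefs = [];

foreach (range(0, 999) as $id) {
    $image = $em->hydrateImage($id, $id); // assumption: one distinct owner per image
    $imageRefs[] = WeakReference::create($image);
    $userRefs[] = WeakReference::create($image->owner);
    $em->detach($image);                  // detaches the Image only
}
unset($image);

$countAlive = fn (array $refs): int => count(array_filter($refs, fn (WeakReference $r): bool => $r->get() !== null));
echo $countAlive($imageRefs), ' ', $countAlive($userRefs), PHP_EOL; // 0 1000
```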

A possible solution is to use EntityManager::clear() instead of EntityManager::detach(); clear() empties the identity map completely.
foreach ($iterable as $image_row) {
    $image = $image_row[0];
    // Here, because of EAGER loading, the owner entity is already in memory
    // and can be accessed via $image->getOwner().

    // do something with the image
    write_image_data_to_file($image);

    $entity_manager->clear();
    // Now `$identityMap` is cleared of ALL the entities it holds:
    // the `Image` and the `User` loaded in this loop iteration and, as a
    // SIDE EFFECT, all OTHER entities you may have loaded earlier.
    // Thus, once you start this loop, you must NOT rely on any entities
    // you have `persist()`ed or `remove()`ed: all changes since the
    // last `flush()` will be lost.
}
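Doctrine's batch-processing documentation applies `clear()` every N iterations rather than on each one (calling `flush()` first when there are pending writes). The toy sketch below models that variant; `ToyEntityManager`, `hydrateImage()` and `$batchSize = 20` are hypothetical, and a batch size of 1 reduces to the per-iteration `clear()` in the snippet above. It records the largest number of entities the map holds at once, which stays bounded by the batch size instead of growing with the result set.

```php
<?php
declare(strict_types=1);

final class User { public function __construct(public int $id) {} }
final class Image { public function __construct(public int $id, public User $owner) {} }

// Toy entity manager (NOT Doctrine's code): images plus their eagerly loaded owners.
final class ToyEntityManager
{
    /** @var array<string, array<int, object>> */
    private array $identityMap = [];

    public function hydrateImage(int $id, int $ownerId): Image
    {
        $owner = $this->identityMap[User::class][$ownerId] ??= new User($ownerId);
        return $this->identityMap[Image::class][$id] = new Image($id, $owner);
    }

    // Like clear() as described above: forget EVERY managed entity at once.
    public function clear(): void
    {
        $this->identityMap = [];
    }

    public function managedCount(): int
    {
        return array_sum(array_map('count', $this->identityMap));
    }
}

$em = new ToyEntityManager();
$batchSize = 20; // hypothetical value; 1 means "clear on every iteration"
$peakManaged = 0;

foreach (range(0, 999) as $i) {
    $image = $em->hydrateImage($i, $i); // one Image + one distinct User per row
    // ... process $image ...
    $peakManaged = max($peakManaged, $em->managedCount());
    if (($i + 1) % $batchSize === 0) {
        $em->clear(); // drops the Images AND their eagerly loaded Users
    }
}
unset($image);

echo $peakManaged, ' ', $em->managedCount(), PHP_EOL; // 40 0
```

A larger batch size trades more memory per batch for fewer `clear()` calls; the side effects listed in the comments above (losing unflushed changes and every other managed entity) apply at each clear.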

So, hopefully this helps in understanding how Doctrine iteration works.

For the original question about high memory usage in Doctrine batch-processing iteration, see Stack Overflow: https://stackoverflow.com/questions/23545768/
