Building an iterator that can grow
A few weeks ago I ran into a tricky to solve issue in CakePHP. It involved an iterator that needs be grown during iteration, and nested loops over that same iterator. While infrequent, there are scenarios where you would want to grow an iterator as it is being iterated. My situation is the plugin registry for CakePHP. Plugins support a bootstrap
hook method that is used to initialize a plugin. During that hook a plugin may load additional plugins if it depends on other plugins. These child plugins also need their bootstrap
hook fired. Instead of trying to solve this problem with multiple loops, we use an iterator that grows in place. The problematic loop code looks like:
- // $plugins is the iterator containing all the plugins in an application.
- foreach ($plugins as $plugin) {
- $plugin->bootstrap();
- }
While this code is simple, the problem arose from a plugin that iterated plugins looking for sub-classes of its interfaces. This effectively created something like:
- $plugins = new PluginIterator([new Plugin('one'), new Plugin('discover'), new Plugin('three')]);
- foreach ($plugins as $plugin) {
- // Assume this condition is hit mid-way through
- // the loop
- if ($plugin->getName() == 'discover') {
- foreach ($plugins as $inner) {
- $plugin->discoverTasks($inner);
- }
- }
- echo "Found {$thing->getName()}\n";
- }
If discover
was in the third item out of five, the echo
in the outer loop would never be reached for the fourth and fifth items. In order to understand why we should review the various ways you can make custom iterators in PHP and the properties each approach imparts.
Creating a PHP Iterator
The simplest way to make an object iterable in PHP is to implement IteratorAggregate interface. This interface allows you to delegate iteration or use a generator:
- class PluginIterator implements IteratorAggregate
- {
- public function __construct($items)
- {
- $this->items = $items;
- }
- public function getIterator()
- {
- return new ArrayIterator($this->items);
- }
- public function add($item)
- {
- $this->items[] = $item;
- }
- }
Using this approach will allow your iterator to be used in nested loops safely. However, each loop will take a state snapshot when iteration is started and items appended to the iterator using add()
will require additional loops as we can’t grow the copy being iterated. This snapshot behavior is also shared by arrays in PHP, which is one of the reasons we need to use an iterator in the first place.
Another approach is to implement the Iterator interface directly. This gives us more control over how our object is iterated. A simple Iterator
implementation could look like:
- class PluginIterator implements Iterator {
- private $position = 0;
- private $items;
- $this->items = $items;
- $this->position = 0;
- }
- $this->position = 0;
- }
- return $this->items[$this->position];
- }
- return $this->position;
- }
- $this->position++;
- }
- public function valid() {
- }
- public function add($item)
- {
- $this->items[] = $item;
- }
- }
This implementation allows us to grow the iterator as it is being iterated. However, it will not work in a nested loop scenario. When PHP starts iteration it calls rewind()
instructing the iterator to go back to the beginning. If we revisit the example from before:
- $plugins = new PluginIterator([new Plugin('one'), new Plugin('discover'), new Plugin('three')]);
- // rewind() is called here
- foreach ($plugins as $plugin) {
- $plugin->bootstrap();
- if ($plugin->getName() == 'discover') {
- // rewind() is called here too!
- foreach ($plugins as $inner) {
- $plugin->discoverTasks($inner);
- }
- // The iterator's position is now at the end of its data,
- // and the outer loop's valid() call will return false
- // stopping iteration.
- }
- echo "Found {$thing->getName()}\n";
- }
Our code will never call bootstrap()
on item three, and never output Found three
which isn’t what we want.
My solution
The problem boils down to only having a single integer to track the iterator’s current position. Given that we want to support nested loops, we either need to freeze state, or keep track of multiple positions. While freezing state would be simpler, it would also break our ability to grow the iterator in place. That leaves tracking multiple positions. I ended up creating a stack of position indices. The basic implementation looks like:
- class PluginIterator implements Iterator {
- private $positions = [];
- private $loopDepth = -1;
- private $items;
- $this->items = $items;
- }
- $this->positions[] = 0;
- $this->loopDepth += 1;
- }
- $position = $this->positions[$this->loopDepth];
- return $this->items[$position];
- }
- return $this->positions[$this->loopDepth];
- }
- $this->positions[$this->loopDepth]++;
- }
- public function valid() {
- $position = $this->positions[$this->loopDepth];
- if (!$valid) {
- $this->loopDepth -= 1;
- }
- return $valid;
- }
- public function add($item)
- {
- $this->items[] = $item;
- }
- }
While this solves the nested loop problem, it isn’t perfect. It fails if the inner loop uses break
. It will also accumulate position values each time iteration is stopped prematurely with a return
. Aside from those two caveats it does enable us to perform nested loops on an iterator that can be modified in-place. Hopefully this can help you in the future should you need to build an iterator with similar constraints.
While I agree that iterations levels can be useful, I think in that case you have kinda rewrite this part:
<pre>
foreach ($plugins as $plugin) { $plugin->bootstrap(); if ($plugin->getName() == ‘discover’) { foreach ($plugins as $inner) { $plugin->discoverTasks($inner); } }
}
</pre>
To this:
<pre>
$discover = null;
foreach ($plugins as $plugin) { $plugin->bootstrap(); if ($plugin->getName() == ‘discover’) { $discover = $plugin; }
}
foreach ($plugins as $plugin) { $discover->discoverTasks($plugin);
}
</pre>
And in this case you will not discover tasks for un-inited plugins.
May be it is not an issue in this case, but it simplier.
ooooo on 5/26/19
ooooo: If the first loop adds plugins, which can also add their own dependencies. Won’t you be missing calling the @bootstrap@ method on the 2nd level of plugins? Wouldn’t you have to add yet another loop after the first bootstrap call to get the newly added items? One of my goals was to generalize the solution and avoid having to traverse the set twice, and support arbitrary depth.
mark story on 7/28/19