A Quick Performance Optimization Guide Using PHP Generators

Reduce execution time and memory usage with generators

Lucas Pereyra
Better Programming

--

From the official PHP website:

Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface.

A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.

Reducing memory usage with generators

Let’s take the following example:

This simple script uses a foreach loop to calculate the sum of all numbers from 0 to THRESHOLD. I’ve also included a function that prints memory usage; it is called once during the traversal and once more at the end, after the loop has finished.
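A minimal sketch of what such a script could look like. The names THRESHOLD and myIntegers come from the text; the function bodies, the printMemoryUsage helper, and the THRESHOLD value of 100000 are assumptions made for illustration.

```php
<?php
// THRESHOLD and myIntegers() are named in the article; everything else
// here (bodies, printMemoryUsage(), the value 100000) is an assumption.
const THRESHOLD = 100000;

function printMemoryUsage(string $label): void
{
    // memory_get_usage() returns the bytes currently allocated to the script.
    printf("%s: %.2f MB\n", $label, memory_get_usage() / 1024 / 1024);
}

function myIntegers(): array
{
    // range() builds the whole array in memory before returning it.
    return range(0, THRESHOLD);
}

$sum = 0;
foreach (myIntegers() as $i) {
    if ($i === intdiv(THRESHOLD, 2)) {
        // Report once, halfway through the traversal.
        printMemoryUsage('During traversal');
    }
    $sum += $i;
}

printMemoryUsage('After traversal');
echo "Sum: $sum\n";
```

The exact memory figures depend on the PHP version and platform, so none are shown here.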

For this first version, these are the values that I get when I try different THRESHOLD values:

When the integers array is generated, it is kept in memory until the foreach loop has finished. This explains why memory usage during the traversal grows as we increase the THRESHOLD value (and with it the array size). Yet this has no visible impact on memory usage at the end of the script, because by then the array has already been released.

Let’s make a change in the myIntegers function so as to make use of PHP generators:

By taking advantage of PHP generators, this is what happens:

When using generators, PHP only keeps track of the current state of the traversal (the value we’re returning with yield), without needing to store the whole collection that’s being traversed. Hence, memory usage during the loop execution is much lower than with the first approach.
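A sketch of how the generator-based myIntegers might look, under the same assumptions as before (the THRESHOLD value of 100000 and the printMemoryUsage helper are illustrative, not taken from the original listing). Only the function body changes; the calling loop stays the same.

```php
<?php
// Assumed value and helper; only myIntegers() and THRESHOLD are named in the text.
const THRESHOLD = 100000;

function printMemoryUsage(string $label): void
{
    printf("%s: %.2f MB\n", $label, memory_get_usage() / 1024 / 1024);
}

function myIntegers(): Generator
{
    // Each value is produced on demand; no array is ever built.
    for ($i = 0; $i <= THRESHOLD; $i++) {
        yield $i;
    }
}

$sum = 0;
foreach (myIntegers() as $i) {
    if ($i === intdiv(THRESHOLD, 2)) {
        printMemoryUsage('During traversal');
    }
    $sum += $i;
}

printMemoryUsage('After traversal');
echo "Sum: $sum\n";
```

Because a function containing yield returns a Generator object, the foreach loop needs no changes at all.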

Reducing execution time with generators

For this second example, let’s take a look at the following snippet:

This simple script traverses an array of items, each of which takes 2 seconds to fetch (simulating a scenario where we fetch items from a remote resource, e.g. an external file or an API). I’ve included two timers: one measures how long we wait until the first item is available inside the loop, and the other measures the total execution time.

With this first approach, PHP collects all the items before entering the foreach loop, so with three items at 2 secs. each we have to wait 6 secs. until the 1st item is ready to be used inside the loop. This is the same time that it takes to fetch all the items: we can’t start working with the 1st item until all of them have been fetched. Of course, the total execution time is also 6 secs.
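The eager version described above could be sketched as follows. The fetchItem and getItems names, the item values other than "cat", and the use of sleep() to simulate the slow fetch are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for a slow remote call (file, API, ...).
function fetchItem(string $name, int $delaySeconds = 2): string
{
    sleep($delaySeconds);
    return $name;
}

// Eager version: every item is fetched before the array is returned.
function getItems(int $delaySeconds = 2): array
{
    $items = [];
    foreach (['dog', 'cat', 'bird'] as $name) {
        $items[] = fetchItem($name, $delaySeconds);
    }
    return $items;
}

$start = microtime(true);
$firstItemSeen = false;

foreach (getItems() as $item) {
    if (!$firstItemSeen) {
        $firstItemSeen = true;
        printf("Took %d seconds to get the 1st item\n", round(microtime(true) - $start));
    }
}

printf("Execution finished in %d seconds\n", round(microtime(true) - $start));
```

The delay is a parameter only so that the function can be exercised quickly, for example with getItems(0).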

By using a simple generator to provide each item to the foreach loop, we could have the 1st item ready for us in just 2 secs:
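A sketch of the generator-based variant, with the same hypothetical names as before (fetchItem, getItems, and the item values other than "cat" are assumptions). The only change is that each fetched item is yielded immediately instead of being appended to an array.

```php
<?php
// Hypothetical stand-in for a slow remote call (file, API, ...).
function fetchItem(string $name, int $delaySeconds = 2): string
{
    sleep($delaySeconds);
    return $name;
}

// Lazy version: each item is fetched only when the loop asks for it.
function getItems(int $delaySeconds = 2): Generator
{
    foreach (['dog', 'cat', 'bird'] as $name) {
        yield fetchItem($name, $delaySeconds);
    }
}

$start = microtime(true);
$firstItemSeen = false;

foreach (getItems() as $item) {
    if (!$firstItemSeen) {
        $firstItemSeen = true;
        printf("Took %d seconds to get the 1st item\n", round(microtime(true) - $start));
    }
}

printf("Execution finished in %d seconds\n", round(microtime(true) - $start));
```

The total time is unchanged because all three items are still fetched; what changes is how soon the loop body can start working.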

What is interesting about this example is that it allows you to start working with the Nth item of the collection you’re traversing without having to worry about the (N+1)th item. Of course, some use cases require working with the (N+1)th or (N-1)th item along with the Nth, and those scenarios may need further analysis.

>php test.php
Took 6 seconds to get the 1st item
Execution finished in 6 seconds
>
>php test_2nd_approach.php
Took 2 seconds to get the 1st item
Execution finished in 6 seconds

Now, let’s suppose we can stop iterating under a certain condition that depends upon the item, just like this:

By executing this code with both versions (fetching all the items before the loop starts vs. fetching one item at a time on demand) we get:

>php test.php
Took 6 seconds to get the 1st item
Execution finished in 6 seconds
>
>php test_2nd_approach.php
Took 2 seconds to get the 1st item
Execution finished in 4 seconds

Again, the first approach has to wait for every item to be fetched before the loop starts, whereas the second one fetches items on demand. Since the “cat” item was 2nd in the items collection, once the condition is met and the foreach loop breaks, there’s no need to fetch the 3rd item, which saves the script 2 seconds of execution time.
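The early-exit behavior can be sketched without any waiting by counting fetches instead of sleeping. This is an illustration under stated assumptions, not the original listing: fetchItem, getItemsEagerly, getItemsLazily, and the items other than "cat" are hypothetical, and an ArrayObject log replaces the 2-second delay.

```php
<?php
// Hypothetical fetch that records each call instead of sleeping.
function fetchItem(string $name, ArrayObject $log): string
{
    $log->append($name);
    return $name;
}

// Eager: fetches everything up front.
function getItemsEagerly(ArrayObject $log): array
{
    $items = [];
    foreach (['dog', 'cat', 'bird'] as $name) {
        $items[] = fetchItem($name, $log);
    }
    return $items;
}

// Lazy: fetches one item each time the loop advances.
function getItemsLazily(ArrayObject $log): Generator
{
    foreach (['dog', 'cat', 'bird'] as $name) {
        yield fetchItem($name, $log);
    }
}

$eagerLog = new ArrayObject();
foreach (getItemsEagerly($eagerLog) as $item) {
    if ($item === 'cat') {
        break; // Too late to save anything: all items were already fetched.
    }
}

$lazyLog = new ArrayObject();
foreach (getItemsLazily($lazyLog) as $item) {
    if ($item === 'cat') {
        break; // The generator is never resumed, so the 3rd item is never fetched.
    }
}

printf("Eager fetches: %d, lazy fetches: %d\n", count($eagerLog), count($lazyLog));
// prints "Eager fetches: 3, lazy fetches: 2"
```

With a 2-second fetch, that one skipped call is the difference between 6 and 4 seconds of total execution time.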

Conclusions

PHP generators are a powerful feature that has proved really useful for performance optimization. Not only do they allow us to decrease memory usage, but they can also help us deal with slow algorithms and execution time-related issues.

Though not covered in this post, you should be able to get the same benefits from iterators, including your own classes that implement the Iterator interface. Iterators are a more object-oriented alternative, and they often lead to a more complex solution because all of the Iterator interface’s methods have to be implemented.
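To give an idea of the extra work involved, here is a minimal sketch of an Iterator that produces the integers from 0 up to a maximum. The IntegerRange class name is hypothetical; the five methods are the ones required by PHP’s Iterator interface.

```php
<?php
// Hypothetical class: yields the integers 0..$max, one at a time.
final class IntegerRange implements Iterator
{
    private int $max;
    private int $position = 0;

    public function __construct(int $max)
    {
        $this->max = $max;
    }

    // Value for the current step of the traversal.
    public function current(): int
    {
        return $this->position;
    }

    // Key for the current step; here it matches the value.
    public function key(): int
    {
        return $this->position;
    }

    // Move to the next step.
    public function next(): void
    {
        $this->position++;
    }

    // Called once when a foreach loop starts.
    public function rewind(): void
    {
        $this->position = 0;
    }

    // Tells foreach whether there is still a value to read.
    public function valid(): bool
    {
        return $this->position <= $this->max;
    }
}

$sum = 0;
foreach (new IntegerRange(10) as $i) {
    $sum += $i;
}
echo "Sum: $sum\n"; // prints "Sum: 55"
```

A generator expresses the same traversal with a single for loop and a yield, which is where the difference in complexity comes from.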

There are some special scenarios where applying generators can be tricky and lead to a more complex implementation. Examples include having to access the (N+1)th or (N-1)th item along with the Nth item when traversing the collection, or having to deal with nested loops that iterate over the same collection. I suggest starting with the simplest approach that works when implementing a solution.

Then, once you’ve taken some measurements and you’re sure that there are performance issues, try applying generators and/or other alternatives that may work, in the form of refactors. Don’t reach for generators as the 1st approach, since things may become complex and the code itself could start hiding its real intentions.
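As an illustration of the extra machinery the (N-1)th case mentioned above can require, here is one possible sketch: a wrapper generator that remembers the previous item and yields it together with the current one. The withPrevious and letters names are hypothetical.

```php
<?php
// Hypothetical wrapper: pairs every item with the one that came before it.
function withPrevious(iterable $items): Generator
{
    $previous = null; // There is no previous item on the first step.
    foreach ($items as $current) {
        yield [$previous, $current];
        $previous = $current;
    }
}

// Hypothetical source generator used only for the demonstration.
function letters(): Generator
{
    yield 'a';
    yield 'b';
    yield 'c';
}

foreach (withPrevious(letters()) as [$previous, $current]) {
    printf("previous=%s current=%s\n", $previous ?? 'none', $current);
}
```

Looking ahead to the (N+1)th item is harder, because it forces the next fetch to happen early, which gives up part of the on-demand benefit.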

--

Systems Engineer, full-time Backend Developer, part-time learner. 4 years’ experience working in the web development industry.