Blog: How Darwin determines which letterboxes your folders will land in
How can algorithms be used for optimisation of folder distributions?
Published on December 29, 2016
Lots of businesses distribute folders to draw clients to their stores. During this process, they often encounter the same problem: how does one determine which doormats the folders should land on, to get the maximum profitability? In most cases, postal code areas are selected by hand, based on the socio-demographic characteristics or sales figures for clients per area. This is not so easy when done by hand, because there are roughly 440,000 postal code areas in the Netherlands. Even so, the way in which postal code areas are selected, can be much smarter, by using Evolutionary Algorithms. This is a Machine Learning technique which is ideal for optimisation problems, and often finds solutions that are far superior to those found via the natural process. Therefore, the same number of folders can be used to achieve much better sales figures. Moreover, this technique is quite simple to implement.
Selection of the postal code areas
I will explain how Evolutionary Algorithms can best be used for optimisation of folder distributions, based on a specific example. Suppose: a retail chain (with dozens of stores) is distributed across the Netherlands. This retail chain also has a client card programme, providing insight into where the client comes from and what the client’s buying behaviour is, for those clients with a card. Each store in the chain is allocated several folders – based on its sales – to distribute in its catchment area. A catchment area includes all postal code areas in which clients live who purchased from the specific store. The aim is to choose the postal code areas in the catchment area, per store, in such a way that the maximum number of client cardholders are reached with the number of folders allocated. The number of client cardholders per postal code area is known at the retail chain, as well as the total number of households. In this example, we use the client cardholders as a criterion, but various other selection criteria are possible, like: approaching as many areas with owner-occupier homes as possible, areas where lots of children live, or areas responsible for the highest sales numbers.
This is an optimisation problem, and Evolutionary Algorithms are ideal in this case. This technique is based on the concepts of natural evolution and applies methods such as reproduction, mutation and selection. How can you make the step from evolution to the folder distribution problem? By generating a random ‘population’ of solutions (‘individuals’). The algorithm is used each time to solve the distribution problem per store. If the retail chain has 50 stores, the algorithm must be implemented 50 times. This can obviously be automated, rather than having to manually calculate the distribution for folders across the postal code areas for 50 stores with 50 different folder publications and 50 different catchment areas. For the first population, the decision to distribute folders is determined randomly, per store, for each postal code in the catchment area. This way, not just one, but a whole bunch of random solutions are generated.
This could – greatly simplified – appear as follows. Store A has a catchment area that includes four postal codes. A random solution could be: postal code 1 and 3 receive folders, whilst postal code 2 and 4 do not. You can represent this as 1010. This is the first individual’s ‘genotype’. An entire population can be constructed like this:
Which of these solutions is the right one? To determine this, a fitness function is required (Evolution… Darwin… Survival of the Fittest!). This could be: all client cardholders reached (those living in the selected postal code areas) divided by the total number of client cardholders in the entire catchment area for store A. If the maximum number of folders is exceeded, the fitness level will be zero for an individual.
Fitness = Cardholders selected / Store’s cardholders
Since the population has been initialised and the fitness level has been determined for all individuals, the next generation must be assembled. The individuals from the first population must have ‘children’ that are defined in the next generation. We are now in the selection phase. This can be achieved in different ways. A customary method is by choosing four individuals, in which case the two with the highest fitness levels mix their genes to determine their children’s genotypes. This is called cross-over of the genetic codes. Below is an example of 2-point cross-over. The DNA of the parent strings is cut at 2 points and the children’s DNA is determined by using the DNA fragments of both parents:
The above picture shows, according to the solution for the top child, that all four postal code areas are supplied with folders. The solution for the child at the bottom only provides for folders in PC2 and PC4. Both solutions are combinations of the solutions of the parents, and will, in general, be better solutions (because a group of four parents is randomly selected each time, and the best two will have children).
This process is repeated, until the new generation is the same size as the old one. In addition, mutations are also applied to individuals in the new population, with a low probability. This implies that a postal code area that would have initially received folders, could be excluded, or vice versa. For a few individuals, a gene can therefore be altered:
This is needed so that the algorithm is not stranded in a suboptimal solution, but also to make solutions possible that are beyond the possibilities of the initial gene pool.
The selection and mutation cycle is repeated until many generations have passed (for example: 1000 or 10000), until the fitness level reaches a specific value or until the fitness level has hardly increased for a few generations. Eventually, the individual with the highest fitness level is selected from the last generation. The solution for that individual determines how the folders are distributed for the relevant store. This is how Darwin determines which letterbox the folder lands in.
Higher conversion rates
The above mentioned describes a simple example consisting of four postal code areas. However, when we apply this technique for stores with a complex catchment area – which includes hundreds of postal code areas – a graph may help to clearly display the increase in the best individual’s fitness level, generation after generation. The algorithm easily scores better than the folder distribution prepared by the responsible employee of the retail chain (red line):
Since Evolutionary Algorithms arrive at better solutions than those that can be devised by employees with business knowledge, the same number of folders can eventually achieve a higher conversion rate in the store. Or, the same sales figures can be achieved, but with fewer folders printed and distributed.
Evolutionary Algorithms are quite easy to implement. For the example above, we have programmed the algorithm with JGAP. The bottleneck for Evolutionary Algorithms was always the computing power of a computer. In most cases, the computer needed a few hours (depending on the complexity) to calculate everything. However, with all the cloud computing possibilities nowadays, that is no longer a problem.
Nowadays, these algorithms are rarely used for optimisation problems in the field of marketing. The example described above, shows that these algorithms could also be of added value when used this way. They find solutions for complex problems that one would not think of so quickly.