Reaching your most valuable audiences: Predicting Conversion Intent Part 3
by Lotte Jonkman, on Jan 18, 2021 10:00:00 AM
Blog 3 out of 4: Training and evaluating the conversion model
- Blog 1: capture the behavioural ‘rules’ that lead to your target
- Blog 2: feature selection analysis
- Blog 3: training/evaluating the model
- Blog 4: deploying the model
In this third blog of the Conversion Model series, I explain the step of training and evaluating the model. Now that we have created and selected the features and the target, we are set to build a model that can predict the Conversion Intent of a visitor. The goal is a model that uses the selected features as input variables to predict whether a visitor will convert in the next 15 days.
Training and test data
As mentioned before, we can use historical data to create a model that predicts a given target from a set of features. Once we have created this model, we can apply it to new, unseen visitors. What is important in this step is that the model explains the target as well as possible and also generalises well to new data. What we don’t want is for the model to overfit, meaning that it is only accurate on the data used to create it.
There are many ways to guard against this, but to detect whether it is happening, the best practice is to split the available data into a training dataset and a test dataset. The training dataset is the data you use to train your model. Once the model is trained, the test dataset is used to evaluate it. We often use 80% of the data as a training set and 20% as a test set, but other ratios work too.
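One common way to make such a split in BigQuery is to hash a stable identifier and bucket the rows. A minimal sketch, assuming a feature table and a `visitor_id` column (both hypothetical names, adjust to your own schema):

```sql
-- Hash on visitor_id so the split is deterministic and a visitor's
-- rows never end up in both sets. Table/column names are placeholders.
CREATE OR REPLACE TABLE `project.dataset.training_set` AS
SELECT *
FROM `project.dataset.features`
WHERE MOD(ABS(FARM_FINGERPRINT(visitor_id)), 10) < 8;   -- roughly 80%

CREATE OR REPLACE TABLE `project.dataset.test_set` AS
SELECT *
FROM `project.dataset.features`
WHERE MOD(ABS(FARM_FINGERPRINT(visitor_id)), 10) >= 8;  -- roughly 20%
```

Because `FARM_FINGERPRINT` is deterministic, rerunning the split produces the same partition, which makes model comparisons reproducible.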
Training the model
Now that we have a training dataset, we can start training our model. The Google Cloud Platform has many options for this, for example using the AI Platform or BigQuery ML. Because our data lives in BigQuery and the models we want to use are available in BigQuery ML, we decided to use BigQuery for the training step as well.
In BigQuery standard SQL, you create and train a model with a CREATE MODEL statement. You can provide options such as the model type and the number of iterations. These choices affect the quality of your model, so it is best to try out different settings. To decide which setting works best, you can use your test dataset.
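As an illustration, a CREATE MODEL statement for this kind of binary classification could look like the sketch below. The model type, iteration count, label column name, and table names are assumptions for the example, not the exact configuration used in the series:

```sql
-- Train a classifier on the training set. 'logistic_reg' and the
-- names below are illustrative assumptions; tune OPTIONS to taste.
CREATE OR REPLACE MODEL `project.dataset.conversion_model`
OPTIONS (
  model_type = 'logistic_reg',           -- binary classification
  max_iterations = 20,                   -- cap on training iterations
  input_label_cols = ['will_convert']    -- the target column
) AS
SELECT *
FROM `project.dataset.training_set`;
```

Running the statement trains the model inside BigQuery; no data has to leave the warehouse.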
Evaluating the model
To know how well the trained model predicts the target, we can use the test dataset and several evaluation metrics. BigQuery has a built-in function for this, ML.EVALUATE, which takes the trained model and the test dataset as input. This evaluation function applies the model to the features of the test dataset and creates predictions. These predictions are then compared against the actual labels, in this case whether someone converted in the 15 days after a visit. From this comparison the function calculates several evaluation metrics, for example accuracy (the percentage of correct labels), precision (the percentage of true positives among all positive predictions), and recall (the percentage of true positives among all actual positive labels).
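The evaluation step can be sketched as a single query, assuming the model and test table names from the earlier examples (both hypothetical):

```sql
-- Score the held-out test set and return the standard
-- classification metrics in one row.
SELECT
  precision,
  recall,
  accuracy,
  f1_score,
  roc_auc
FROM ML.EVALUATE(
  MODEL `project.dataset.conversion_model`,
  (SELECT * FROM `project.dataset.test_set`)
);
```

Because the query scores only rows the model never saw during training, these metrics estimate how the model will behave on new visitors.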
Which evaluation metric to focus on depends on the scenario. For example, when using Conversion Intent within Google Audiences, we think it is more important to capture all possible converters than to correctly rule out all non-converters. We don’t mind predicting that a non-converter will convert if that means we have all possible converters in our Audience. Therefore, in this example, we look at recall more than at precision and accuracy.
As mentioned, you can try out many different features and parameters when training your model. To find the best combination, you can evaluate each newly trained model and compare the evaluation metrics. When we think we cannot improve any further, we can deploy the model.
Deployment will be discussed in the next blog. Also, because the world is ever-changing, we suggest retraining the model regularly on fresh data. This can be automated, which we do for our customers using a Model Training Pipeline. That, too, is something I will discuss in the following blog.
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making it a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to its clients and frees their time to focus on decision making rather than programming.