Quality Assurance - How data quality directly affects your business
by Luisa Rey Gomez, on Jul 24, 2020 1:15:42 PM
Part 1 of the quality assurance blog series
When your business decisions and customer interactions are based on data coming from multiple sources, repeatedly and in different formats, there is a risk of failures in the processes and in the data. In this blog, we explain what quality assurance is and when it is required, and share best practices on how to apply it.
The consequences of poor data quality
Nowadays, raw data is increasingly used to create new information and knowledge. Integrating this data forms the basis for algorithms and machine learning processes. Wrongly matched or enriched data is no longer reliable, and when it is used in business processes, it impairs your ability to make informed judgements and decisions about your business.
That is why it is necessary to monitor and check whether the data is being processed and delivered in the right order. Data needs to be in an impeccable state before it is used in critical decision processes.
Duplicate data, for example, can negatively impact your business KPIs. Having two or more records for one customer may lead to sending the same person the same campaign multiple times. It affects your email deliverability, personalisation efforts, response rates, campaign results and the overall ROI of your marketing activity.
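As a small illustration of how duplicate customer records can be spotted, the sketch below groups records by a normalised email address. The record layout (`id`, `email`) and the function name `find_duplicate_customers` are assumptions made for this example, and real customer matching usually needs more than one field.

```python
from collections import defaultdict


def find_duplicate_customers(records):
    """Group customer records that share the same normalised email.

    `records` is a list of dicts with the hypothetical keys "id" and "email".
    Returns a dict mapping each normalised email to the list of record ids,
    keeping only emails that occur more than once.
    """
    ids_by_email = defaultdict(list)
    for record in records:
        # Normalise so " Ann@Example.com" and "ann@example.com" are matched.
        key = record["email"].strip().lower()
        ids_by_email[key].append(record["id"])
    return {email: ids for email, ids in ids_by_email.items() if len(ids) > 1}


customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": " Ann@Example.com"},
    {"id": 3, "email": "bob@example.com"},
]
duplicates = find_duplicate_customers(customers)
```

Here records 1 and 2 are reported as the same customer, so a campaign would only need to be sent to one of them.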
Wrong business decisions made from incomplete data are incredibly costly. According to Gartner, poor data quality has an average financial impact of $15 million per year on organisations. Besides the financial implications, running analyses on incomplete or incorrect data is also very time-consuming for business analysts, who have to find and fix the errors.
What is quality assurance?
Quality assurance, according to ISO 9000, is “part of quality management focused on providing confidence that quality requirements will be fulfilled”. It might seem a difficult task because many processes transform and store the data, and all of them have to be checked.
Doing all of the monitoring manually is very time-consuming and not 100% reliable. To assure the quality of data, these checks can be configured to run automatically, and alerts can be set up in case something goes wrong. Designing and implementing a solution for continuous, automated monitoring is therefore worth considering. That way, your business is informed immediately when something goes wrong.
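To make the idea of automated checks with alerts more concrete, here is a minimal sketch of a check runner. The function name `run_checks`, the check names and the alert callback are all hypothetical; in practice the checks would query your data platform and the alert would send an email or a chat message.

```python
def run_checks(checks, alert):
    """Run every check and send one alert per failure.

    `checks` maps a check name to a zero-argument function that returns
    True when the data looks fine. `alert` is any callable that accepts a
    message string, for example a function that sends an email.
    Returns the list of failed check names.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as error:  # a crashing check also counts as a failure
            ok = False
            alert(f"Check '{name}' raised an error: {error}")
        else:
            if not ok:
                alert(f"Check '{name}' failed")
        if not ok:
            failed.append(name)
    return failed


# Hypothetical checks; real ones would query the data warehouse.
row_count = 0
messages = []
failed = run_checks(
    {
        "orders_not_empty": lambda: row_count > 0,
        "always_ok": lambda: True,
    },
    alert=messages.append,
)
```

Collecting the alerts through a callback keeps the runner independent of how the business wants to be notified.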
Consistent, valid and complete data
Your customer data and processes are the foundation of your infrastructure - it’s time to treat them like one. As part of quality assurance, we diagnose issues that affect data stored in a data warehouse such as Google BigQuery. We monitor all data processes, including dataflows, triggers, stored procedures and any other process that touches the data. The goal is to make sure that all operations run successfully and that the data is consistent, valid and complete.
Example user story:
When we started a project with one of our clients, we needed to check whether all processes had completed successfully, based on the data stored in Google Cloud BigQuery. Every evening, the consolidated processes were executed over a significant number of tables, affecting a massive quantity of data, and some of them could fail. To check whether the tables were updated daily, we created an automated daily check in the form of a query.
The first versions helped the business to quickly identify whether something had failed. However, after a few days, we found out that the check did not cover every process, only those that produce direct output in BigQuery tables.
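A daily freshness check of this kind could be sketched as follows. This is not the query used in the project: it is a plain Python illustration that assumes the last update time of each table has already been read from the warehouse's table metadata. The function name `find_stale_tables`, the table names and the 24-hour threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(last_modified, now, max_age=timedelta(hours=24)):
    """Return the names of tables not updated within `max_age` of `now`.

    `last_modified` maps a table name to the timezone-aware datetime of its
    latest update. How that metadata is fetched is left out of this sketch.
    """
    return sorted(
        table
        for table, modified_at in last_modified.items()
        if now - modified_at > max_age
    )


now = datetime(2020, 7, 24, 8, 0, tzinfo=timezone.utc)
last_modified = {
    "customers": datetime(2020, 7, 23, 22, 30, tzinfo=timezone.utc),
    "orders": datetime(2020, 7, 22, 22, 30, tzinfo=timezone.utc),
}
stale = find_stale_tables(last_modified, now)
```

In this example only `orders` is flagged, because it was last updated more than a day before the check ran.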
Thinking outside the box, we designed additional mechanisms to identify possible errors. Using the Google Cloud SDK, we created a new kind of check that not only validates data stored in BigQuery, but also verifies whether a file exists in Google Cloud Storage. This way, it was easy to monitor the name of the bucket and the file name format, and to set up an email alert when the file was not in the specified location.
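The file check can be illustrated with a short sketch. It assumes the list of object names in the bucket has already been retrieved from Cloud Storage, so it runs without credentials. The helper names, the file name format `export_%Y%m%d.csv` and the alert callback are hypothetical.

```python
from datetime import date


def expected_file_name(name_format, run_date):
    """Build the file name expected for `run_date`.

    `name_format` uses strftime codes, e.g. "export_%Y%m%d.csv".
    """
    return run_date.strftime(name_format)


def check_file_exists(object_names, name_format, run_date, alert):
    """Alert when the expected file is missing from the bucket listing.

    `object_names` is the list of object names found in the bucket. In a
    real setup it would come from Cloud Storage; here it is passed in so
    the sketch runs without credentials.
    """
    expected = expected_file_name(name_format, run_date)
    found = expected in object_names
    if not found:
        alert(f"Missing file: {expected}")
    return found


alerts = []
listing = ["export_20200723.csv"]
found = check_file_exists(
    listing, "export_%Y%m%d.csv", date(2020, 7, 24), alerts.append
)
```

Because the listing only contains the file for the previous day, the check reports the file for 24 July as missing.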
During the execution of the project, we faced different challenges. Probably the most interesting one was defining the best way to do the checks. On the one hand, adding as many options as we could to the configuration table would be very flexible. On the other hand, it would overload the person configuring the checks. We also needed to keep it responsive and easy to use. In the end, together with the business, we defined the most important values to expose as configurable and left the less important ones at their defaults.
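The trade-off between flexibility and ease of use can be pictured as a configuration record with a few required values and sensible defaults for the rest. The field names and default values below are assumptions for illustration, not the configuration table used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckConfig:
    """One row of a hypothetical check configuration table."""

    # Required values: the person configuring a check must supply these.
    check_name: str
    target: str  # e.g. a table name or a file name format
    # Optional values with defaults, so most rows can stay short.
    max_age_hours: int = 24
    alert_email: str = "data-team@example.com"
    enabled: bool = True


minimal = CheckConfig(check_name="orders_fresh", target="orders")
custom = CheckConfig(
    check_name="export_file", target="export_%Y%m%d.csv", max_age_hours=48
)
```

Most checks then need only two values, while the less common options remain available when someone needs to override them.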
Adopt quality assurance
In this blog, we’ve looked at the importance of good data quality and how your business depends on it. Once you adopt quality assurance and start proactively monitoring your data and processes to minimise the chance of an error, you will be able to make critical business decisions without the consequences of poor data.
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science and marketing, making it a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to its clients, freeing their time to focus on decision making rather than programming.