Quality assurance - How does it work?
by Luisa Rey Gomez, on Aug 3, 2020 9:30:00 AM
Part 2 of the quality assurance blog series
In our first article about quality assurance, we talked about the importance of consistent, valid and complete data. Data is the core requirement for enabling digital business, so being able to rely on its quality when you evaluate the state of your business and make informed decisions is critical. The second part of the quality assurance series describes how the monitoring dataflow works for data stored in Google BigQuery.
Identifying the errors
As a part of quality assurance monitoring in Google Cloud, you can automatically get alerted about anomalies or problems. That way, you can immediately find out if one of your critical data processes goes down and quickly take action.
The process that helps us assure the quality of data is the monitoring dataflow. It tells us whether some information is missing or incomplete, or whether some processes could not be executed as expected. As part of the monitoring process, we can check whether a value in a table is not null, falls within a given range or exceeds a certain value. We can also check whether an email address is valid and unique, or whether a table is updated every day.
Data is the element you build your business credibility on. Neglecting the quality of your data and processes can have a significant impact on the efficiency and performance of your business.
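To make these kinds of checks concrete, here is a minimal sketch in plain Python. It is not the actual monitoring dataflow: the function names and the simplified email pattern are assumptions made for illustration, and in practice such rules would typically be expressed as queries against BigQuery tables.

```python
import re
from datetime import date, timedelta

# Deliberately simple pattern for illustration only; it is an assumption,
# not the validation rule used by the real monitoring dataflow.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def in_range(value, low, high):
    """True if the value is not null and lies within [low, high]."""
    return value is not None and low <= value <= high


def exceeds(value, threshold):
    """True if the value is not null and is greater than the threshold."""
    return value is not None and value > threshold


def is_valid_email(value):
    """True if the value is not null and looks like an email address."""
    return value is not None and EMAIL_PATTERN.match(value) is not None


def all_unique(values):
    """True if no non-null value occurs more than once."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def updated_within_a_day(last_modified, today):
    """True if the table was modified today or yesterday."""
    return today - last_modified <= timedelta(days=1)


if __name__ == "__main__":
    print(in_range(5, 1, 10))
    print(is_valid_email("not-an-email"))
    print(all_unique(["a@example.com", "b@example.com", "a@example.com"]))
    print(updated_within_a_day(date(2020, 8, 2), date(2020, 8, 3)))
```

Each function returns a plain boolean, so a check either passes or does not; null values are treated as failing the value checks.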
Monitoring, step by step
The monitoring dataflow performs pre-configured checks on BigQuery tables at regular intervals. Check results are stored and reported at the configured frequency, covering both the checks that did not pass and the checks that passed during the last execution. There are three steps in the dataflow:
- Read from BigQuery (configuration table)
- Validate/execute checks
- Write results to BigQuery
The first step is to read the configuration table to get all the active checks. The status of a check can easily be set to “active” or “inactive” as needed. For every active check we must also validate the frequency, because not all checks need to be executed on every run. The frequency is controlled in a field of each check's configuration.
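The selection logic of this first step can be sketched as follows. The configuration rows, the field names (`status`, `frequency_days`, `last_executed`) and the check identifiers are hypothetical; the real configuration table may be laid out differently, and here an in-memory list stands in for the rows read from BigQuery.

```python
from datetime import date

# Hypothetical configuration rows, standing in for the BigQuery
# configuration table. All field names are assumptions.
CONFIG_ROWS = [
    {"check_id": "orders_amount_not_null", "status": "active",
     "frequency_days": 1, "last_executed": date(2020, 8, 2)},
    {"check_id": "customer_email_unique", "status": "active",
     "frequency_days": 7, "last_executed": date(2020, 7, 30)},
    {"check_id": "legacy_table_updated", "status": "inactive",
     "frequency_days": 1, "last_executed": date(2020, 7, 1)},
]


def is_due(check, today):
    """A check is due if it never ran or its interval has elapsed."""
    if check["last_executed"] is None:
        return True
    return (today - check["last_executed"]).days >= check["frequency_days"]


def checks_to_run(config_rows, today):
    """Keep only the active checks whose frequency says they are due."""
    return [c for c in config_rows
            if c["status"] == "active" and is_due(c, today)]


if __name__ == "__main__":
    due = checks_to_run(CONFIG_ROWS, date(2020, 8, 3))
    print([c["check_id"] for c in due])  # ['orders_amount_not_null']
```

In this example the weekly check ran four days earlier and is skipped, and the inactive check is never considered, whatever its frequency.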
For the checks that apply, the rule is validated in step two and the results are stored in a table in BigQuery. Writing the results is the final step of the monitoring dataflow; after that, a separate scheduled procedure compiles all the results in a report and emails it to selected recipients from the business.
The report can easily be configured to include more than the check status. Each customer can suggest additions to the report based on their preferences and needs, such as what action to take when an error is detected. To make the status of each check result quick to identify, we use the following colour coding:
- Blue - all the checks that could not be executed, also marked as “check_not_executed”. This could be because the requested table or field doesn't exist, or because the format of the values doesn't match the condition, among other reasons;
- Red - the checks that didn't pass; that is, the validation could be run, but the result doesn't match the desired value;
- Yellow - the same as red, except that the check is marked as a warning;
- Green - all the checks that could be executed and whose results match the desired result.
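The colour coding above amounts to a simple lookup from check status to colour. In the sketch below only the status name “check_not_executed” comes from the report itself; the names `failed`, `warning` and `passed`, as well as the shape of the result rows, are assumptions for illustration.

```python
# Status-to-colour lookup. Only "check_not_executed" is a status name
# taken from the report; the other three names are assumptions.
STATUS_COLOURS = {
    "check_not_executed": "blue",
    "failed": "red",
    "warning": "yellow",
    "passed": "green",
}


def group_by_colour(results):
    """Group check ids by report colour, keeping every colour present."""
    grouped = {colour: [] for colour in STATUS_COLOURS.values()}
    for result in results:
        grouped[STATUS_COLOURS[result["status"]]].append(result["check_id"])
    return grouped


if __name__ == "__main__":
    example_results = [
        {"check_id": "orders_amount_not_null", "status": "passed"},
        {"check_id": "customer_email_unique", "status": "failed"},
        {"check_id": "missing_table_check", "status": "check_not_executed"},
    ]
    for colour, check_ids in group_by_colour(example_results).items():
        print(colour, check_ids)
```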
The result value shows relevant information about the result of the check and the query that was executed to validate it. With this information, the query can be copied into a console to easily check what went wrong. The corrective actions can be taken internally by the business users who receive the email or resolved by the development team.
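A result row of this kind, holding the status, the result value and the executed query, could be produced along the following lines. This is a sketch under stated assumptions, not the actual dataflow code: the `run_query` callable stands in for whatever executes the query against BigQuery, and the field names, status names other than “check_not_executed”, and the `warning_only` flag are all hypothetical.

```python
def execute_check(check, run_query):
    """Run one check and build a result row for the results table."""
    row = {"check_id": check["check_id"], "query": check["query"]}
    try:
        actual = run_query(check["query"])
    except Exception as error:
        # For example, the requested table or field does not exist.
        row.update(status="check_not_executed", result_value=str(error))
        return row
    if actual == check["expected"]:
        status = "passed"
    elif check.get("warning_only", False):
        status = "warning"
    else:
        status = "failed"
    row.update(status=status, result_value=actual)
    return row


def fake_run_query(query):
    """In-memory stand-in for BigQuery, used only in this example."""
    canned_answers = {"SELECT COUNT(*) FROM orders WHERE amount IS NULL": 3}
    if query not in canned_answers:
        raise LookupError("table or field not found")
    return canned_answers[query]


if __name__ == "__main__":
    check = {
        "check_id": "orders_amount_not_null",
        "query": "SELECT COUNT(*) FROM orders WHERE amount IS NULL",
        "expected": 0,
    }
    print(execute_check(check, fake_run_query))
```

Because the row keeps the query text alongside the result value, whoever receives the report can paste the query into a console and investigate.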
Anomalies or downtime can not only negatively affect your bottom line but also hurt your reputation. Crystalloids provides quality assurance to help you find errors early, before they affect your business.
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making it a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to their clients and frees up their time to focus more on decision making and less on programming.