Google BigQuery: Best Practices for Big Data
by Ibrahim Atli on Jul 13, 2023 3:59:38 PM
Businesses rely on enterprise data warehouse platforms like Google BigQuery to store, process, and analyze big data. But– if not used correctly– these data analytics tools can become expensive.
Using a client case, we will share how you can dramatically decrease your BigQuery and analytics costs via strategic data retrieval improvements.
Costly Data Chaos:
Our e-commerce client was experiencing a significant spike in their data analytics costs on BigQuery. Unsure as to why there had been an increase in cost, they turned to our team.
With our expertise in PaaS on Google Cloud, our team at Crystalloids examined how our client was utilizing BigQuery and where their data retrieval pain points were. We found that the e-commerce brand was experiencing increasing costs due to the growing size of their data tables and inefficient update requests on BigQuery. They were spending more on:
- Processes they had implemented to run on an hourly basis
- How their data tables were being updating and inserted
- Their data retrieval and enriching process once the data was retrieved from BigQuery
- Transmitting the enriched data back to another location
These Google Analytics (GA) tables, which handle data such as customer transactions, were set to run every hour. The issue was in how the data was being updated once the process was triggered at the hour.
Implementing Cost-Saving Best Practices
Not only did we find a solution for the e-commerce brand, we implemented 3 best use practices to enhance their BigQuery use while reducing total spend.
Below we outline the three implementations that improved our clients use of the serverless data warehouse while reducing their overall costs:
- Updates and Batch Processing
One of the primary contributors to high BigQuery costs comes when running frequent and individual row updates. We always advise our clients to avoid running single update queries because BigQuery is designed to analyze and query large datasets rather than performing frequent updates.
In fact, the price model for BigQuery is primarily based on the amount of data being processed/scanned. Performing frequent update queries will lead to increased data processing and, subsequently, higher costs.
As related to our client, we found the frequent update mechanism that had been implemented was ineffective and recommended they consolidate update queries through the use of a caching mechanism. This caching mechanism would then allow for a batch update approach, to reduce the number of individual updates.
In the long run, this tweak in their data retrieval process significantly minimized associated costs and unnecessary operational expenses while improving overall efficiency.
- Selective Field Usage for Query Optimization
Sometimes, a review of the fields being queried can uncover ways to optimize your BigQuery data while reducing costs. When we looked at the SQL select queries utilized in the implemented scripts for our client we found that they were retrieving all fields (using *) from the tables. This resulted in the excessive scanning of unnecessary data.
Our solution was to walk through and analyze the specific fields required for the processes with our client. We then modified the queries accordingly. Once the selected fields had been narrowed down to only those essential for the processes, we drastically reduced the scanning bytes required by BigQuery. This optimization technique led to a significant decrease in the amount of data being scanned and sorted, resulting in substantial cost savings.
- Harnessing the Power of Partitioned Tables
Finally, we found that when our clients converted existing BigQuery tables into partitioned tables, the total spend on BigQuery decreased.
When you use partitioned tables, the data is grouped based on logical divisions such as time intervals or specific criteria. This drastically reduces the amount of data being scanned for each query.
Once we implemented partitioned tables for our client, they saw a significant reduction in scanning bytes, which translated to substantial cost savings.
Spend Less with Google BigQuery
After implementing these three optimization techniques on BigQuery, our customer saw a remarkable improvement in data and analytics operating costs. Within 20 days, our client experienced a cost reduction of 38.91% in BigQuery expenses (see Figure 1 below).
Figure 1: Total Cost Reduction
Implementing our optimizations in the first month saved 38.91% in around 20 days
After one month, a side-by-side comparison was conducted with the passing month. The results were remarkable: our client saved 77.23% (see Figure 2). By identifying and rectifying inefficiencies, our team implemented substantial cost saving tactics while enhancing the performance of current processes.
Figure 2: Side-by-Side Cost Comparison
Comparison of optimization reflections over the last month show our client saved 77.23% after our implementations
Efficient cost management is essential when working with enterprise data warehouses like BigQuery that are designed for analyzing big data sets. Remember:
By following a systematic approach and implementing strategic optimization techniques, you can significantly reduce costs and improve performance for your business. Through the consolidation of update queries, selective field usage, and the utilization of partitioned tables, our team achieved significant cost savings and enhanced overall efficiency for our e-commerce client.
This customer success story is just one example of how we help businesses strengthen their data processing and retrieval while reducing overall spending. If you want to explore data optimization opportunities or assess your current costs on BigQuery or other Google Cloud services, feel free to reach out to us.
Crystalloids offers workshops to discover use cases such as these together! Interested?
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making them one of a kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to their clients and make their job focus more on decision making and less on programming.
No Comments Yet
Let us know what you think