How to build a Data Warehouse and Data Lake in one platform
by Jan Hendrik Fleury on Aug 24, 2021 10:51:04 AM
For many years, the architectures of a Data Warehouse and a Data Lake have been viewed as separate systems, applicable to specific data types and user skill sets. That’s history. Recent innovations provide us with an opportunity to create a comprehensive platform that gives us the best of both worlds.
End-to-end data management and processing is what we want
We have been creating end-to-end solutions that cover every stage of data management and processing, from data collection to data analysis and machine learning. The result is a data platform that can store vast amounts of data in varying formats without compromising on latency. At the same time, this platform can satisfy the needs of all users throughout the data lifecycle.
One of the aspects I love about our work is that there is no one-size-fits-all approach to building an end-to-end data solution. Emerging concepts include data lakehouses, data meshes, and data vaults, each of which seeks to meet specific technical and organizational needs. All of them work naturally within a Google Cloud environment. We have several clients who are enjoying the benefits of these converging technologies.
Data Mesh, Data Lakehouse, Data Vault
Data mesh facilitates a decentralized approach to data ownership, allowing individual lines of business to publish and subscribe to data in a standardized manner, instead of forcing data access and stewardship through a single, centralized team.
On the other hand, a data lakehouse brings raw and processed data closer together, enabling a more streamlined and centralized repository of the data needed throughout the organization. Processing can be done via ELT in BigQuery, where data is transformed after it has been loaded, reducing the need to copy datasets across systems. This makes data exploration and governance easier.
The data lakehouse stores data as a single source of truth and keeps copies of the data to a minimum. This architecture offers low-cost storage in an open format accessible by a variety of processing engines like Spark, while also providing powerful management and optimization features. Consistent security and governance are key to any lakehouse.
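The ELT pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not a BigQuery implementation: Python's built-in sqlite3 module stands in for the warehouse so the example runs anywhere, and the table and column names (raw_orders, orders_by_country) are hypothetical. The point is the order of operations: raw data is loaded first, and the transformation then runs as SQL inside the same engine.

```python
import sqlite3

# sqlite3 is only a local stand-in for the warehouse; in BigQuery the same
# pattern would be a load step followed by a SQL statement run in place.
conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw records untouched (hypothetical table/columns).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
raw_rows = [
    ("o-1", "19.90", "nl"),
    ("o-2", "5.10", "NL"),
    ("o-3", "42.00", "de"),
]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: derive a curated table with SQL inside the same engine,
# so the raw data is never copied out to another system.
conn.execute(
    """
    CREATE TABLE orders_by_country AS
    SELECT UPPER(country) AS country,
           COUNT(*) AS order_count,
           ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
    FROM raw_orders
    GROUP BY UPPER(country)
    """
)

curated = conn.execute(
    "SELECT country, order_count, revenue FROM orders_by_country ORDER BY country"
).fetchall()
print(curated)
```

Because the raw table stays next to the curated one, analysts can always go back to the unprocessed records without asking for another export.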
Finally, a data vault is designed to separate data-driven and model-driven activities. Data can be loaded into the raw vault in parallel, which helps large implementations scale.
In Google Cloud, there is no need to keep the data lake and the data warehouse separate. In fact, with interoperability among Google Cloud's data analytics products, you can easily provide access to data residing in different places, effectively bringing your data lake and data warehouse together on a single platform.
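The article does not say how parallel loading into the raw vault is achieved. One common technique in data vault modeling is to derive keys by hashing the business key, so every table can compute its own keys from the source data without looking anything up in another table. The sketch below assumes that technique; the function names, the SHA-256 choice, and the customer records are all hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys):
    """Deterministic key from normalized business keys (SHA-256 is an assumption)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical source records for a customer entity.
source_rows = [
    {"customer_no": "c-001", "name": "Alice", "city": "Utrecht"},
    {"customer_no": "C-001 ", "name": "Alice", "city": "Utrecht"},
    {"customer_no": "c-002", "name": "Bob", "city": "Gent"},
]


def load_hub(rows):
    # Hub: one entry per distinct business key.
    return {hash_key(r["customer_no"]): r["customer_no"].strip().upper() for r in rows}


def load_satellite(rows):
    # Satellite: descriptive attributes, keyed by the same hash.
    return [
        {"hub_key": hash_key(r["customer_no"]), "name": r["name"], "city": r["city"]}
        for r in rows
    ]


# Both loaders derive keys from the source alone, so neither waits on the other.
with ThreadPoolExecutor(max_workers=2) as pool:
    hub_future = pool.submit(load_hub, source_rows)
    sat_future = pool.submit(load_satellite, source_rows)
    hub, satellite = hub_future.result(), sat_future.result()
```

Since the hub and the satellite never read from each other, adding more source systems mostly means adding more independent loaders rather than lengthening a sequential chain.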
Under the hood
Let's look under the hood at some of the technological innovations that make this a reality. The BigQuery Storage API lets you treat the data warehouse as a data lake by giving other processing engines direct access to the data residing in BigQuery.
For example, you can use Spark to access data residing in the data warehouse without affecting the performance of any other jobs accessing it. This is all made possible by the underlying architecture, which separates compute and storage. Likewise, Dataplex, Google’s intelligent data fabric service, provides data governance and security capabilities across various lakehouse storage tiers built on Cloud Storage (GCS) and BigQuery.
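To make the Spark example concrete, here is a minimal sketch of reading a BigQuery table from PySpark. It assumes a SparkSession that has the open-source spark-bigquery connector available, which is what registers the "bigquery" data source format. The helper name read_warehouse_table and the table and column names in the usage comment are hypothetical.

```python
def read_warehouse_table(spark, table, columns, condition):
    """Read a BigQuery table into a Spark DataFrame.

    `spark` is assumed to be a SparkSession with the spark-bigquery
    connector on its classpath; `table` is "project.dataset.table".
    """
    return (
        spark.read.format("bigquery")  # data source provided by the connector
        .load(table)                   # fully qualified BigQuery table name
        .select(*columns)              # keep only the columns the job needs
        .where(condition)              # standard DataFrame row filter
    )


# Usage, with hypothetical names:
# df = read_warehouse_table(
#     spark, "my-project.sales.orders", ["order_id", "amount"], "amount > 100"
# )
# df.show()
```

The Spark job does its processing on its own cluster, which is the separation of compute and storage described above.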
Point solutions versus a truly unified analytics platform
What sets Google Cloud’s data analytics platform apart is that it is open, intelligent, flexible, and tightly integrated. There are many technologies in the market that provide tactical solutions that may feel comfortable and familiar.
However, this can be a rather short-term approach that simply lifts and shifts a siloed solution into the cloud. In contrast, a data analytics platform built on Google Cloud offers modern data warehousing and data lake capabilities with close integration with Google's AI Platform. It also provides built-in streaming, ML, and geospatial capabilities and an in-memory solution for BI use cases.
Let’s talk about shaping your analytics capabilities over a coffee!