What is data transformation and why it's vital to your analytics
by Crystalloids Team on Mar 1, 2019 11:59:42 AM
Data transformation is essential to every business intelligence process. The amount of data that companies collect nowadays is astonishing, but only 0.5% of this data is actually analysed and used for decision making.
To get a complete picture of business performance, including consumer behaviour and growth strategies, organisations need to connect and integrate all their data. Breaking down silos and having a single source of truth leads to better decision making, improved customer satisfaction and marketing success.
Big data is a big deal
Today, 76% of adults use three or more channels to make a purchase. This constant access to digital content and information requires brands to act accordingly and offer their products via every possible channel: online, in-store, via an app or even via Facebook or WhatsApp.
But this transactional data appears in different formats and structures across the organisation, depending on which platform or system collected it. To process it all, companies need a careful data transformation strategy that ensures the data fits the needs of business users.
What is data transformation
So let's define what data transformation is. Often described in terms of the ETL process (extract, transform, and load), it is a technique in which data is extracted from a source system, converted into another format, and then loaded into the target location (typically a data warehouse) where it is analysed further. During this process, a series of other actions might need to happen, such as combining, refining or summarising data.
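The three steps can be illustrated with a small, self-contained sketch. This is not the pipeline described in this article: the sample orders, the column names and the functions `extract`, `transform`, `load` and `summarise` are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: three sales channels with inconsistent formats
# (decimal commas, mixed date styles, mixed capitalisation).
RAW_CSV = """order_id,channel,amount,order_date
1001,Web,"19,99",2019-02-27
1002,in-store,5.00,27/02/2019
1003,APP,"12,50",2019-02-28
"""


def extract(text):
    """Extract: read the source export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert every row to one consistent format."""
    records = []
    for row in rows:
        # Accept both "19,99" and "19.99" as amounts.
        amount = float(row["amount"].replace(",", "."))
        # Rewrite DD/MM/YYYY dates as ISO 8601 (YYYY-MM-DD).
        date = row["order_date"]
        if "/" in date:
            day, month, year = date.split("/")
            date = f"{year}-{month}-{day}"
        channel = row["channel"].strip().lower()
        records.append((int(row["order_id"]), channel, amount, date))
    return records


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, channel TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def summarise(conn):
    """Summarise revenue per day once the data shares one format."""
    query = (
        "SELECT order_date, ROUND(SUM(amount), 2) FROM orders "
        "GROUP BY order_date ORDER BY order_date"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
daily_revenue = summarise(conn)
print(daily_revenue)
```

Only after the transform step do the amounts and dates share one format, which is what makes the daily summary a single query.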
"Extracting and transforming data affects other operations within the company. When data is changed into a more readable form, analysis can be performed faster and more accurately than before, which affects not only other people's productivity but overall business decision making," says Mengmeng, one of our data scientists.
Business analysts often deal with data that is complex and not immediately ready for analysis. In fact, 60-70% of their time is usually spent on transforming data from one format to another instead of evaluating, analysing and finding patterns. To avoid that, many modern tools can help speed up the whole process.
Dataflow, Dataprep or Dataproc?
Traditional tools considered useful for big data applications are no longer sufficient and can take too long. Google Cloud Platform offers several options to transform data. Based on our experience, we name a few and show how they compare:
1. Cloud Dataflow is a fully managed service for transforming and enriching batch and streaming data. Dataflow requires more programming, but once you have saved a program you can reuse it in later solutions. For structured data, it is a great tool that loads directly into BigQuery, and this is where speed plays a crucial role:
"When processing around 600 gigabytes on a traditional database like MySQL, it would take more than one minute to complete. With Cloud Dataflow it takes 10-30 seconds max. That is because Google BigQuery stores data column by column instead of row by row, as MySQL does, so it takes much less time to process the data when selecting a couple of columns. Cloud Dataflow scales up automatically, it's fast, and that's what everyone wants," Mengmeng concludes.
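The column-by-column point in the quote can be made concrete with a toy model. The sketch below is a simplification, not how BigQuery or MySQL are implemented: it only counts how many stored values a query has to read under each layout, and the sample data and function names are hypothetical.

```python
# Hypothetical orders table with three columns.
COLUMN_NAMES = ("order_id", "channel", "amount")
ROWS = [
    (1, "web", 19.99),
    (2, "in-store", 5.00),
    (3, "app", 12.50),
]


def to_columns(rows, names):
    """Re-arrange row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def values_read_row_store(rows):
    """Row layout: whole records are read, so every field is touched."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """Column layout: only the requested columns are read."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(ROWS, COLUMN_NAMES)

# Query: total revenue, which needs the "amount" column only.
total = sum(columns["amount"])
row_cost = values_read_row_store(ROWS)
column_cost = values_read_column_store(columns, ["amount"])
print(f"row store reads {row_cost} values, "
      f"column store reads {column_cost}")
```

In this model the gap grows with the number of columns the query does not need, which is why selecting a couple of columns from a wide table favours a columnar layout.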