What is data transformation and why it's vital to your analytics
by Veronika Schipper, on Mar 1, 2019 11:59:42 AM
Data transformation is essential to every business intelligence process. The amount of data that companies collect nowadays is astonishing, but only 0.5% of this data is actually analysed and used for decision making.
To get a complete picture of business performance, including consumer behaviour and growth strategies, organisations need to connect and integrate all their data. Breaking down silos and having a single source of truth leads to better decision making, improved customer satisfaction and marketing success.
Big data is a big deal
Today, 76% of adults use three or more channels to make a purchase. This constant access to digital content and information requires brands to act accordingly and offer their products via every channel possible: online, in-store, via an app or even via Facebook or WhatsApp.
But all this transactional data appears in different formats and structures across the organisation, depending on which platform or system collected it. To process it, companies need a careful data transformation strategy that ensures the data fits the needs of business users.
What is data transformation
So let's define what data transformation is. Usually carried out as part of the ETL process (extract, transform, and load), it is a technique in which data is extracted from a source system, converted into another format, and then loaded into a target location, such as a data warehouse, where it is analysed further. During this process, a series of other actions might need to happen, such as combining, refining or summarising data.
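To make the three steps concrete, here is a minimal sketch in Python. It is an illustration only, not a production pipeline: the sample CSV export, the `orders` table and all function names are made up for this example, and an in-memory SQLite database stands in for the real target system.

```python
import csv
import io
import sqlite3
from decimal import Decimal

# Hypothetical raw export: comma as decimal separator, DD-MM-YYYY dates.
RAW_CSV = """order_id,channel,amount,date
1,online,"19,99",01-03-2019
2,in-store,"5,00",01-03-2019
3,online,"10,01",02-03-2019
"""


def extract(text):
    """Extract: read the raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and harmonise formats."""
    cleaned = []
    for row in rows:
        day, month, year = row["date"].split("-")
        amount = Decimal(row["amount"].replace(",", "."))
        cleaned.append({
            "order_id": int(row["order_id"]),
            "channel": row["channel"],
            "amount_cents": int(amount * 100),
            "order_date": f"{year}-{month}-{day}",  # ISO 8601
        })
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, channel TEXT, "
        "amount_cents INTEGER, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES "
        "(:order_id, :channel, :amount_cents, :order_date)",
        rows,
    )
    conn.commit()


def revenue_per_channel(conn):
    """Summarise: total revenue in cents for each sales channel."""
    query = "SELECT channel, SUM(amount_cents) FROM orders GROUP BY channel"
    return dict(conn.execute(query).fetchall())
```

Calling `load(transform(extract(RAW_CSV)), sqlite3.connect(":memory:"))` runs the three steps in order, after which `revenue_per_channel` gives the kind of summary an analyst would actually work with.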
"Extracting and transforming data affects other operations within the company. When data is changed into a more readable form, the analysis can be performed faster and more accurately than before, which affects not only other people's productivity but overall business decision making," says Mengmeng, one of our data scientists.
Business analysts often deal with data that is complex and not immediately ready for analysis. In fact, 60-70% of their time is usually spent on transforming data from one format to another instead of evaluating it, analysing it and finding patterns. To avoid that, many modern tools can help speed up the whole process.
Dataflow, Dataprep or Dataproc?
Traditional tools considered useful for big data applications are no longer sufficient and can take too long. Google Cloud Platform offers several options to transform data. From our experience we name a few and show you how they compare:
1. Cloud Dataflow is a fully managed service for transforming and enriching batch and stream data. Dataflow requires more programming, but once you have saved a program you can reuse it for later solutions. For structured data, it is a great tool that loads directly into BigQuery, and this is where speed plays a crucial role:
"When processing around 600 gigabytes on a traditional database like MySQL, it would take more than one minute to complete. With Cloud Dataflow it takes 10-30 seconds max. That is because Google BigQuery stores data column by column instead of row by row, as MySQL does, so it takes much less time to process the data when selecting a couple of columns. Cloud Dataflow scales up automatically, it's fast, and that's what everyone wants," Mengmeng concludes.
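The column-by-column point can be illustrated with a toy model in Python. This is only a sketch of the general idea, not how BigQuery or MySQL are implemented internally: the sample records and function names are invented, and "values read" is a rough stand-in for the amount of data a query has to touch.

```python
# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    {"order_id": 1, "channel": "online", "amount_cents": 1999, "order_date": "2019-03-01"},
    {"order_id": 2, "channel": "in-store", "amount_cents": 500, "order_date": "2019-03-01"},
    {"order_id": 3, "channel": "online", "amount_cents": 1001, "order_date": "2019-03-02"},
]


def to_columns(rows):
    """Column-oriented layout: one list of values per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(rows, field):
    """Sum one field; every record is loaded in full to reach it."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole record is read
        total += row[field]
    return total, values_read


def sum_from_columns(columns, field):
    """Sum one field; only that field's column is read."""
    values = columns[field]
    return sum(values), len(values)
```

With these three sample records, `sum_from_rows(ROWS, "amount_cents")` reads all 12 stored values to produce the total, while `sum_from_columns(to_columns(ROWS), "amount_cents")` reads only 3. The gap grows with the number of columns in the table, which is why selecting a couple of columns from a wide table is cheaper in a columnar store.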
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making it a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to its clients, so that their work focuses more on decision making and less on programming.
For more information, please visit www.crystalloids.com or follow us on LinkedIn.