What is data transformation and why it's vital to your analytics

Data transformation is essential to every business intelligence process. Companies now collect an astonishing amount of data, but only 0.5% of it is actually analysed and used for decision making.

To get a complete picture of business performance, including consumer behaviour and growth strategies, organisations need to connect and integrate all their data. Breaking down silos and having a single source of truth leads to better decision making, improved customer satisfaction and marketing success.

Big data is a big deal

Today, 76% of adults use three or more channels to make a purchase.** This constant access to digital content and information requires brands to act accordingly and offer their products through every possible channel: online, in-store, via an app or even via Facebook or WhatsApp.

But to process all this transactional data, which appears in different formats and structures across the organisation depending on which platform or system collected it, companies need a careful data transformation strategy that ensures the data fits the needs of business users.

What is data transformation?

So let's define what data transformation is. Often carried out as part of the ETL process (extract, transform, load), it is a technique in which data is extracted from a source system, converted into another format, and then loaded into a target location, such as a data warehouse, where it is analysed further. Along the way, other actions might be needed, such as combining, refining or summarising data.
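
To make the extract, transform and load steps described above more concrete, here is a minimal sketch in Python using only the standard library. The two CSV exports, their column names and the `sales` table are hypothetical and invented for illustration; a real pipeline would read from actual source systems and load into a proper analytics store rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw exports from two sales channels. The column names,
# delimiters and formats are assumptions made up for this sketch.
WEB_CSV = """order_id,date,amount
w1,2017-11-03,19.99
w2,2017-11-04,5.00
"""
STORE_CSV = """id;sold_on;total
s1;03/11/2017;12,50
"""


def extract(text, delimiter):
    """Extract: read the raw rows of one CSV export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(web_rows, store_rows):
    """Transform: combine both sources into one schema (channel, ISO date, float amount)."""
    unified = []
    for row in web_rows:
        unified.append(("web", row["date"], float(row["amount"])))
    for row in store_rows:
        # The store export uses DD/MM/YYYY dates and a decimal comma.
        day = datetime.strptime(row["sold_on"], "%d/%m/%Y").date().isoformat()
        amount = float(row["total"].replace(",", "."))
        unified.append(("store", day, amount))
    return unified


def load(rows, conn):
    """Load: write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (channel TEXT, day TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run all three steps, then summarise revenue per channel."""
    rows = transform(extract(WEB_CSV, ","), extract(STORE_CSV, ";"))
    load(rows, conn)
    query = (
        "SELECT channel, ROUND(SUM(amount), 2) FROM sales "
        "GROUP BY channel ORDER BY channel"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for channel, revenue in run_etl(connection):
        print(channel, revenue)
```

Keeping each step in its own function means the transformation logic can be changed or reused without touching how data is read or stored.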

"Extracting and transforming data affects other operations within the company. When data is changed into a more readable format, the analysis can be performed faster and with more accuracy than before, affecting not only other people's productivity but overall business decision making," says Mengmeng, one of our data scientists.

Business analysts often deal with data that is complex and not immediately ready for analysis. In fact, 60-70% of their time is usually spent on transforming data from one format to another instead of evaluating and analysing it and finding patterns. To avoid that, many modern tools can help speed up the whole process.

Dataflow, Dataprep or Dataproc?

Traditional tools once considered useful for big data applications are no longer sufficient and can take too long. Google Cloud Platform offers several options for transforming data. Drawing on our experience, here are a few of them and how they compare:

1. Cloud Dataflow is a fully managed service for transforming and enriching batch and stream data. Dataflow requires more programming, but once you have written a pipeline you can save it and reuse it for later solutions. For structured data, it is a great tool that loads directly into BigQuery, and this is where speed plays a crucial role:

"When processing around 600 gigabytes on a traditional database like MySQL, it would take more than one minute to complete. With Cloud Dataflow it takes 10-30 seconds max. That is because Google BigQuery stores data column by column instead of row by row as it is with MySQL, so it takes much less time to process the data when selecting a couple of columns. Cloud Dataflow automatically scales up, it's fast, and that's what everyone wants," Mengmeng concludes.

2. Cloud Dataprep is a fast cloud data transformation service which can be used when dealing with files directly. For non-programming data analysts, this is an excellent option as it is visually very user-friendly and requires no code. But for complex data transformation, it is better to use Dataflow. 
3. Cloud Dataproc is an intelligent data tool which is great for data scientists running Apache Spark and Apache Hadoop jobs. It can also be used for legacy jobs. Dataproc is similar to Dataflow, so to help you decide which one you need, Google provides a handy decision diagram.

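The difference between row-by-row and column-by-column storage mentioned in the Dataflow item above can be illustrated with a small Python sketch. It is purely conceptual: the order records and field names are invented, and plain Python lists only hint at the layout idea. Real engines such as BigQuery use far more elaborate storage formats, and their speed advantage comes largely from reading less data from disk.

```python
# Row-oriented layout: all fields of one record are kept together,
# as in a typical MySQL table. The records are made up for this sketch.
rows = [
    {"order_id": 1, "channel": "web", "amount": 20.0},
    {"order_id": 2, "channel": "store", "amount": 12.5},
    {"order_id": 3, "channel": "app", "amount": 7.5},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout (one list per field)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records):
    """Visits every whole record even though only 'amount' is needed."""
    return sum(record["amount"] for record in records)


def total_from_columns(columns):
    """Reads only the 'amount' column; the other columns are never touched."""
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(rows)
    print(total_from_rows(rows), total_from_columns(columns))
```

Both functions return the same total, but the columnar version only has to look at the one column the query selects, which is why selecting a couple of columns from a wide table can be much cheaper in a column store.
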
Big data is big business. But without the right processes in place, it has no value. Data must be looked after: carefully extracted, cleansed and refined to deliver maximum value to the organisation. Understanding the importance of data transformation will enable you to unlock the insights in your data and grow your business.
* Think with Google, June 2017
** Think with Google, November 2017


