How to create a customer data model?
by Jan Hendrik Fleury on Apr 1, 2022 12:16:09 PM
Data plays a central role in the way we do business today, and everybody is trying to jump on the data-driven bandwagon. The reason is simple: data is a valuable asset that can drive the best possible decision-making for any company.
Creating a customer data model is one of the five foundations of the modern data stack and democratized analytics. The other four are data ingestion (including event collection), visualization, applying intelligence with AI, and finally the activation of decisions.
Making data useful to your teams and systems is achieved by creating a customer data model, the second foundation after data ingestion. A prerequisite is that you bring (or already have) your customer data in raw form into your data warehouse or data lakehouse.
This is the part of the problem where good architectural design will keep you happy and successful for a long time! The way to go is to organize the raw data into actionable models or entities that work for your business use cases. This stage of the stack involves two key components:
- Identity Resolution: Identifying the same users in different data sources
- Master Data Model: Creating a final/clean view of your customers and associated facts and dimensions.
A piece of advice: in most cases, start small. Building models incrementally and iterating quickly will get you to real value sooner. One approach I’ve seen work well is to start by creating the business objects needed to activate growth within marketing and sales: customers, transactions, and events.
Identity resolution
The first step is building out an identity graph of clients and users: a global identifier that links (many call this stitching) all customer interactions with your channels and applications.
The three key steps for simple SQL-based identity resolution in your data warehouse:
- Identify match keys
Decide which fields or columns you’ll use to determine which individuals, or subsidiary companies, are the same within and across sources. A typical set of match keys might be an email address and last name.
- Aggregate customer records
The second step is to create a source lookup table containing all the customer records from your source tables.
- Match and assign a Global Customer ID
The final step is simply to take records that have the same (or, in some cases, similar) match keys and generate a unique customer identifier for that matching group of customer records at the company level. I call this a Customer ID. Each generated Customer ID can be used to link the customer sources together going forward. What this looks like in practice is described in my blog about first-party data strategy, under the header ‘using first-party data in a CDP’.
As you add more sources, you can roll them into the same process by setting the correct rules and precedence for each source.
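As a minimal sketch of these three steps, assuming hypothetical raw tables such as `raw.crm_customers` and `raw.webshop_orders` with `email` and `last_name` columns (all table and column names below are illustrative, not from a specific implementation), the matching logic could look roughly like this in BigQuery SQL:

```sql
-- Steps 1 and 2: aggregate customer records from all sources into one lookup
-- table, using lower-cased email + last name as the match key.
-- All table and column names here are hypothetical.
CREATE OR REPLACE TABLE staging.customer_lookup AS
SELECT
  'crm'                   AS source,
  CAST(crm_id AS STRING)  AS source_id,
  LOWER(email)            AS email,
  LOWER(last_name)        AS last_name
FROM raw.crm_customers
UNION ALL
SELECT
  'webshop'               AS source,
  CAST(user_id AS STRING) AS source_id,
  LOWER(email)            AS email,
  LOWER(last_name)        AS last_name
FROM raw.webshop_orders;

-- Step 3: generate one global Customer ID per match-key group and
-- link every source record to it.
CREATE OR REPLACE TABLE models.customer_id_map AS
SELECT
  TO_HEX(MD5(CONCAT(email, '|', last_name))) AS customer_id,  -- deterministic ID per match group
  source,
  source_id
FROM staging.customer_lookup
WHERE email IS NOT NULL
  AND last_name IS NOT NULL;
```

In a real implementation you would add precedence rules per source and handle fuzzy matches, but the pattern of a lookup table plus a generated Customer ID stays the same.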
Here's an ERD of what the process can look like in a sample implementation:
Creating master data models
With identity resolved, the first problem on the way to your customer view is solved. Next comes getting your data pipelines or ETL processes in place to build the master models that will drive analytics.
To drive quick value I recommend starting with a “Customer → Transaction → Event” framework. This framework creates three key objects from your source data.
The image below shows what this type of modeling looks like.
Customers: a table of your customers, with the ability to quickly add new fields
Transactions: each customer’s transaction history, joined to the customers table by a key, including, for a retailer for example, product returns
Events: any event you track for each customer
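As an illustration of the Customer → Transaction → Event framework, a customers master table could be derived from the identity map roughly as follows. This is only a sketch, reusing the hypothetical tables from the identity-resolution example above and assuming transaction and event tables keyed by source and source customer ID:

```sql
-- Customers master table: one row per global customer_id, enriched with
-- facts aggregated from transactions and events (hypothetical source tables).
CREATE OR REPLACE TABLE models.customers AS
WITH txn AS (  -- transaction facts per global customer
  SELECT
    m.customer_id,
    MIN(t.transaction_date) AS first_purchase_date,
    COUNT(*)                AS lifetime_orders,
    SUM(t.amount)           AS lifetime_revenue
  FROM models.customer_id_map AS m
  JOIN raw.transactions AS t
    ON t.source = m.source AND t.source_customer_id = m.source_id
  GROUP BY m.customer_id
),
evt AS (  -- event facts per global customer
  SELECT
    m.customer_id,
    COUNTIF(e.event_name = 'page_view') AS page_views,
    MAX(e.event_timestamp)              AS last_seen_at
  FROM models.customer_id_map AS m
  JOIN raw.events AS e
    ON e.source = m.source AND e.source_customer_id = m.source_id
  GROUP BY m.customer_id
)
SELECT
  c.customer_id,
  txn.first_purchase_date,
  txn.lifetime_orders,
  txn.lifetime_revenue,
  evt.page_views,
  evt.last_seen_at
FROM (SELECT DISTINCT customer_id FROM models.customer_id_map) AS c
LEFT JOIN txn USING (customer_id)
LEFT JOIN evt USING (customer_id);
```

Transactions and events are aggregated separately before joining, so the customer-level facts are not inflated by a fan-out between the two tables.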
If your company is a marketplace or has different business identities, you can change these master data models to follow what makes sense for you. For example, for a two-sided marketplace, you might have tables for both sellers and buyers as different entities. For a B2B business, you might have separate accounts and contacts entities.
Tools
There are several ways to ingest data into your data warehouse and transform it there. In most cases I would advise analytics teams to adopt an open-source-based solution. Over the past few years, open-source tools for building the modern customer data stack / CDP have made managing and maintaining your data easier while reducing costs significantly.
The most widely used open-source tools are Airflow and Beam. Airflow shines at data orchestration and pipeline dependency management, while Beam is a unified model for building batch and streaming data pipelines.
When it comes to workflow orchestration, Airflow has been widely used for running data pipelines and machine learning models. At Crystalloids, we deploy both Airflow and Beam, using Cloud Composer and Cloud Dataflow.
Airflow can also be deployed as a managed service on AWS. Other alternatives to Airflow include Prefect, Dagster, and Kubeflow, among others.
If you have structured data sitting in your data warehouse, you can also write all your transformations in SQL. You may be wondering how to manage all these transformations once they scale up to the hundreds. Saving hundreds of SQL queries in folders is not easy to maintain, right? What if you want to update one of those transformations, or roll back to a previous version of one of your SQL scripts? If you are already at this stage, dbt (Data Build Tool) will be your friend. dbt helps you manage all of this complexity by integrating practices such as documentation, transformation/model lineage (i.e. which transformation runs first), data testing (e.g. it is weird to have transactions with negative values), and version control with Git, so that you have everything in one place and can track versions.
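For instance, the negative-transactions check mentioned above could be written as a dbt singular test: a SQL file in the tests directory that selects offending rows, where any returned row makes the test fail. The model and column names below are hypothetical:

```sql
-- tests/assert_no_negative_transactions.sql
-- A dbt singular test: dbt runs this query and the test fails
-- if it returns any rows. Model and column names are hypothetical.
SELECT
  transaction_id,
  amount
FROM {{ ref('transactions') }}
WHERE amount < 0
```

The `ref()` call also gives dbt the lineage information it needs to run the transformations in the right order.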
CI/CD
The source for the cleaned-up model is a set of (BigQuery) views, which in turn are versioned in a version control system such as Cloud Source Repositories or Bitbucket. This is the starting point of a CI/CD workflow, where adjustments to the model are guided through the DTAP (development, test, acceptance, production) pipeline in a controlled manner and with the right approvals. Google Cloud Build plays an important role in this.
Conclusion
Creating a customer data model is one of the five foundations of a modern data stack. If you need assistance with the modeling elements described in this blog, feel free to contact me by scheduling a meeting.
ABOUT CRYSTALLOIDS
Crystalloids helps companies improve their customer experiences and build marketing technology. Founded in 2006 in the Netherlands, Crystalloids builds crystal-clear solutions that turn customer data into information and knowledge into wisdom. As a leading Google Cloud Partner, Crystalloids combines experience in software development, data science, and marketing, making them a one-of-a-kind IT company. Using the Agile approach, Crystalloids ensures that use cases show immediate value to their clients and frees their time to focus more on decision-making and less on programming.