In January of 2019, I started my journey as a freelance marketing analytics consultant. I spent the first two months diving deep into the Google Cloud Platform to figure out whether this technology could solve the main problem most of my clients shared:
“How to do analysis across the walled gardens of digital advertising tools and across an increasing range of customer interaction platforms.”
Over the past one and a half years I've been working, together with my partner in crime Hussain Mehmood and the MarketLytics team, to solve this question, and questions like it, for my clients by building marketing data warehouses in BigQuery on Google Cloud Platform. This blog post describes our vision and what we have learned so far, working on projects for clients like RVShare.com, MessageBird.com and Temper.works.
The 3 Main Challenges in Digital Marketing
- Measure user interaction across different apps and platforms
As if it wasn’t hard enough to implement proper tracking for websites, most businesses now have multiple platforms (web, app, interactive campaigns) where consumers interact with them. Measuring across them is essential when making decisions about whether something is successful or not.
- Measure advertising interaction and successfully attribute across walled gardens of advertising platforms
As long as we do not measure across the walled gardens of Google & Facebook (and others) to map the entire customer journey and the intertwined interactions that lead up to a purchase, we are stabbing in the dark when trying to optimize our campaigns for effectiveness.
- Activate insights from data back into platforms and marketing interactions with customers in a timely manner
Insights ≠ value. Without turning insights into (preferably automated) action, it’s hard to actually profit from our data infrastructure. You need to be able to trigger actions based on data points when it’s actually relevant to the consumer.
The Essential Building Blocks for Marketing Success
In order to tackle these challenges successfully, there are a couple of things your organization should try to build in one form or another.
- A way to extract data from all your platforms and tools
- A place to store all this data
- A way to merge relevant data and query across different datasets
- A way to automatically activate marketing actions based on data signals
Obviously, the right people with the skill sets to execute (or at least manage and hire in help for) these processes are essential to all of this, but they are not the focus of this post.
What you will likely end up with, is a data infrastructure that looks like the image below.
What Is Google Cloud Platform and Why Do We Prefer It?
Google’s Cloud Platform is the collection of services and tools that Google has produced for internal usage and opened up for developers to use and build upon. In basic terms: it solves a lot of the complex problems that are required when running software and allows developers to focus on their core app more and less on the infrastructure behind it.
Popular competitors are Amazon's AWS and Microsoft's Azure. The reason Google Cloud Platform is more interesting for us as marketers is, in our opinion, three-fold:
- Google offers a lot of native connectors between their advertising suites and their cloud environment.
- Google’s identity graph allows you to match your own first-party data to Google’s data about users for better targeting.
- Google’s machine learning APIs are geared more towards marketing applications out of the box.
Google’s BigQuery is the data warehouse of choice for most organizations dealing with marketing data. The reasons are (among others):
- Storage is virtually limitless (user interaction data can be very large)
- Cost of storage is really low
- Querying large datasets is effortless and fast
- Frequent updating of datasets is easy and fast
- Ad-Hoc analysis is very easy
Extract, Load & Transform Instead of Extract, Transform & Load
In the past, people used to mention “ETL processes” when talking about streams of data: the process of extracting data, transforming it into the desired format, and loading it into the desired tool (for instance, the data warehouse). Thanks to the cost of data storage dropping and the ability to query large datasets improving, in most cases it now pays off to opt for an ELT strategy instead of ETL.
Extract and load the raw data into your data warehouse first (often, the raw ingestion part is dubbed data lake) and handle the transformation after that. You maintain your raw data ingestion tables (available to repurpose later on) and shift the most intensive part of the chain (transformation) to the end.
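The difference is easy to illustrate. Below is a minimal sketch in Python of the ELT flow (all table and field names are made up; in BigQuery the “tables” would be datasets and the transform a scheduled SQL query):

```python
# Minimal ELT sketch: load raw first, transform afterwards.
# Plain lists stand in for BigQuery tables; names are illustrative.

raw_events = []  # the "data lake": raw ingestion table, kept forever

def extract_and_load(source_rows):
    """E + L: append source rows untouched -- no transformation yet."""
    raw_events.extend(source_rows)

def transform_daily_sessions(events):
    """T: shape the raw rows into an analysis-ready table afterwards."""
    sessions = {}
    for e in events:
        key = (e["date"], e["channel"])
        sessions[key] = sessions.get(key, 0) + 1
    return [
        {"date": d, "channel": c, "sessions": n}
        for (d, c), n in sorted(sessions.items())
    ]

extract_and_load([
    {"date": "2020-06-01", "channel": "google", "page": "/home"},
    {"date": "2020-06-01", "channel": "google", "page": "/pricing"},
    {"date": "2020-06-01", "channel": "facebook", "page": "/home"},
])
daily = transform_daily_sessions(raw_events)
```

Because `raw_events` is never mutated by the transform, you can rerun a changed transformation (or a completely new one) over the full raw history at any time, which is the whole point of ELT.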
Your Data “Lake” and “Warehouse” in the same tool
Thanks to the cost structure for data storage and the many ways to easily ingest data, BigQuery is an interesting place to store both your “raw ingestion data” as well as your transformation and final data tables. This is interesting for multiple reasons:
- You limit the dreaded “silos” within organizations. By having everything available in one platform, doing analysis across datasets becomes much easier.
- Storing raw data in your data lake allows you to re-run analyses or build other applications on top of that data at a later stage. One of the hardest things is knowing what analysis you want to do before collecting and exploring the data. Storing the raw data makes this a problem of the past.
- Having your final data tables and transformation tables available in the same tool saves a lot of headaches trying to debug issues.
Whenever you spot a problem in the final product (like a dashboard), you need to figure out where this problem occurred in your setup. By having the entire flow available in one tool (BigQuery) this is a lot easier to do.
Extracting Data from Marketing Tools & Platforms
The first thing you’re going to run into when building your marketing data warehouse is the challenge of extracting data from all different sources.
Depending on your marketing stack, you’ll likely want to export advertising data from Google and Facebook, have your analytics data from both web and apps, have a CRM dataset, a transactional dataset and on top of that additional marketing tools like your email marketing suite.
All of these platforms have data in them, and we need to get that data out. Furthermore, we need to continuously get updates from these platforms whenever data is added or altered.
The first step in creating the architecture of your marketing data warehouse should be to map out your marketing stack and identify the “schema” of the data that can be exported out of each tool. (Here is an example of the documentation of the Google Ads data export schema.) The main things you are looking for are:
- A “Key” to match this data to other datasets.
This could be a user ID to match sessions from a user across devices or platforms, or a click ID to match a user’s impressions and clicks to their session behavior in analytics.
- The level of granularity available.
Most tools do not offer a ‘raw data export’, both because it would be costly and because it would likely violate privacy agreements. In most cases, you’ll be forced to accept aggregation at a certain level (for instance, per day). Understanding this is crucial when thinking about potential future use-cases for the data, such as building applications to activate your marketing campaigns based on it.
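A small sketch of why both the key and the granularity matter (all field names are illustrative): if the ads export is aggregated per day and campaign, your own event-level conversion data has to be rolled up to that same level before the two can be joined.

```python
# Joining a daily-aggregated ads export to event-level conversions.
# The join key is (date, campaign_id); names are illustrative.

ad_costs = [  # daily aggregate, as most platforms export it
    {"date": "2020-06-01", "campaign_id": "c1", "cost": 50.0},
    {"date": "2020-06-01", "campaign_id": "c2", "cost": 20.0},
]
conversions = [  # event-level data from your own analytics
    {"date": "2020-06-01", "campaign_id": "c1", "revenue": 120.0},
    {"date": "2020-06-01", "campaign_id": "c1", "revenue": 80.0},
]

def roll_up(rows, key_fields, value_field):
    """Aggregate event-level rows up to the export's granularity."""
    out = {}
    for r in rows:
        key = tuple(r[f] for f in key_fields)
        out[key] = out.get(key, 0.0) + r[value_field]
    return out

rev = roll_up(conversions, ("date", "campaign_id"), "revenue")
report = [
    {**c, "revenue": rev.get((c["date"], c["campaign_id"]), 0.0)}
    for c in ad_costs
]
```

In the warehouse this is a single SQL join, but the constraint is the same: you can never analyze at a finer grain than the coarsest dataset in the join.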
Identifying Users across Sessions, Devices & Platforms
User recognition is essential for marketing. After all, we want to understand what marketing messages a person has received in order to try and optimize our marketing campaigns for return on investment. But in the current technological landscape there are many issues making proper user identification hard:
- Consumers use multiple devices.
- Consumers use multiple browsers.
- Privacy-enhancing features make not-logged-in traffic tracking harder.
- Stitching historic data to currently identified data is usually not possible in off-the-shelf tools.
Building your 1st Party User Graph
Within your data warehouse, you get the freedom to create your own user graph, which allows you to make this information as complete as possible. You should be able to create a table in your dataset which links all anonymous IDs from past usage to one central user ID that you assign to each user.
Once you have your user graph logic in place, it becomes really interesting to ‘discover additional devices’ from your customers. For instance, if you have your customers’ email address via a purchase that happened on their desktop, you could probably stitch their mobile device into the user graph by leveraging email marketing (which is often opened on mobile) in a smart way.
Companies that are able to identify their users’ devices and stitch sessions together based on owned (first-party) data position themselves to keep doing effective digital marketing as privacy-enhancing features on devices and browsers increase.
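The core of such a user graph is simply “these identifiers belong to the same person”. A minimal sketch, assuming made-up identifiers: a union-find structure links any two IDs observed together (say, a cookie ID and an email hash on the same event), and every connected component then maps to one central user.

```python
# Toy first-party user graph via union-find over observed IDs.
# All identifiers below are illustrative.

parent = {}

def find(x):
    """Return the representative (central ID) for identifier x."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def link(a, b):
    """Record that two identifiers belong to the same person."""
    parent[find(a)] = find(b)

# A desktop purchase ties cookie_A to an email address; an email
# open on a phone later ties the same address to cookie_B.
link("cookie_A", "email:jane@example.com")
link("cookie_B", "email:jane@example.com")

same_user = find("cookie_A") == find("cookie_B")
```

In BigQuery you would materialize the same idea as a mapping table (anonymous ID → central user ID) and rebuild or extend it as new link events arrive.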
Tying Advertising data to Interaction data
When evaluating your advertising campaigns, it is essential to connect the right “clicks” to the right “sessions”. When stitching this data together, you’ll quickly discover the value of adding ‘dynamic values’ to your links (besides the default UTM tracking).
Both Facebook and Google support these dynamic link decorations which allow you to pass along a unique value per “Campaign”, per “Adset” and per “Ad”. Adding these as dynamic parameters to your links when ads are clicked allows you to match these to the session which has these parameters in their landing page URL.
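Extracting those values on the warehouse side is a small parsing job. A hedged sketch, using Python’s standard library (the parameter names `utm_campaign`, `adset_id` and `ad_id` are examples; use whatever your own link decoration passes along):

```python
# Pull the dynamically inserted IDs out of a session's landing page
# URL so the session can be joined to the ads export.

from urllib.parse import urlparse, parse_qs

def ad_key_from_landing_page(url):
    """Return the (campaign, adset, ad) join key found in the URL."""
    params = parse_qs(urlparse(url).query)
    return tuple(params.get(p, [None])[0]
                 for p in ("utm_campaign", "adset_id", "ad_id"))

landing = ("https://example.com/offer"
           "?utm_source=facebook&utm_campaign=summer_sale"
           "&adset_id=as_42&ad_id=ad_7")
key = ad_key_from_landing_page(landing)
```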
Attribution Modeling in the Marketing Data Warehouse
When doing attribution modeling, you are now able to aggregate per campaign/adset/ad per specific date and tie the Impressions, Clicks and Cost together with the Sessions and Conversions that are tied to it.
The flexibility of creating your own attribution dataset allows you to solve a couple of really interesting topics, like:
- Attribution across different advertising networks (walled gardens) and across different owned platforms (app, web, etc)
- Ability to extend your “lookback window” for attribution beyond what normal tools allow you to do.
- Ability to update historic data once your user graph expands, making attribution more precise over time.
- Ability to apply machine learning algorithms instead of manual ‘hand-made’ attribution models.
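To make the flexibility concrete, here is a toy comparison of two hand-made models over a single customer journey (channel names and revenue are illustrative): once the touch data sits in your own warehouse, the same journey can be re-credited under any rule you define.

```python
# Two simple attribution models over one conversion journey.

def last_click(touches, revenue):
    """All credit to the final touch before conversion."""
    return {touches[-1]: revenue}

def linear(touches, revenue):
    """Credit split evenly across every touch in the journey."""
    share = revenue / len(touches)
    credit = {}
    for t in touches:
        credit[t] = credit.get(t, 0.0) + share
    return credit

journey = ["facebook_ad", "google_search", "email", "google_search"]
by_last = last_click(journey, 100.0)
by_linear = linear(journey, 100.0)
```

Swapping the model is just swapping the function, and because the raw journeys are stored, historic periods can be re-attributed whenever the model (or the user graph behind it) improves.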
Visualizing Data and Extracting Insights
One of the first things you will probably want to do with the data in your marketing data warehouse is visualize it for further analysis. Marketing attribution dashboards that work across all of your advertising tools and platforms, in particular, offer a unique opportunity that is impossible to achieve without a marketing data warehouse.
There are a lot of visualization tools out there, and most of them have a native BigQuery connector. If you don’t have one yet, Google’s free Google Data Studio is a great place to start your journey.
Avoid staring at numbers for the sake of it
As I’ve described in earlier blog posts, humans have the tendency to want a lot of data for the wrong reasons: most of the time, to look smart or to have something to blame when a decision turns out badly. Dashboards can be very valuable, but they can also drain a lot of time from your organization if not used effectively.
The main question to keep in mind when creating a dashboard is: “What decision/action are you trying to make and how often do you need to make it?”
Ideally, each part of information on your dashboard should have an answer to that question attached to it.
Spreading the ability to create dashboards
The technical act of creating a dashboard is becoming easier every day with intuitive tools like Google Data Studio. This allows everybody, not just analysts, to create dashboards for their own usage. Although this “democratization of data” is probably a good thing in the long run, you should probably take some precautions like:
- Only share “final tables” of data with non-analysts to prevent people from using the wrong datasets in their dashboards.
- Create a common naming convention of data points to use in these final tables and document that somewhere where everybody can access it.
- Document the limitations of datasets (for instance maximum granularity to use) and holes/quality of datasets.
Taking (Automated) Action upon Data
Only action (or purposeful inaction) can generate value. Once you have defined what actions to take upon the data you collect, you can make the data work for you. Because you’ve created your own marketing data warehouse, you have the flexibility to design the actions you want to take, and Google Cloud Platform offers a variety of tools to make this possible, like Google Cloud Functions.
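The shape of such an action is usually “a signal crosses a relevance window, so trigger something”. A minimal sketch, with a made-up threshold and action name (in production this logic could run in a Cloud Function fed by a scheduled query):

```python
# Illustrative trigger logic: turn a data signal into an action at
# the moment it is relevant to the consumer.

from datetime import datetime, timedelta

def decide_action(last_cart_activity, purchased, now):
    """Send a reminder if a cart was abandoned 2-48 hours ago."""
    if purchased:
        return None
    age = now - last_cart_activity
    if timedelta(hours=2) <= age <= timedelta(hours=48):
        return "send_cart_reminder_email"
    return None  # too soon (still shopping) or too late (stale)

now = datetime(2020, 6, 1, 12, 0)
action = decide_action(datetime(2020, 6, 1, 9, 0),
                       purchased=False, now=now)
```

The timing condition is the important part: the same signal triggers nothing outside the window, which is what separates “activating data” from merely spamming on top of it.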
Here are three interesting ideas to get you started; obviously, the more specific and tailored to your business they are, the more valuable they will become.
Personalized Email Marketing with Recommended Items based on previous browsing behavior
Once you have completed your user graph and you collect “Product Impressions”, “Product Add To Cart” and “Product Purchases” events, you have all the ingredients to let Google’s Recommendations API serve you the products a user is most likely to purchase, based on machine learning. You can feed this information into your email marketing tool of choice to generate personalized email marketing based on your product feed.
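The managed API handles the modeling for you; purely to show the shape of the input (per-user product events from the warehouse) and the output (a ranked list of SKUs per user for your email tool), here is a deliberately simple co-purchase counter as a stand-in. All SKUs are made up, and this is not how the API itself works internally.

```python
# Toy co-purchase recommender: suggest SKUs bought by users who
# share at least one purchase with the target user.

from collections import Counter

purchases = {  # user_id -> SKUs purchased, from the warehouse
    "u1": ["tent", "sleeping_bag"],
    "u2": ["tent", "camp_stove"],
    "u3": ["tent", "sleeping_bag", "lantern"],
}

def recommend(user_id, top_n=2):
    """Rank SKUs owned by overlapping users, excluding owned ones."""
    owned = set(purchases.get(user_id, []))
    scores = Counter()
    for other, skus in purchases.items():
        if other != user_id and owned & set(skus):
            scores.update(set(skus) - owned)
    return [sku for sku, _ in scores.most_common(top_n)]

recs = recommend("u2")
```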
Create your own Cross-sell & Up-sell module using the Recommendations API
Using the same API as mentioned in the previous tip, you can create your own logic to decide which product SKUs to show in the up-sell and cross-sell areas of your webshop.
Apply Machine Learning to your Attribution Models
Leverage Google’s TensorFlow to apply propensity modeling to your attribution data instead of using arbitrary hand-made attribution models like last- or first-click.
Building a marketing data warehouse is a challenging and interesting project to work on with a high (potential) reward if done and maintained the right way.
If you want to build your own marketing data warehouse, I recommend starting with these three things:
- Map out at least 3 use-cases with the final data that you think will be worth your while.
- Do a thorough inventory of your current marketing stack and review their documentation when it comes to ‘exporting data’.
- Create a schematic overview of your system (using a tool like Miro).
Once you have thought these three things through, you should have a decent picture of the complexity involved and the potential upside of building a data warehouse for marketing. Armed with this, you have a great starting point to talk to an internal data engineer or an external consultant who can help you proceed.
Interested in working with us?
Do you think your organization needs a marketing data warehouse? Need outside help to build (parts of) it? Let’s have a chat!