Tailor your marketing strategies with customer segmentation

Discover what your customers look like and what and how they like to buy at scale

9 min read · Feb 15, 2023

Imagine operating a business with a diverse customer portfolio spread out geographically with differing income levels. Your customers purchase distinct products via various channels. As an operator, you want to fully understand what your customers look like and what and how they like to buy.

However, you have tens of thousands of customers in your database (lucky you!), and each has dozens of attributes like demographics and behavioral data. Neither Excel nor BI tools give you the necessary insights for this tremendous volume of data.

Here’s where machine learning-based customer segmentation comes in. With machine learning algorithms, you can uncover valuable insights about each customer segment at scale and apply the insights to business decisions.

In this article, I will take you through the customer segmentation journey step by step with a grocery store sample dataset.

From messy data to actionable insights

We are using the Customer Personality Analysis dataset from Kaggle (CC0: Public Domain license), which contains 29 attributes for 2,240 customers. It describes customers’ demographics, financial situation, visiting behaviors, purchasing behaviors, and interactions with campaigns. Details of the 29 attributes are shown below.

Here are the insights we distill from customer segmentation of this seemingly daunting dataset. We identify four segments, and five main factors define each segment. The factors’ distribution is demonstrated below.

We discover what each segment looks like.

What they like to buy.

And how they like to buy.

Intrigued by the magic of customer segmentation? Let’s get started with the analysis!

Prepare the data

Before running machine learning, we need to examine whether the dataset is suitable for our algorithms. The clustering algorithms behind customer segmentation are distance-based, so they need standardized input data; otherwise, columns with large magnitudes dominate the model output.

Transforming the data

Based on initial experiments, demographic attributes like the year of birth, education and marital status, campaign interaction attributes, and some others are unrelated to the insights we are looking for. Therefore, we are dropping these columns. In practice, you will need to go through a few experiments to decide on input attributes for your models.

We also need to calculate a few percentages, such as the percentage of the amount spent on wine compared to the total amount spent, and add them to the dataset to allow for easy comparison.

Further, we need to standardize the dataset because the columns’ magnitudes differ widely. Income, for example, is measured in much larger numbers than a count of purchases.
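The two transformations above can be sketched in a few lines of pandas. This is a minimal illustration, not the exact code behind the article: the rows are made up, and the column names (including `MntOtherProducts`, `TotalAmount`, `WinePct`, and `MeatPct`) are assumptions chosen for readability.

```python
import pandas as pd

# Made-up rows for illustration only; the column names are assumptions.
df = pd.DataFrame({
    "Income": [58000.0, 46000.0, 71000.0, 26000.0],
    "MntWines": [635.0, 11.0, 426.0, 11.0],
    "MntMeatProducts": [546.0, 6.0, 127.0, 20.0],
    "MntOtherProducts": [100.0, 10.0, 200.0, 22.0],
})

spend_cols = ["MntWines", "MntMeatProducts", "MntOtherProducts"]

# Total spend per customer, then each category's share of that total.
df["TotalAmount"] = df[spend_cols].sum(axis=1)
df["WinePct"] = df["MntWines"] / df["TotalAmount"]
df["MeatPct"] = df["MntMeatProducts"] / df["TotalAmount"]

# Z-score standardization: each column ends up with mean 0 and
# standard deviation 1, so no column dominates because of its units.
standardized = (df - df.mean()) / df.std(ddof=0)

print(standardized.round(2))
```

scikit-learn’s `StandardScaler` performs the same z-score scaling and is convenient when the same transformation must be reapplied to new customers later.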

After these steps, we get the following dataset for model input.

Evaluate sample adequacy

While we have reduced the number of columns from 29 to 16, there are still too many variables for us to analyze directly. Therefore, we need to “combine” the variables and narrow them down to a few factors with a process called “factor analysis.” However, before we perform factor analysis, we need to evaluate whether the data is adequate for it.

We generally run the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s Test of Sphericity to assess adequacy. As a rule of thumb, a KMO value above 0.6 is acceptable for factor analysis; the greater, the better. The KMO value of the example dataset is 0.69. Bartlett’s Test of Sphericity tests the null hypothesis that the variables are uncorrelated, that is, that their correlation matrix is an identity matrix. A p-value below 0.05 rejects that hypothesis and indicates that factor analysis is worthwhile for the dataset. The p-value of the example dataset is effectively 0.

Therefore, we can run a factor analysis on the dataset.
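For readers who want to see what the two adequacy checks compute, here is a minimal sketch with NumPy and SciPy. It follows the textbook formulas rather than any particular library, the function names `bartlett_sphericity` and `kmo_overall` are my own, and the data is synthetic (six variables driven by two hidden factors), so the printed values will not match the 0.69 reported above.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Return (chi-square, p-value) for Bartlett's Test of Sphericity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log-determinant of the correlation matrix
    chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    return chi_square, stats.chi2.sf(chi_square, dof)


def kmo_overall(X):
    """Return the overall Kaiser-Meyer-Olkin measure (between 0 and 1)."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Synthetic data: two hidden factors, each driving three observed variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7]])
X = latent @ weights + 0.5 * rng.normal(size=(500, 6))

chi_square, p_value = bartlett_sphericity(X)
kmo = kmo_overall(X)
print(f"KMO: {kmo:.2f}, Bartlett chi-square: {chi_square:.1f}, p-value: {p_value:.3g}")
```

A p-value that prints as 0.0 simply means it is smaller than floating-point precision can represent, which is common when the variables are clearly correlated.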

Reduce dataset dimension

Decide number of factors

Next, we need to understand how many key factors we can get from the dataset by calculating the eigenvalues of the variables’ correlation matrix. According to Factor Analysis as a Tool for Survey Analysis, the eigenvalue of a factor represents the amount of dataset variance explained by the factor. A factor with an eigenvalue greater than one explains more variance than a single standardized variable does, so it is considered significant.

Eigenvalues for the variables are shown below. We will have five factors in the following analysis.
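The eigenvalue rule can be sketched with NumPy alone. This is an illustration on synthetic data (six variables driven by two hidden factors), not the article’s dataset, so the number of factors it reports differs from the five found above.

```python
import numpy as np

# Synthetic data: two hidden factors, each driving three observed variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7]])
X = latent @ weights + 0.5 * rng.normal(size=(500, 6))

# Eigenvalues of the correlation matrix, largest first.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Keep the factors whose eigenvalue is greater than one.
n_factors = int((eigenvalues > 1).sum())

print(eigenvalues.round(2))
print("factors to keep:", n_factors)
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is why one is the natural threshold: it is the share a single variable would contribute on its own.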

Explore variable relationships

By running factor analysis, we will get the following graph. The cumulative variance of the first five factors is 0.61, which means the five factors can explain 61% of the dataset variance.

The graph below shows details of the factors. The magnitude of each variable’s loading indicates how strongly the variable influences a factor.

By sorting the variables, we can see that for factor 0, the most influential variables are the total amount of purchases, the number of total purchases, and income. These variables all point to purchase volume. Therefore, we can name factor 0 purchase_volume.

Repeating the analysis for the other four factors, we can name them wine_purchase, tech_maturity, meat_purchase, and catalog_purchase, respectively.
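Sorting variables by their loadings, as described above, might look like the sketch below. It uses scikit-learn’s `FactorAnalysis` with a varimax rotation, which is an assumption on my part since the article does not state which implementation or rotation was used. The data is synthetic and the variable names are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Synthetic data: two hidden factors, each driving three observed variables.
# The variable names are placeholders, not the article's real columns.
names = ["total_amount", "total_purchases", "income",
         "web_purchases", "web_visits", "deal_purchases"]
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
weights = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7]])
X = latent @ weights + 0.5 * rng.normal(size=(500, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize first

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(X)

# Rows are variables, columns are factors, values are loadings.
loadings = pd.DataFrame(fa.components_.T, index=names,
                        columns=["factor_0", "factor_1"])

# For each factor, list the three variables with the largest absolute loading.
top = {f: list(loadings[f].abs().nlargest(3).index) for f in loadings.columns}
for factor, variables in top.items():
    print(factor, variables)
```

The code only ranks the variables. Turning a ranked list into a name such as purchase_volume remains a human judgment.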

In practice, interpreting factors requires deep knowledge of business and industry. You will achieve the best outcome when data professionals partner with business stakeholders.

Analyze principal components

Principal component analysis (PCA) is a data science technique that reduces a large dataset’s dimension by transforming its variables into a few components without losing much information. By running PCA with the results from factor analysis, we get the dataset below.
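As a rough sketch of this step, the snippet below reduces 16 standardized columns to five components with scikit-learn’s `PCA`. The article does not spell out how the factor analysis results feed into PCA, so this simply assumes the factor count (five) is reused as the number of components. The data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the 16 standardized model-input columns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
mixing = rng.normal(size=(5, 16))
X = latent @ mixing + 0.3 * rng.normal(size=(300, 16))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Keep five components (assumed to match the five factors found earlier).
pca = PCA(n_components=5)
scores = pca.fit_transform(X)  # one row per customer, one column per component

print(scores.shape)
print(pca.explained_variance_ratio_.cumsum().round(2))
```

The `scores` array is what the clustering step consumes: each customer is now described by five numbers instead of 16.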

Clustering

Determine the number of clusters

The core algorithms of customer segmentation are clustering algorithms. First, we will visualize the cluster distribution with hierarchical clustering and a dendrogram, which shows the relationships between similar sets of data.

Running algorithms on the above dataset, we get the dendrogram below. We can decide how many clusters we need by drawing a horizontal line like the dotted line below and moving it up and down.

Determining the number of clusters is both a science and an art. We want enough clusters that each one represents a meaningful customer segment. However, too many clusters are overwhelming to analyze and make it hard to reach business decisions.
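Moving the horizontal line up and down has a direct equivalent in code: cutting the hierarchical tree at different heights. The sketch below uses SciPy with Ward linkage, which is an assumption since the article does not name the linkage method, on four synthetic, well-separated groups of points.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data: four well-separated groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5]], dtype=float)
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])

# Build the hierarchy; column 2 of Z holds the height of each merge.
Z = linkage(X, method="ward")
max_height = Z[:, 2].max()

# Cutting the tree higher up yields fewer, larger clusters.
counts = []
for fraction in (0.05, 0.3, 0.8, 1.0):
    labels = fcluster(Z, t=fraction * max_height, criterion="distance")
    counts.append(len(np.unique(labels)))
    print(f"cut at {fraction:.0%} of max height -> {counts[-1]} clusters")

# Alternatively, ask directly for a fixed number of clusters.
labels_4 = fcluster(Z, t=4, criterion="maxclust")
```

`scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself when matplotlib is available.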

In this example, we choose to have 4 clusters. Next, we run k-means clustering to get 4 clusters.
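A minimal k-means sketch with scikit-learn follows. The input here is synthetic two-dimensional data standing in for the PCA output, and the `n_init` and `random_state` values are illustrative choices rather than the settings used for the article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the PCA output: four well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5]], dtype=float)
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])

# Four clusters, as chosen from the dendrogram.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster id (0 to 3) per customer

print(np.bincount(labels))              # customers per cluster
print(kmeans.cluster_centers_.round(1))  # the center of each cluster
```

Cluster ids are arbitrary, so cluster 0 in one run is not guaranteed to be cluster 0 in another unless the random seed is fixed.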

Picture the customers

After all the hard work, we now have clean data to visualize the customers.

Customer overview

By plotting PCA data (graph 11), we get the graph below, which shows the clusters’ factor distribution. The clusters differ from each other in their characteristics, which is a good sign that we can distill useful information by further analyzing each cluster.

Customers in cluster 0 have high purchase volume and medium tech maturity, and they buy more wine and less meat than the other clusters. We label this cluster the lifestyle buyer segment. Similarly, we label clusters 1 to 3 as the high potential buyer segment, the essential buyer segment, and the casual buyer segment, respectively.

After appending the dataset in graph 6 with segment labels, we can visualize the distribution of important customer variables segment by segment.
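Appending segment labels and comparing segments can be sketched with pandas as follows. The rows are made up for illustration and the column names are assumptions; only the four segment names come from the analysis above.

```python
import pandas as pd

# Made-up customers with the cluster id assigned by k-means.
customers = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2, 3, 3],
    "Income": [60000, 64000, 78000, 82000, 40000, 44000, 28000, 32000],
    "MntWines": [600, 700, 400, 500, 450, 550, 20, 40],
})

# Map cluster ids to the segment names chosen above.
segment_names = {
    0: "lifestyle buyer",
    1: "high potential buyer",
    2: "essential buyer",
    3: "casual buyer",
}
customers["segment"] = customers["cluster"].map(segment_names)

# Average of each variable per segment, highest income first.
profile = (
    customers.groupby("segment")[["Income", "MntWines"]]
    .mean()
    .sort_values("Income", ascending=False)
)
print(profile)
```

Swapping `.mean()` for `.describe()` or plotting a box plot per segment gives the fuller distributions discussed in the following sections.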

What customers look like

Understanding customers’ demographics, such as income and household size, is always a good idea in customer segmentation.

The dataset’s income distribution is shown below (outliers have been excluded from the analysis). High potential buyers have the highest income, and lifestyle buyers come next.

Regarding households, lifestyle buyers have more teens at home than other segments.

What customers like to buy

Understanding what each segment likes to buy helps us decide how to target customers for specific products. In this example, purchase volume, wine purchase, and meat purchase are important segmentation factors.

The total purchase amount is a customer’s total spending on all products over the past two years. High potential buyers spend the most, and lifestyle buyers come close.

The original dataset contains purchase amounts for wine, meat, fruit, fish, sweets, and gold products. The first two show clear patterns across segments. Lifestyle buyers purchase the most wine, and essential buyers come next.

On the other hand, lifestyle buyers spend the least on meat, and high potential buyers spend the most on this category.

How customers like to buy

Details about how and where customers like to buy help us determine the most effective way of selling specific products to each segment. This dataset describes three purchase channels: web, store, and catalog.

Compared to the other three segments, high potential buyers purchase through the web less frequently.

On the other hand, essential and casual buyers are big fans of store purchases.

In catalog purchases, high potential buyers use the method very often, while customers in the other three segments only use it occasionally.

Make marketing decisions with data

The analysis above gives us a clear idea about each segment’s demographics, purchasing behaviors, and visiting behaviors. With that, we can make informed marketing decisions.

For example, when the grocery store imports high-quality wine, it can run a campaign targeting wealthy lifestyle buyers who like to buy wine. Since lifestyle buyers often buy on the web, the store should spread product information via email or other online channels.

On the other hand, if the grocery store has an overstock of medium-quality wine, it can promote the wine to essential buyers who like to buy wine but are less affluent. This time, the grocery store has the best chance of selling the wine both online and in-store.

Leveraging customer segmentation in the real world

A real-world dataset can contain more customers and dimensions than the sample dataset here (how exciting!). Therefore, you can get richer insights and apply them to more areas of the business. For example, you can apply the insights from wine to other lifestyle products.

After discovering reasonable customer segments, you can brainstorm the marketing strategies for each segment, including campaign, content, and creative strategies. Then you can upload segment and marketing strategy information to ad platforms to run programmatic marketing. Better yet, most large ad platforms allow you to find lookalike audiences. The more customer attributes you feed ad platforms, the more accurate the lookalike prospects you will get.

One thing to note is that the steps described above involve a lot of experimentation, which is the nature of data science. The experimental process can be time-consuming and frustrating, but empowering precise marketing with customer segmentation is rewarding. So don’t give up if you haven’t gotten satisfying segments after the first few shots. Keep trying, and enjoy the fun of data science along the way.

I discuss how to use data science to level up your business and optimize your marketing in my articles. If you want to discuss customer segmentation or related marketing analytics topics, please follow me on LinkedIn or contact me at newsletter@ivyliu.io. Until next time.