CHAID, less commonly known as Chi-Squared Automatic Interaction Detector, can be a wonderfully powerful tool for understanding your customers. It can help you predict how they are likely to behave in future and understand how they behaved in the past. So what is it and how can it be used?
According to Wikipedia:
Chi-square automatic interaction detection (CHAID) is a decision tree technique, based on adjusted significance testing. The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis) as well as classification, and for detection of interaction between variables.
In practice, CHAID is often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in the field of medical and psychiatric research.
Like other decision trees, CHAID's advantages are that its output is highly visual and easy to interpret.
So, what does that mean in practice?
CHAID allows you to create groups of customers (segments) who behave in similar ways. It also allows you to understand which of the groups are most and least likely to do something. For example, I recently worked with a data analyst who built a model looking at the profile of openers v non-openers of a client's quarterly newsletter. The model divided the previous recipients into groups, with the "best" group being 10x more likely to open newsletters than the "worst" group. That has allowed the client to decide whether to stop sending the newsletter to the worst groups or to create a different version, with a different message, for the underperforming groups. The latter is what they have (correctly) chosen to do, as these non-responsive groups are still customers who have recently opted in to receive the newsletter but, for whatever reason, are not typically opening it.
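To give a feel for how CHAID decides which variable to split on, here is a minimal sketch in Python. It is not the analyst's software and not a full CHAID implementation: real CHAID also merges similar categories and compares Bonferroni-adjusted p-values. The sketch only handles yes/no predictors against a yes/no outcome, and the field names (`is_manager`, `in_uk`, `opened`) and all the counts are made up for illustration.

```python
import math
from collections import Counter


def chi_square_2x2(rows, predictor, target="opened"):
    """Pearson chi-square statistic and p-value for a yes/no predictor
    against a yes/no target (1 degree of freedom)."""
    counts = Counter((r[predictor], r[target]) for r in rows)
    levels = sorted({r[predictor] for r in rows})
    outcomes = sorted({r[target] for r in rows})
    if len(levels) != 2 or len(outcomes) != 2:
        raise ValueError("this sketch only handles 2x2 tables")
    n = len(rows)
    stat = 0.0
    for level in levels:
        row_total = sum(counts[(level, o)] for o in outcomes)
        for outcome in outcomes:
            col_total = sum(counts[(lv, outcome)] for lv in levels)
            expected = row_total * col_total / n
            stat += (counts[(level, outcome)] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def make_rows(is_manager, in_uk, opened, count):
    """Build `count` identical hypothetical recipient records."""
    return [{"is_manager": is_manager, "in_uk": in_uk, "opened": opened}] * count


# Hypothetical data: 200 recipients, 50 of whom opened the newsletter.
rows = (
    make_rows(True, True, True, 20) + make_rows(True, True, False, 30)
    + make_rows(True, False, True, 20) + make_rows(True, False, False, 30)
    + make_rows(False, True, True, 6) + make_rows(False, True, False, 44)
    + make_rows(False, False, True, 4) + make_rows(False, False, False, 46)
)

results = {p: chi_square_2x2(rows, p) for p in ("is_manager", "in_uk")}
# The predictor with the smallest p-value is the candidate for the first split.
best = min(results, key=lambda p: results[p][1])
for name, (stat, p_value) in results.items():
    print(f"{name}: chi-square={stat:.2f}, p={p_value:.4f}")
print("split on:", best)
```

In this made-up data, `is_manager` has a chi-square statistic of 24 while `in_uk` scores about 0.11, so the first split would be on job role. A full CHAID run repeats the same test inside each resulting group until no variable is significant.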
How did the project work?
- Firstly, we agreed which recent campaign we should analyse (we chose the newsletter because it was sent to the most people, giving us the largest sample size to analyse)
- Then, we asked the data team at the client to extract 2 files. One file was the people who had opened the newsletter and the other was the non-openers. As this is a B2B client the total volume is low, so we chose to analyse all recipients in both groups. If we had had more data, we could have taken a sample of responders and non-responders rather than the whole file
- In the data file, we asked for every piece of information known about the clients who had been mailed (extracting every field on the database for every record). With CHAID you never know which data field will be the most predictive, so you ask for as much as you can get and let CHAID work out which variables are most useful
- These files were handed over to our data analyst, who ran the data through his SAS CHAID software. The output was a report with 3 key elements:
- A profile report comparing responders to non-responders for every data variable that we had available to us (eg country, job role, gender etc)
- A CHAID tree diagram showing each of the segments and, for each group, what % of them opened the email, allowing us to see which were our worst and best groups
- A Gains chart which ranked the best to worst segments, so that we could easily see how predictive the model was and how much more responsive the best group was compared to the worst group
- These 3 things demonstrated to us that the model was highly predictive and allowed the client to very quickly see (in a highly visual format) who their best groups were and to understand the types of people who were not opening the emails.
- The next month, they tested a new angle for the least responsive groups. They did a simple A/B test where half got the same email as the best performing groups and half got a similar email with a different subject line. We are awaiting results of that trial now...
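When the results of an A/B test like this come back, one common way to check whether the difference in open rates is more than chance is a two-proportion z-test. The sketch below is only an illustration of that check, not the method the client used, and the counts are invented because the real results are not yet in.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for a difference in open rates between version A and B."""
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled open rate under the assumption that both versions perform the same.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: A = original subject line, B = new subject line.
z, p_value = two_proportion_z_test(opens_a=20, sent_a=500, opens_b=40, sent_b=500)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these invented numbers (4% v 8% open rate on 500 sends each) the p-value is below 0.01, which would suggest the new subject line genuinely performs better. With small B2B volumes a real test may well be less clear cut.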
So, how could you use CHAID? In reality, as long as you have a previous activity to analyse, you can use it to predict anything such as:
- High v low value customers
- Loyal customers v those likely to switch to a competitor
- Good payers v bad payers
- Early v late adopters
- Attendees v non-attendees at events
The potential list is endless.
If you'd like to know more, let me know (email me at [email protected]) and I can discuss how you could use this massively underutilised statistical model.
Due to client confidentiality, I cannot show you my client's CHAID diagram. But, just so that you can get an idea of what the output looks like, here is an example I found online:
https://www.sciencedirect.com/science/article/pii/S2212571X1600007X
And this is what a gains chart might look like (again, one I found online):
https://www.researchgate.net/figure/Gain-plot-of-the-CHAID-and-CHAID-SPT-models_fig2_260062366
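If you would like to see the numbers that sit behind a gains chart, the sketch below builds a simple gains table in Python. The segment names and counts are hypothetical (they are not my client's figures), and the column names are just ones I have picked for illustration. Segments are ranked from best to worst open rate, and the cumulative columns are what a gains chart plots.

```python
def gains_table(segments):
    """Rank segments by open rate and add cumulative gains and lift."""
    total_sent = sum(s["sent"] for s in segments)
    total_opened = sum(s["opened"] for s in segments)
    overall_rate = total_opened / total_sent
    ranked = sorted(segments, key=lambda s: s["opened"] / s["sent"], reverse=True)

    rows = []
    cum_sent = 0
    cum_opened = 0
    for s in ranked:
        cum_sent += s["sent"]
        cum_opened += s["opened"]
        open_rate = s["opened"] / s["sent"]
        rows.append({
            "segment": s["name"],
            "open_rate": open_rate,
            # Share of all recipients mailed so far, best segments first.
            "cum_pct_sent": cum_sent / total_sent,
            # Share of all openers captured so far.
            "cum_pct_opened": cum_opened / total_opened,
            # How much better this segment is than the overall average.
            "lift": open_rate / overall_rate,
        })
    return rows


# Hypothetical CHAID segments, deliberately listed out of order.
segments = [
    {"name": "C", "sent": 300, "opened": 30},
    {"name": "A", "sent": 100, "opened": 40},
    {"name": "D", "sent": 400, "opened": 10},
    {"name": "B", "sent": 200, "opened": 40},
]

table = gains_table(segments)
for row in table:
    print(
        f"{row['segment']}: open rate {row['open_rate']:.1%}, "
        f"cumulative sent {row['cum_pct_sent']:.0%}, "
        f"cumulative openers {row['cum_pct_opened']:.0%}, "
        f"lift {row['lift']:.2f}"
    )
```

In this invented example, the best two segments make up 30% of recipients but account for two thirds of all openers. The steeper that curve climbs at the start, the more predictive the model is.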