Cohort Analysis: An Overview

A cohort is simply a group of users who have something in common. You can create these groups in many different ways: based on time (when they signed up), behavior (things they do or don’t do), value (the type of plan they are on), and much more.

A cohort analysis is a method of visualizing and comparing how cohorts behave over time. Measuring engagement over time helps you identify trends and understand user behavior. Most importantly, it can guide improvements to your product, marketing, onboarding, and more.

Your cohort analysis can shed light on how a given cohort behaves over time, and how those behaviors differ between groups. This helps you uncover trends and understand customer and revenue retention.

For example, is user engagement improving over time, or is it actually decreasing but appearing to improve because of new user growth? Cohort analyses can help uncover these answers.

How Can A Cohort Analysis Be Helpful?

Cohorts are especially valuable for SaaS companies that want to understand how different groups of customers behave over time. They help you follow your customers’ journey from sign-up through churn, and see how user behavior affects conversions, retention, churn, and revenue.

Data and trend lines can be misleading when you look at them only in aggregate, at a high level.

For example, if your total customer count keeps trending up and to the right, you may rightfully pat yourself on the back for continued growth. More customers is always good, right? However, without analyzing the composition of those customers, you cannot be certain that you are not simply stacking up first-time customers while losing existing customers to churn on the other end. That would not be a sustainable path to growth.

Enter the cohort analysis. A cohort analysis tracks the behavior of different customer segments over time, so you can understand what is driving your business performance as a whole.

By tracking metrics like churn rate across cohorts as they change over time, you can answer critical questions like:

  • How long do customers stick around, on average?
  • When in the customer life cycle do customers churn out?
  • Is there a point when churn actually stabilizes?
  • Are there any common characteristics across those who have better retention? Or those who have a higher churn rate?

From here, you can segment your data and create cohort analyses to dig into these questions, uncover the answers, and build the appropriate strategies.

How Do You Read A Cohort Analysis?

At first glance, a cohort analysis can look confusing and unintuitive. There are several variations, but let’s start with a simple one that SaaS companies often use to analyze churn and retention.

Retention rate by sign-up date, cohort analysis via Mode.

  • Each row represents a cohort of users grouped by when they signed up. In the example above, cohorts are grouped by sign-up week, but you can group by day, week, month, quarter, etc., depending on what makes the most sense.
  • Each column represents the number of weeks since the cohort signed up, so reading across a row gives that cohort’s retention week by week.
  • Within each cell, the percentage is the share of the original cohort that is still active (this table looks at user retention, but “activity” can be defined by whatever metric you are measuring).
  • The color shading ranges from dark green (highest retention) to red (lowest retention).

Reading across a row shows the retention of a single cohort over its life span, for example the users who signed up in the week of 4/28. Where are the significant drops in retention? At first glance, retention rates start at a relatively low point, in the high-teens to low-20s percent range.

Reading down a column shows retention at the same point in the customer lifetime across different cohorts. Ideally, this number increases for more recent cohorts, which indicates that retention is improving over time.
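To make the table concrete, here is a minimal Python sketch of how a retention table like this could be computed from raw activity data. It assumes a hypothetical input of (user_id, signup_week, active_week) tuples with integer week numbers, and that every user is recorded as active in their own sign-up week; tools like Mode will have their own data model and queries.

```python
from collections import defaultdict


def retention_matrix(activity, current_week):
    """Build a cohort retention table: rows = sign-up weeks, columns = weeks since sign-up.

    activity: iterable of (user_id, signup_week, active_week) tuples (hypothetical format).
    current_week: the latest week with data, which limits how far each row can extend.
    Returns {signup_week: [pct_active_week_0, pct_active_week_1, ...]}.
    """
    cohort_users = defaultdict(set)   # signup_week -> every user in that cohort
    active = defaultdict(set)         # (signup_week, weeks_since_signup) -> active users
    for user, signup, week in activity:
        cohort_users[signup].add(user)
        active[(signup, week - signup)].add(user)

    table = {}
    for signup in sorted(cohort_users):
        size = len(cohort_users[signup])
        # Newer cohorts have fewer observed weeks, which gives the table its triangle shape.
        observed_weeks = current_week - signup + 1
        table[signup] = [
            100 * len(active.get((signup, offset), set())) / size
            for offset in range(observed_weeks)
        ]
    return table


def column(table, weeks_since_signup):
    """Read down one column: retention at the same age for every cohort old enough to have it."""
    return [row[weeks_since_signup] for row in table.values() if len(row) > weeks_since_signup]
```

Reading across a row is just `table[signup_week]`; reading down a column is `column(table, k)`, listed from the oldest cohort to the newest, so an increasing sequence suggests retention is improving.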

How Should I Segment My Cohorts?

Now that you have identified the questions that you are looking to answer, and you have your data set, how can you segment your users?

There are two primary ways to group cohorts: by acquisition date and by behavior. Let’s take a look at each:

Acquisition Cohorts: These groups are usually based on the day, week, or month that your users signed up. Starting with acquisition date shows you how long customers stick with you before they churn, which lets you estimate customer lifetime value (we covered Lifetime Value, aka LTV, in depth here). This is critical for setting your benchmark for how much you can spend to acquire a customer profitably.
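One simple way to turn an acquisition cohort’s retention into a rough LTV estimate is to add up the fraction of the cohort still paying in each period and multiply by the average revenue per customer. The sketch below is a simplification, not a full LTV model: it assumes flat revenue per period, and it uses a 3:1 LTV-to-CAC ratio (a commonly cited SaaS rule of thumb) as an illustrative default for the acquisition-cost benchmark. The function names are hypothetical.

```python
def estimate_ltv(retention_curve, revenue_per_period):
    """Rough customer lifetime value from an acquisition cohort's retention curve.

    retention_curve: fraction of the cohort still paying in each period since
    acquisition, starting with period 0 (e.g. [1.0, 0.6, 0.45, ...]).
    revenue_per_period: average revenue per retained customer per period (assumed flat).
    Only observed periods are counted, so young cohorts will understate LTV.
    """
    expected_paying_periods = sum(retention_curve)
    return expected_paying_periods * revenue_per_period


def max_acquisition_cost(ltv, target_ltv_to_cac=3.0):
    """Upper bound on what to spend acquiring one customer for a chosen LTV:CAC ratio."""
    return ltv / target_ltv_to_cac
```

For example, a cohort that retains 100%, 50%, and 25% over three months at $40 per month gives `estimate_ltv([1.0, 0.5, 0.25], 40)`, or $70 of expected revenue per customer over those months.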

Behavioral Cohorts: The next step after creating time-based cohorts is to group your users based on the actions they did or did not take within the same time frame. For example, if you are a project management tool like Trello, behavioral cohorts could be based on creating tasks, inviting collaborators, upgrading to premium features, or using integrations with other products like GitHub.

Identifying useful behavioral cohorts at this stage means formulating questions or hypotheses, probing the data, and teasing out any meaningful differences.

Ultimately, your goal is to identify some combination of action or user property that leads to greater engagement and retention.
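As an illustration, here is a minimal sketch of a single behavioral split: users who took a given action within their first N days versus everyone else, compared on whether they were still active on a later day. The data shape and the action name `invite_collaborator` are hypothetical; in practice you would pull this from your own event data.

```python
def behavioral_retention(users, action, within_days, check_day):
    """Compare retention for users who did `action` early vs. those who did not.

    users: list of dicts with a hypothetical shape:
      {"first_action_day": {action_name: days_after_signup, ...},
       "active_days": {days_after_signup on which the user was active, ...}}
    Returns {"did_action": fraction, "did_not": fraction} of each group active on `check_day`.
    """
    groups = {"did_action": [], "did_not": []}
    for user in users:
        first_day = user["first_action_day"].get(action)
        did_it = first_day is not None and first_day <= within_days
        groups["did_action" if did_it else "did_not"].append(user)

    retention = {}
    for name, members in groups.items():
        if members:  # skip empty groups to avoid dividing by zero
            retained = sum(1 for u in members if check_day in u["active_days"])
            retention[name] = retained / len(members)
    return retention


users = [
    {"first_action_day": {"invite_collaborator": 2}, "active_days": {0, 7, 30}},
    {"first_action_day": {"invite_collaborator": 5}, "active_days": {0, 30}},
    {"first_action_day": {}, "active_days": {0, 7}},
    {"first_action_day": {"invite_collaborator": 20}, "active_days": {0}},
]
print(behavioral_retention(users, "invite_collaborator", within_days=7, check_day=30))
```

With this toy data the early inviters are all active on day 30 and the others are not. A gap like that is a lead worth investigating, not proof that inviting collaborators causes retention.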

Creating A Retention Curve

A retention curve is one of the clearest ways to visualize the retention of specific cohorts over time. Drops in customer engagement or retention become very apparent when you plot how users behave over time.

For example, here is a cohort analysis of retention by user device, provided by data science platform Mode Analytics:

Source: Mode Analytics

This is a nice visualization of how users stay engaged over time, segmented by device type. The conditional color formatting helps the story speak for itself and leaves clues about where you can run further analysis.

From here, you may want to pull out specific cohorts and visualize their retention. So I exported the data (which you can see here) and plotted retention by the device users have. The y-axis represents retention, and the x-axis represents time.
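Before plotting, each device’s sign-up cohorts have to be combined into one curve. A minimal sketch, assuming a hypothetical input of (cohort_size, active_counts_by_week) pairs per segment: each point is the total number of active users divided by the total number of users in the cohorts old enough to be observed at that week, which is a size-weighted average.

```python
def segment_retention_curve(cohorts):
    """Combine several sign-up cohorts from one segment into a single retention curve.

    cohorts: list of (cohort_size, [active_week_0, active_week_1, ...]) pairs.
    Returns [retention_week_0, retention_week_1, ...] as fractions between 0 and 1.
    """
    max_weeks = max(len(counts) for _, counts in cohorts)
    curve = []
    for week in range(max_weeks):
        # Only cohorts that have reached this week contribute to this point.
        eligible = [(size, counts[week]) for size, counts in cohorts if len(counts) > week]
        total_users = sum(size for size, _ in eligible)
        total_active = sum(active for _, active in eligible)
        curve.append(total_active / total_users)
    return curve


# Hypothetical counts for one device segment, e.g. iPad Mini users.
ipad_mini = [(100, [100, 40, 30]), (50, [50, 25])]
print(segment_retention_curve(ipad_mini))
```

Each resulting curve can be handed to whatever charting tool you use, with weeks since sign-up on the x-axis and retention on the y-axis.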

The result is an apples-to-apples comparison of how retention differs over time across tablet types. A few quick observations:

  • Is the onboarding for the Galaxy Tablet (the red line) different in any way? Engagement is noticeably higher from the start, nearly doubles in the third week, and remains higher through Week 8.
  • Each device shows a bump in engagement in the weeks after sign-up. What is driving this engagement? Onboarding emails, reaching an Aha-moment, completing some onboarding process?
  • While Tablet and Note users seem to level off, iPad Mini users seem to plummet to nearly zero engagement by Week 11. Is there something about iOS versus Android, or about iPads specifically, that drives this? And what happens after Week 12? More data would be helpful here.

These are just a few examples of how a retention curve and cohort analysis can spur further analysis that surfaces actionable insights.

Steps to Creating An Insightful Cohort Analysis

How do you set up your analysis to deliver actionable insights? The key is to follow a systematic, data-driven process.

Here is a process that you can use for your next cohort analysis:

  • Decide on Your Questions: What are the questions that you are trying to answer? You want these to be critical questions that can drive you to action. Some good starting points:
    • When (in a customer’s lifetime) do the biggest drops in engagement occur?
    • At what point does churn taper off and “flatline”? In other words, what percentage of customers can be expected to be engaged and retained for the long term? 
    • How long do customers remain engaged and paying, i.e., what is the customer lifetime?
    • What actions do those with the best retention all have in common?
    • What actions (or inactions) do those with the lowest retention have in common?
    • Has retention improved over time for any particular cohorts? If so, at what point in time did that occur, and what caused it?
  • Decide on the Appropriate Metrics: Once you know which questions you are answering, choose the metrics that address them (such as retention rate, churn rate, or revenue) and zero in on segmenting your data appropriately.
  • Create Your Cohorts: How are you defining your cohorts? Is it an Acquisition cohort (grouping users by when they were acquired) or a Behavioral cohort (grouping users by their actions and behavior)?
  • Define Your Time Frame: The increments of time you measure your cohorts by depend on the trends you are observing, the type of product you have, and the cohorts you are studying. For example, B2C mobile apps tend to have much shorter retention curves than B2B apps (i.e., users disengage and churn much faster), so looking at your cohort analysis on a daily basis may make sense. If you sell enterprise software, you may look at cohorts on a quarterly or annual basis. Most analytics platforms make it easy to adjust the time frame to tease out insights.
  • Analyze Your Resulting Data: Ideally, you have well-defined questions and a cohort analysis that pulls together the relevant data. Now it is time to dig into the analysis. Do you notice any trend lines or patterns? Is there a change in the cohorts that could be correlated with (or caused by) things that you did?
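Switching time frames mostly comes down to how you label each user’s sign-up date. Here is a minimal sketch of that labeling step; the function name and label formats are hypothetical, and it uses ISO weeks from Python’s standard `date.isocalendar()`.

```python
from datetime import date


def cohort_key(signup: date, granularity: str) -> str:
    """Label a sign-up date with its cohort period for the chosen time frame."""
    if granularity == "day":
        return signup.isoformat()
    if granularity == "week":
        # ISO weeks start on Monday; use the ISO year so late-December dates land in the right week.
        iso_year, iso_week, _ = signup.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{signup.year}-{signup.month:02d}"
    if granularity == "quarter":
        return f"{signup.year}-Q{(signup.month - 1) // 3 + 1}"
    if granularity == "year":
        return str(signup.year)
    raise ValueError(f"unknown granularity: {granularity}")


print(cohort_key(date(2024, 5, 15), "quarter"))
```

A B2C app might group by `"day"` or `"week"` and an enterprise product by `"quarter"`. Note that with ISO weeks, a date in the last days of December can belong to week 1 of the following year.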

A Deep Dive Into Cohort Analysis

Your cohort analysis can point you toward important observations, but it is imperative to treat it as a jumping-off point for further investigation rather than a source of firm conclusions. In other words, just because you notice something in the data does not mean you have identified the cause.

Enter the distinction between correlation and causation.

For example, let’s again assume that we are at the project management tool Trello, analyzing retention with a cohort analysis. We notice that users who invite others and share projects have higher retention rates. Can we conclude that having multiple users on a project is the key to retention, and simply aim all our efforts at getting more people into projects? It is unclear whether having more people on a project causes the higher retention, or whether those additional users join because they use another app that is integrated with Trello.

To tell correlation from causation, we need to test different experiences with different user groups and measure whether inviting others to collaborate actually drives greater retention. Ultimately, we can systematically and iteratively test our way to pinpointing the specific behaviors that increase retention.
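When such a test finishes, you still need to judge whether the retention difference between groups is larger than chance. One common choice (not the only one) is a pooled two-proportion z-test. The sketch below assumes reasonably large, randomly assigned groups, since it relies on a normal approximation. The function name and the example numbers are hypothetical.

```python
import math


def two_proportion_z_test(retained_a, total_a, retained_b, total_b):
    """Compare retention between a control group (A) and a test group (B).

    Returns (z, two_sided_p_value) from a pooled two-proportion z-test.
    """
    p_a = retained_a / total_a
    p_b = retained_b / total_b
    pooled = (retained_a + retained_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:  # both groups retained 0% or 100%, so there is no difference to test
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 20% of 1,000 control users retained vs. 26% of 1,000 prompted to invite collaborators.
z, p = two_proportion_z_test(200, 1000, 260, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the prompt itself changed retention. Because users were randomly assigned, this supports a causal reading that the cohort comparison alone could not.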

Variations on Cohort Analysis

There are myriad ways to visualize cohort data, some more helpful and insightful than others. The most important thing to keep in mind is how well a given visualization helps you act on the data.

Opting for simplicity is a good policy here. Here are a few visualizations that you may come across:

Layer Cake

The layer cake visualizes the revenue contribution of each cohort over time. The main takeaway is how each cohort’s contribution evolves, and what percentage of revenue comes from a new customer cohort versus returning customers. This makes it easier to see the underlying revenue streams that contribute to top-line growth.

Source: AnalyzeCore
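The data behind a layer cake is a table of revenue per period, broken down by acquisition cohort. A minimal sketch, assuming a hypothetical list of (customer_id, period, amount) payments with integer periods, and defining a customer’s cohort as the first period in which they paid:

```python
from collections import defaultdict


def layer_cake(payments):
    """Revenue per period, split by the acquisition cohort of each paying customer.

    payments: iterable of (customer_id, period, amount) with integer periods (hypothetical format).
    Returns {period: {cohort: revenue}}; each cohort is one layer of a stacked area chart.
    """
    payments = list(payments)
    first_period = {}  # customer -> first period they paid in (their acquisition cohort)
    for customer, period, _ in payments:
        first_period[customer] = min(period, first_period.get(customer, period))

    layers = defaultdict(lambda: defaultdict(float))
    for customer, period, amount in payments:
        layers[period][first_period[customer]] += amount
    return {period: dict(by_cohort) for period, by_cohort in sorted(layers.items())}


payments = [("a", 1, 10), ("b", 1, 20), ("a", 2, 10), ("c", 2, 30),
            ("b", 3, 20), ("c", 3, 30), ("d", 3, 5)]
for period, by_cohort in layer_cake(payments).items():
    print(period, by_cohort)
```

Stacking the cohorts for each period, oldest at the bottom, produces the layered shape: a thick band of older cohorts signals healthy retention, while growth that comes mostly from the newest layer is worth a closer look.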

New Vs. Returning Revenue

The less colorful relative of the layer cake simply segments revenue into New and Returning customers.

This can quickly show useful high-level trends:

  • total revenue over time
  • revenue from new customers
  • revenue from existing customers
  • acceleration or deceleration of either cohort

From here, you can choose to drill down further if there are any interesting trends that deserve a deeper investigation.
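Here is a minimal sketch of the underlying numbers, using the same hypothetical (customer_id, period, amount) payment format and treating a customer as “new” only in the first period they pay. The period-over-period change for each series shows whether it is growing, and comparing successive changes shows whether that growth is accelerating or decelerating.

```python
def new_vs_returning(payments):
    """Split revenue per period into new vs. returning customers.

    payments: iterable of (customer_id, period, amount) with integer periods (hypothetical format).
    Returns {period: {"new": ..., "returning": ..., "total": ...}} sorted by period.
    """
    payments = list(payments)
    first_period = {}
    for customer, period, _ in payments:
        first_period[customer] = min(period, first_period.get(customer, period))

    summary = {}
    for customer, period, amount in payments:
        row = summary.setdefault(period, {"new": 0.0, "returning": 0.0})
        kind = "new" if period == first_period[customer] else "returning"
        row[kind] += amount
    for row in summary.values():
        row["total"] = row["new"] + row["returning"]
    return dict(sorted(summary.items()))


def period_changes(summary, kind):
    """Change in one series ("new", "returning", or "total") from each period to the next."""
    periods = sorted(summary)
    return {p: summary[p][kind] - summary[prev][kind] for prev, p in zip(periods, periods[1:])}


payments = [("a", 1, 10), ("b", 1, 20), ("a", 2, 10), ("c", 2, 30),
            ("b", 3, 20), ("c", 3, 30), ("d", 3, 5)]
summary = new_vs_returning(payments)
print(summary)
print(period_changes(summary, "new"))
```

In this toy data total revenue keeps rising while new-customer revenue falls in the last period, which is exactly the kind of trend that deserves a deeper look.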

Cycle Plots

Cycle plots are a good option for seeing how cohorts behave, and retain, over time. The idea is to group the cohorts by period since acquisition: for each week or month after acquisition, every cohort’s value is plotted side by side. Note that cycle plots only apply to cohorts grouped by date of acquisition.

Here is an example of what a cycle plot looks like:

Source: Amplitude

In the example above, we can connect the retention points for the cohorts that signed up in January, February, March, and April. Is retention higher for the April cohort? The ideal outcome, which we see above, is that the retention curve “shifts up” so that each month’s retention gets progressively better.
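A cycle plot is essentially the retention table regrouped by column. Here is a minimal sketch, assuming a hypothetical table that maps each monthly cohort (in chronological order) to its retention in each period after acquisition. It also includes a simple check for the “shifts up” pattern described above.

```python
def cycle_plot_groups(table):
    """Regroup a retention table for a cycle plot.

    table: {cohort_label: [retention_period_1, retention_period_2, ...]}, oldest cohort first.
    Returns {period_index: [(cohort_label, retention), ...]}: one mini-series per period
    since acquisition, running across cohorts.
    """
    groups = {}
    for cohort, row in table.items():
        for period, value in enumerate(row):
            groups.setdefault(period, []).append((cohort, value))
    return groups


def shifts_up(series):
    """True if each newer cohort retains at least as well as the one before it."""
    values = [value for _, value in series]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Hypothetical retention percentages for four monthly cohorts.
table = {"Jan": [40, 30, 25], "Feb": [42, 33, 27], "Mar": [45, 35], "Apr": [47]}
groups = cycle_plot_groups(table)
print(groups[0], shifts_up(groups[0]))
```

Each mini-series is drawn as its own short line segment, so you can compare, say, first-month retention across the January to April cohorts at a glance.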

Conclusion

As you can see, a cohort analysis is incredibly useful for visualizing where shifts in behavior occur and pointing toward the underlying drivers of that change. Top-line growth can sometimes mask underlying problems that cause churn. Cohort analysis is your go-to tool for revealing the stickiness of your users and the details of the customer journey.

Many companies today place a premium on “growth”, defined as new user acquisition. However, this is only part of the growth equation. The other, less glamorous part is retention: how many of the users you acquire stick around, and for how long.

So before you go into full-throttle growth mode and drive user acquisition, use the steps outlined above to understand your users and their behavior by cohort.

This not only helps you probe deeper into your analysis, but also helps you form hypotheses and design A/B tests to validate them.