What is Cohort Analysis and Why it Matters
A cohort is a group of users who share a common characteristic within a defined time window — most commonly, the month they first made a purchase or registered. Cohort analysis tracks what percentage of each cohort returns in subsequent months.
Without cohort analysis, aggregate retention metrics lie to you. An overall "monthly active users" chart can show growth even as your product is getting worse — because new user acquisition masks increasing churn from earlier cohorts. Cohort analysis reveals the truth: are users who joined 6 months ago still active? Is retention improving or deteriorating across cohorts over time?
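To make the masking effect concrete, here is a minimal sketch with entirely hypothetical numbers: sign-ups grow 30% a month while each newer cohort retains worse than the last. The names `new_users`, `monthly_survival` and `active_users` are invented for this illustration, and the constant per-month survival rate is a simplifying assumption.

```python
# Hypothetical numbers for illustration only: sign-ups grow 30% a month
# while each newer cohort retains worse than the one before it.
new_users = [1000, 1300, 1690, 2197, 2856, 3713]
# Assumed share of a cohort's active users that stays active each month
monthly_survival = [0.50, 0.45, 0.40, 0.35, 0.30, 0.25]


def active_users(month: int) -> float:
    """Total active users in a calendar month, summed over every cohort so far."""
    total = 0.0
    for cohort in range(month + 1):
        age = month - cohort  # months since this cohort joined
        total += new_users[cohort] * monthly_survival[cohort] ** age
    return total


mau = [active_users(m) for m in range(len(new_users))]
# Under this model, Month-1 retention equals the cohort's survival rate
month1_retention = [rate * 100 for rate in monthly_survival]

for m, (total, retention) in enumerate(zip(mau, month1_retention)):
    print(f"month {m}: MAU = {total:,.0f}, Month-1 retention of cohort {m} = {retention:.0f}%")
```

In this toy model the monthly active user total rises every month even though each cohort retains worse than the previous one, which is the situation an aggregate chart cannot distinguish from healthy growth.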
💡 If your Month-1 retention is below 20% for a consumer app, growth will eventually plateau regardless of acquisition spend. Cohort analysis is how you discover this early.
Understanding the Retention Matrix
The output of cohort analysis is a retention matrix: a table where rows are cohorts (acquisition month), columns are time periods (months since acquisition), and each cell shows the percentage of the original cohort still active in that period. Cells along each bottom-left-to-top-right diagonal fall in the same calendar month, observed from different cohorts: in the table below, Jan 2025 at Month 4, Feb 2025 at Month 3, Mar 2025 at Month 2 and Apr 2025 at Month 1 all describe May 2025.
| Cohort | Size | Month 0 | Month 1 | Month 2 | Month 3 | Month 4 | Month 5 |
|---|---|---|---|---|---|---|---|
| Jan 2025 | 1,240 | 100% | 41% | 28% | 22% | 18% | 16% |
| Feb 2025 | 980 | 100% | 46% | 31% | 25% | 20% | — |
| Mar 2025 | 1,560 | 100% | 52% | 35% | 27% | — | — |
| Apr 2025 | 1,180 | 100% | 58% | 38% | — | — | — |
| May 2025 | 1,420 | 100% | 61% | — | — | — | — |
Notice how Month-1 retention improves from cohort to cohort (41% → 61%). This suggests that the changes made to the product or onboarding between January and May are working: more users return in the month after they join. This is exactly the insight that raw aggregate metrics would hide.
Data Preparation in Pandas
The starting point is a table of transactions or events with at least two columns: user_id and event_date. From these we derive two things: each user's cohort month (the month of their first purchase) and the order month of every purchase they make.
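A minimal sketch of that derivation, assuming a small made-up `events` table. The column names `order_month`, `cohort_month` and `period_number` are chosen here for illustration, and `period_number` (whole months since the cohort month) is an extra helper column that is convenient for building the matrix later.

```python
import pandas as pd

# Hypothetical sample data; a real table would come from your orders or events store.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "event_date": pd.to_datetime([
        "2025-01-05", "2025-02-10", "2025-03-02",
        "2025-01-20", "2025-03-15",
        "2025-02-03", "2025-03-28",
        "2025-03-09",
    ]),
})

# Month in which each event happened, as a monthly Period (e.g. 2025-01)
events["order_month"] = events["event_date"].dt.to_period("M")

# Cohort month: month of the user's first event, repeated on all of that user's rows
first_event = events.groupby("user_id")["event_date"].transform("min")
events["cohort_month"] = first_event.dt.to_period("M")

# Whole months elapsed between the cohort month and the event month
events["period_number"] = (
    (events["order_month"].dt.year - events["cohort_month"].dt.year) * 12
    + (events["order_month"].dt.month - events["cohort_month"].dt.month)
)

print(events)
```

Taking the minimum of the raw dates first and converting to a monthly period afterwards keeps the grouping step on plain datetimes, which is one reasonable ordering among several.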
Building the Retention Matrix Step by Step
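One way to build the matrix, sketched under the assumption of a tiny made-up `events` table with `user_id` and `event_date` columns. The intermediate names (`cohort_counts`, `cohort_pivot`, `cohort_size`, `retention`) are illustrative choices, not a fixed convention.

```python
import pandas as pd

# Hypothetical sample data with the two required columns
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "event_date": pd.to_datetime([
        "2025-01-05", "2025-02-10", "2025-03-02",
        "2025-01-20", "2025-03-15",
        "2025-02-03", "2025-03-28",
        "2025-03-09",
    ]),
})

# Preparation: order month, cohort month, and months since the cohort month
events["order_month"] = events["event_date"].dt.to_period("M")
events["cohort_month"] = (
    events.groupby("user_id")["event_date"].transform("min").dt.to_period("M")
)
events["period_number"] = (
    (events["order_month"].dt.year - events["cohort_month"].dt.year) * 12
    + (events["order_month"].dt.month - events["cohort_month"].dt.month)
)

# Step 1: count distinct active users for each cohort and period
cohort_counts = (
    events.groupby(["cohort_month", "period_number"])["user_id"]
    .nunique()
    .reset_index(name="active_users")
)

# Step 2: pivot so rows are cohorts and columns are periods
cohort_pivot = cohort_counts.pivot(
    index="cohort_month", columns="period_number", values="active_users"
)

# Step 3: cohort size is the number of users active in period 0
cohort_size = cohort_pivot[0]

# Step 4: divide each row by its cohort size to get retention percentages
retention = cohort_pivot.divide(cohort_size, axis=0).mul(100).round(1)

print(retention)
```

Counting distinct users rather than rows matters here: a user who buys three times in a month should still count once. Empty cells come out as NaN, which covers both periods that have not happened yet and periods in which nobody from the cohort was active, so it may be worth filling the latter with 0 in real data.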
Visualising with a Heatmap
The retention matrix becomes dramatically more readable as a colour-coded heatmap. High retention values appear dark blue, low values appear light — making trends immediately visible to any stakeholder.
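A minimal heatmap sketch using plain matplotlib, with the example retention table from earlier in the article typed in by hand. The `Blues` colormap, the figure size and the 60% threshold for switching label colour are arbitrary choices for this illustration; seaborn's `heatmap` function would be a common alternative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Retention percentages from the example table; NaN marks periods not yet observed
retention = pd.DataFrame(
    [
        [100, 41, 28, 22, 18, 16],
        [100, 46, 31, 25, 20, np.nan],
        [100, 52, 35, 27, np.nan, np.nan],
        [100, 58, 38, np.nan, np.nan, np.nan],
        [100, 61, np.nan, np.nan, np.nan, np.nan],
    ],
    index=["Jan 2025", "Feb 2025", "Mar 2025", "Apr 2025", "May 2025"],
    columns=[f"Month {i}" for i in range(6)],
)

fig, ax = plt.subplots(figsize=(8, 4))

# Darker blue means higher retention; NaN cells are left blank
image = ax.imshow(
    retention.to_numpy(dtype=float), cmap="Blues", vmin=0, vmax=100, aspect="auto"
)

ax.set_xticks(range(retention.shape[1]))
ax.set_xticklabels(retention.columns)
ax.set_yticks(range(retention.shape[0]))
ax.set_yticklabels(retention.index)

# Write the percentage inside every observed cell
for row in range(retention.shape[0]):
    for col in range(retention.shape[1]):
        value = retention.iat[row, col]
        if pd.notna(value):
            ax.text(
                col, row, f"{value:.0f}%",
                ha="center", va="center",
                color="white" if value > 60 else "black",
            )

fig.colorbar(image, ax=ax, label="Retention (%)")
ax.set_title("Monthly cohort retention")
fig.tight_layout()
```

Call `plt.show()` in a notebook or `fig.savefig(...)` in a script to view the result. Fixing the colour scale to 0 to 100 keeps heatmaps comparable from one report to the next, at the cost of the Month 0 column always being the darkest.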
Calculating Churn Rate
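Churn is the complement of retention: the percentage of users who did not return. Cumulative churn for a cohort at period N is simply 100 - retention_at_N. The more useful metric is often incremental churn: the share of the original cohort lost between two consecutive periods, retention_at_(N-1) - retention_at_N. For the Jan 2025 cohort above, retention falls from 41% at Month 1 to 28% at Month 2, so incremental churn for Month 2 is 13 percentage points.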
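Both churn measures can be sketched directly from the retention matrix. The example below reuses the retention table from earlier in the article; the variable names are illustrative, and the third quantity, `period_churn_rate` (loss relative to the users who were still active in the previous period), is an extra view added here as an assumption about what may be useful rather than something the definitions above require.

```python
import numpy as np
import pandas as pd

# Retention percentages from the example table; NaN marks periods not yet observed
retention = pd.DataFrame(
    [
        [100, 41, 28, 22, 18, 16],
        [100, 46, 31, 25, 20, np.nan],
        [100, 52, 35, 27, np.nan, np.nan],
        [100, 58, 38, np.nan, np.nan, np.nan],
        [100, 61, np.nan, np.nan, np.nan, np.nan],
    ],
    index=["Jan 2025", "Feb 2025", "Mar 2025", "Apr 2025", "May 2025"],
    columns=[f"Month {i}" for i in range(6)],
)

# Share of the original cohort that is no longer active at each period
cumulative_churn = 100 - retention

# Percentage points of the original cohort lost since the previous period
# (diff gives retention[N] - retention[N-1], so the sign is flipped)
incremental_churn = -retention.diff(axis=1)

# Assumed extra view: loss as a percentage of users still active in the previous period
period_churn_rate = (incremental_churn / retention.shift(axis=1) * 100).round(1)

print(incremental_churn)
print(period_churn_rate)
```

The first column of both incremental tables is NaN because Month 0 has no earlier period to compare against.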
How to Interpret and Present the Results
A retention matrix is only useful if you can translate the numbers into business decisions. Here's a framework for interpretation:
| Pattern you see | What it means | Action |
|---|---|---|
| Month-1 retention improving across cohorts | Onboarding or product improvements are working | Double down on what changed |
| Month-1 retention flat but Month-3 declining | Users start well but lose the habit over time | Improve re-engagement / habit loops |
| One cohort much worse than neighbours | Bad acquisition channel, bad campaign, or product bug that month | Investigate that specific period |
| Retention stabilises after Month 3 (at roughly 15% or higher) | You have a healthy retained core user base | Focus on growing this segment |
| Retention drops to near 0 by Month 2 | Product-market fit problem | Talk to churned users, redesign core loop |
Presenting to stakeholders
When presenting cohort retention to a non-technical audience, lead with the business implication, not the methodology. Instead of "Month-1 retention increased from 41% to 61% across January–May cohorts", say: "Six in ten new customers now come back for a second purchase in the month after their first, up from four in ten at the start of the year. This directly shortens our customer acquisition cost payback period."
🎯 The analyst's job isn't to produce a heatmap — it's to answer the question: "Is our product getting better or worse at keeping users?" Cohort analysis is the most reliable way to answer it.
