What Is dbt and Why Does It Matter
dbt (data build tool) is an open-source command-line tool that lets analysts write data transformations as SQL SELECT statements and handles all the surrounding complexity: dependency management, testing, documentation, and deployment. Think of it as "version control + testing + documentation" layered on top of SQL you already know.
Before dbt, a typical analyst workflow involved writing transformation SQL directly in the data warehouse, storing queries in disconnected files or folders, manually running them in the right order, and hoping that nothing upstream broke silently. dbt solves all of these problems by treating SQL transformations as code — with all the engineering best practices that entails.
By 2026, dbt has become the dominant transformation layer in the modern data stack. If you work with BigQuery, Snowflake, Redshift, DuckDB, or PostgreSQL, dbt is likely already in use at companies you want to work for.
💡 dbt doesn't replace SQL — it makes your SQL more organised, testable, and maintainable. You still write plain SELECT statements.
Core Concepts: Models, Refs, Sources
dbt has a small number of core concepts that you need to understand before writing any code:
| Concept | What it is | Analogy |
|---|---|---|
| Model | A single .sql file with a SELECT statement | One transformation step |
| ref() | Function to reference another model | Import statement |
| source() | Reference to raw data tables | Input data declaration |
| Test | Assertion about your data (not null, unique, etc.) | Unit test |
| schema.yml | YAML file with model metadata and tests | Configuration + docs |
Writing Your First dbt Model
A dbt model is simply a .sql file placed in the models/ directory of your project. The filename (minus the .sql extension) becomes the name of the resulting table or view in your data warehouse. dbt wraps your SELECT in a CREATE TABLE AS or CREATE VIEW AS statement automatically.
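Here is a minimal sketch of such a model, using the fct_monthly_revenue and stg_orders names from this section; the column names (order_date, order_total, status) are illustrative assumptions:

```sql
-- models/marts/fct_monthly_revenue.sql
-- dbt wraps this SELECT in CREATE TABLE AS / CREATE VIEW AS at run time.
select
    date_trunc('month', order_date) as revenue_month,  -- illustrative column
    sum(order_total)                as total_revenue   -- illustrative column
from {{ ref('stg_orders') }}
where status = 'completed'
group by 1
```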
Notice the {{ ref('stg_orders') }} syntax. This is dbt's Jinja templating — it tells dbt that fct_monthly_revenue depends on stg_orders. dbt builds a dependency graph from these references and always runs models in the correct order.
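Conceptually, that dependency graph implies a topological run order. The sketch below is not dbt's implementation, just a minimal illustration of the idea using Python's standard library:

```python
# Illustrative sketch of how ref() dependencies imply a run order.
# NOT dbt's actual implementation -- just a minimal topological sort.
from graphlib import TopologicalSorter

# Each model maps to the set of models it ref()s (its upstream dependencies).
deps = {
    "stg_orders": set(),                    # reads from a source, no refs
    "fct_monthly_revenue": {"stg_orders"},  # {{ ref('stg_orders') }}
}

run_order = list(TopologicalSorter(deps).static_order())
print(run_order)  # upstream models always come first
```

dbt does the same thing at scale: every `ref()` becomes an edge, and `dbt run` executes the resulting graph in dependency order.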
Data Testing with dbt
One of the most powerful features of dbt is built-in data testing. You define tests in a YAML file alongside your models, and dbt runs them to verify your data meets expectations.
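A typical schema.yml looks like the sketch below, using dbt's built-in not_null and unique tests; the model, column names, and descriptions are illustrative:

```yaml
# models/marts/schema.yml -- names and descriptions are illustrative
version: 2

models:
  - name: fct_monthly_revenue
    description: "Monthly revenue from completed orders"
    columns:
      - name: revenue_month
        description: "First day of the month"
        tests:
          - not_null
          - unique
      - name: total_revenue
        tests:
          - not_null
```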
Run all tests with a single command: dbt test. If a test fails, dbt reports which test failed and how many rows violated the constraint (and can retain the failing rows for inspection with the --store-failures flag). This is invaluable for catching silent data quality issues before they reach dashboards or reports.
Auto-Generated Documentation
dbt generates a full documentation website from your YAML descriptions and model code. Run dbt docs generate and then dbt docs serve to launch it locally. The docs include a description of every model and column, the SQL code, and the dependency lineage graph.
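The two commands in sequence:

```shell
# Build the docs site from model code and YAML descriptions,
# then serve it locally in the browser
dbt docs generate
dbt docs serve
```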
For analysts working in a team, this solves the perennial problem of "what does this table actually contain?" — the answer is always one click away in the dbt docs.
Data Lineage
Because dbt knows all the dependencies between models (via ref()), it can draw a complete lineage graph: from raw source tables through staging models to final mart tables used in dashboards. This makes it immediately obvious what breaks when an upstream table changes.
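The lineage graph is also actionable via dbt's node selection syntax, where `+` selects upstream or downstream dependencies (model names here match the examples used earlier in this section):

```shell
# Rebuild a model together with everything upstream of it
dbt run --select +fct_monthly_revenue

# Rebuild a model and everything downstream of it
# (e.g. after fixing a staging model)
dbt run --select stg_orders+
```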
📊 Typical dbt project structure: sources (raw tables) → staging (cleaning & typing) → intermediate (joins & enrichment) → marts (business-ready aggregations for dashboards)
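On disk, that layering usually follows common dbt naming conventions (the specific file names below are illustrative):

```
models/
├── staging/        # stg_*: cleaning, renaming, type casting
│   └── stg_orders.sql
├── intermediate/   # int_*: joins and enrichment
└── marts/          # fct_* / dim_*: business-ready tables
    └── fct_monthly_revenue.sql
```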
When dbt Is (and Isn't) the Right Tool
| Use dbt when... | Don't use dbt when... |
|---|---|
| You have repeatable SQL transformations that run on a schedule | You're doing one-off ad-hoc analysis |
| Multiple people work on the same data models | You're working solo with simple queries |
| You need data quality guarantees before dashboards | Your raw data is already clean enough |
| You want auto-documentation and lineage tracking | Speed of setup matters more than structure |
| Your company uses a cloud DWH (BigQuery, Snowflake, etc.) | You're working with local CSV files only |
Getting started is straightforward: install dbt Core with pip install dbt-bigquery (or your warehouse adapter), run dbt init my_project, connect to your warehouse, and write your first model. The official dbt documentation and the free dbt Learn courses are excellent resources for going deeper.
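The setup steps from the paragraph above, as a sequence of commands:

```shell
# Install dbt Core with the BigQuery adapter
# (swap for dbt-snowflake, dbt-postgres, etc.)
pip install dbt-bigquery

# Scaffold a new project and configure the warehouse connection
dbt init my_project
cd my_project

# Verify the connection, then build and test all models
dbt debug
dbt run
dbt test
```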
🎯 In 2026, knowing dbt separates analysts who work with raw SQL from analysts who build production-grade data pipelines. It's one of the highest-value skills to add to your toolkit.
