How to set up data transformations with dbt
Quick Answer: dbt (data build tool) transforms raw data in a warehouse by running SQL models. Initialize a project with `dbt init`, configure the warehouse connection in `profiles.yml`, write SQL model files, run `dbt build` to execute transformations, and test with `dbt test`.
dbt (data build tool) is a transformation framework that enables analytics engineers to write SQL SELECT statements that dbt compiles and executes against a data warehouse. As of April 2026, dbt Core is open-source (Apache 2.0 license), and dbt Cloud starts at $100/month for the Team plan with job scheduling, CI, and a web IDE.
Step 1: Install dbt Core
Install dbt Core via pip (Python package manager):
pip install dbt-core dbt-snowflake # or dbt-bigquery, dbt-redshift, dbt-postgres
The adapter package (dbt-snowflake, dbt-bigquery, etc.) matches the target warehouse. Alternatively, use dbt Cloud, which provides a web-based IDE and does not require local installation.
Step 2: Initialize a Project
Run dbt init my_project to scaffold a new dbt project. This creates:
- dbt_project.yml — Project configuration (name, version, model paths)
- models/ — Directory for SQL model files
- tests/ — Directory for custom tests
- macros/ — Directory for reusable SQL macros
- seeds/ — Directory for CSV seed data
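A minimal dbt_project.yml for this scaffold might look like the sketch below; the project name matches the init example above, and the materialization choices are illustrative defaults, not required settings:

```yaml
# dbt_project.yml — hypothetical minimal configuration for my_project
name: my_project
version: "1.0.0"
profile: my_project        # must match the profile name in ~/.dbt/profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
test-paths: ["tests"]

models:
  my_project:
    staging:
      +materialized: view   # staging models as views by default
    marts:
      +materialized: table  # marts as tables for BI consumption
```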
Step 3: Configure profiles.yml
Edit ~/.dbt/profiles.yml with the warehouse connection details:
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xy12345.us-east-1
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_dev
      threads: 4
Use environment variables for credentials instead of hardcoding passwords.
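For example, the credential referenced by env_var() above can be supplied at runtime (the value shown is a placeholder):

```shell
export DBT_PASSWORD='replace-me'   # placeholder; use a secrets manager or CI secrets in practice
dbt debug                          # verifies the profile resolves and the warehouse connection works
```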
Step 4: Write SQL Models
Create SQL files in the models/ directory. Each file defines one table or view. Example:
-- models/staging/stg_orders.sql
SELECT
    id AS order_id,
    customer_id,
    order_date,
    status,
    amount_cents / 100.0 AS amount_dollars
FROM {{ source('raw', 'orders') }}
WHERE status != 'cancelled'
dbt models reference raw tables using {{ source() }} and other models using {{ ref() }}. The ref() function creates a dependency graph that dbt uses to determine execution order.
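To illustrate ref(), here is a hypothetical downstream model built on the stg_orders model above; the aggregation logic is an assumption, not part of the original project:

```sql
-- models/marts/fct_orders_daily.sql (hypothetical example)
SELECT
    order_date,
    COUNT(*)            AS order_count,
    SUM(amount_dollars) AS total_revenue
FROM {{ ref('stg_orders') }}   -- creates a DAG edge: stg_orders runs first
GROUP BY order_date
```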
Step 5: Define Sources and Schema Tests
Create models/staging/schema.yml to document sources and add tests:
sources:
  - name: raw
    tables:
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null
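Tests apply to models as well as sources. A hedged extension of the same schema.yml, assuming the stg_orders model from Step 4 (the accepted status values are illustrative):

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']  # illustrative set
```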
Step 6: Run dbt build
Execute dbt build to compile and run all models and tests:
- dbt resolves the dependency graph (DAG)
- Models execute in dependency order (staging before marts)
- Tests run after their associated models
- Output shows pass/fail status for each model and test
Use dbt run to execute models only (skip tests) or dbt test to run tests only.
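Node selection narrows any of these commands to a subset of the DAG; the model name below is the staging model from Step 4:

```shell
dbt build --select stg_orders    # build only stg_orders and its tests
dbt build --select stg_orders+   # stg_orders plus everything downstream of it
dbt run --select staging         # models under the staging directory
```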
Step 7: Promote to Production
Configure a production target in profiles.yml with a separate schema (for example, dbt_prod). Use dbt Cloud or a CI/CD pipeline (GitHub Actions, GitLab CI) to run dbt build --target prod on a schedule.
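The prod target can live alongside dev in the same profile. This sketch assumes the Snowflake settings from Step 3 with a separate schema; the service user and thread count are illustrative:

```yaml
my_project:
  target: dev
  outputs:
    prod:
      type: snowflake
      account: xy12345.us-east-1
      user: dbt_prod_user              # illustrative service account
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_prod
      threads: 8
```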
Organizing Models
The standard dbt project structure uses three layers:
- Staging (models/staging/): One-to-one with source tables; renaming, type casting, basic filtering
- Intermediate (models/intermediate/): Business logic, joins, aggregations
- Marts (models/marts/): Final tables consumed by BI tools and dashboards
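To make the layering concrete, a hypothetical mart that joins two staging models; stg_customers is assumed to exist alongside the stg_orders model from Step 4:

```sql
-- models/marts/customer_orders.sql (hypothetical example)
SELECT
    c.customer_id,
    COUNT(o.order_id)     AS lifetime_orders,
    SUM(o.amount_dollars) AS lifetime_revenue
FROM {{ ref('stg_customers') }} AS c
LEFT JOIN {{ ref('stg_orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id
```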