How to set up data transformations with dbt
Quick Answer: dbt (data build tool) transforms raw data in a warehouse by running SQL models. Initialize a project with `dbt init`, configure the warehouse connection in `profiles.yml`, write SQL model files, run `dbt build` to execute transformations, and test with `dbt test`.
dbt (data build tool) is a transformation framework that enables analytics engineers to write SQL SELECT statements that dbt compiles and executes against a data warehouse. As of April 2026, dbt Core is open-source (Apache 2.0 license), and dbt Cloud starts at $100/month for the Team plan with job scheduling, CI, and a web IDE.
Step 1: Install dbt Core
Install dbt Core via pip (Python package manager):
pip install dbt-core dbt-snowflake # or dbt-bigquery, dbt-redshift, dbt-postgres
The adapter package (dbt-snowflake, dbt-bigquery, etc.) matches the target warehouse. Alternatively, use dbt Cloud, which provides a web-based IDE and does not require local installation.
Step 2: Initialize a Project
Run dbt init my_project to scaffold a new dbt project. This creates:
- dbt_project.yml — Project configuration (name, version, model paths)
- models/ — Directory for SQL model files
- tests/ — Directory for custom tests
- macros/ — Directory for reusable SQL macros
- seeds/ — Directory for CSV seed data
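A minimal dbt_project.yml for this scaffold might look like the sketch below; the project name matches the init example above, and the materialization choices are illustrative defaults, not required settings:

```yaml
# dbt_project.yml — hypothetical minimal configuration for my_project
name: my_project
version: "1.0.0"
profile: my_project        # must match the profile name in ~/.dbt/profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
test-paths: ["tests"]

models:
  my_project:
    staging:
      +materialized: view   # staging models as views by default
    marts:
      +materialized: table  # marts as tables for BI consumption
```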
Step 3: Configure profiles.yml
Edit ~/.dbt/profiles.yml with the warehouse connection details:
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xy12345.us-east-1
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_dev
      threads: 4
Use environment variables for credentials instead of hardcoding passwords.
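For example, the credential referenced by env_var() above can be supplied at runtime (the value shown is a placeholder):

```shell
export DBT_PASSWORD='replace-me'   # placeholder; use a secrets manager or CI secrets in practice
dbt debug                          # verifies the profile resolves and the warehouse connection works
```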
Step 4: Write SQL Models
Create SQL files in the models/ directory. Each file defines one table or view. Example:
-- models/staging/stg_orders.sql
SELECT
    id AS order_id,
    customer_id,
    order_date,
    status,
    amount_cents / 100.0 AS amount_dollars
FROM {{ source('raw', 'orders') }}
WHERE status != 'cancelled'
dbt models reference raw tables using {{ source() }} and other models using {{ ref() }}. The ref() function creates a dependency graph that dbt uses to determine execution order.
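To illustrate ref(), here is a hypothetical downstream model built on the stg_orders model above; the aggregation logic is an assumption, not part of the original project:

```sql
-- models/marts/fct_orders_daily.sql (hypothetical example)
SELECT
    order_date,
    COUNT(*)            AS order_count,
    SUM(amount_dollars) AS total_revenue
FROM {{ ref('stg_orders') }}   -- creates a DAG edge: stg_orders runs first
GROUP BY order_date
```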
Step 5: Define Sources and Schema Tests
Create models/staging/schema.yml to document sources and add tests:
sources:
  - name: raw
    tables:
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null
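Tests apply to models as well as sources. A hedged extension of the same schema.yml, assuming the stg_orders model from Step 4 (the accepted status values are illustrative):

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']  # illustrative set
```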
Step 6: Run dbt build
Execute dbt build to compile and run all models and tests:
- dbt resolves the dependency graph (DAG)
- Models execute in dependency order (staging before marts)
- Tests run after their associated models
- Output shows pass/fail status for each model and test
Use dbt run to execute models only (skip tests) or dbt test to run tests only.
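Node selection narrows any of these commands to a subset of the DAG; the model name below is the staging model from Step 4:

```shell
dbt build --select stg_orders    # build only stg_orders and its tests
dbt build --select stg_orders+   # stg_orders plus everything downstream of it
dbt run --select staging         # models under the staging directory
```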
Step 7: Promote to Production
Configure a production target in profiles.yml with a separate schema (for example, dbt_prod). Use dbt Cloud or a CI/CD pipeline (GitHub Actions, GitLab CI) to run dbt build --target prod on a schedule.
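The prod target can live alongside dev in the same profile. This sketch assumes the Snowflake settings from Step 3 with a separate schema; the service user and thread count are illustrative:

```yaml
my_project:
  target: dev
  outputs:
    prod:
      type: snowflake
      account: xy12345.us-east-1
      user: dbt_prod_user              # illustrative service account
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_prod
      threads: 8
```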
Organizing Models
The standard dbt project structure uses three layers:
- Staging (models/staging/): One-to-one with source tables; renaming, type casting, basic filtering
- Intermediate (models/intermediate/): Business logic, joins, aggregations
- Marts (models/marts/): Final tables consumed by BI tools and dashboards
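To make the layering concrete, a hypothetical mart that joins two staging models; stg_customers is assumed to exist alongside the stg_orders model from Step 4:

```sql
-- models/marts/customer_orders.sql (hypothetical example)
SELECT
    c.customer_id,
    COUNT(o.order_id)     AS lifetime_orders,
    SUM(o.amount_dollars) AS lifetime_revenue
FROM {{ ref('stg_customers') }} AS c
LEFT JOIN {{ ref('stg_orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id
```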