
How to Set Up an Automated Data Pipeline: Fivetran to dbt to Snowflake

An end-to-end tutorial for building a modern ELT data pipeline using Fivetran for extraction/loading, Snowflake as the warehouse, and dbt for SQL-based transformations. Covers source configuration, staging models, mart models, scheduling, and cost estimates from a 50-person SaaS deployment.

Overview

A modern data pipeline follows the ELT (Extract, Load, Transform) pattern: extract data from source systems, load it into a cloud data warehouse, then transform it using SQL. Fivetran handles extraction and loading; dbt handles transformation; Snowflake (or BigQuery, Redshift, Databricks) serves as the warehouse. This tutorial covers the end-to-end setup.

Architecture

Source Systems → Fivetran (Extract & Load) → Snowflake (Warehouse) → dbt (Transform) → BI Tools

As of April 2026, this stack is the most commonly adopted ELT architecture for mid-size and enterprise data teams, according to data infrastructure surveys by dbt Labs and Fivetran.

Prerequisites

  • Fivetran account (Standard plan, usage-based pricing starting at $1/credit)
  • Snowflake account (standard tier, consumption-based pricing)
  • dbt Cloud account (Team plan at $100/month) or dbt Core installed locally
  • Source system credentials (Salesforce, Stripe, PostgreSQL, or any Fivetran-supported source)

Step 1: Configure Snowflake as the Destination

In Snowflake, create a dedicated database, warehouse, and service user for Fivetran:

-- Create a landing database and a dedicated, auto-suspending warehouse
CREATE DATABASE raw_data;
CREATE WAREHOUSE fivetran_wh WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;

-- Role with the minimum privileges Fivetran needs
CREATE ROLE fivetran_role;
GRANT USAGE ON WAREHOUSE fivetran_wh TO ROLE fivetran_role;
GRANT USAGE ON DATABASE raw_data TO ROLE fivetran_role;
GRANT CREATE SCHEMA ON DATABASE raw_data TO ROLE fivetran_role;

-- Service user for the connector
CREATE USER fivetran_user PASSWORD = 'strong_password' DEFAULT_ROLE = fivetran_role;
GRANT ROLE fivetran_role TO USER fivetran_user;

In the Fivetran dashboard, navigate to Destinations > Add Destination > Snowflake. Enter the account URL, database name, warehouse, user, and password. Click "Test Connection" to verify.

Step 2: Add Source Connectors in Fivetran

For each data source, click "Add Connector" in Fivetran:

Example: Salesforce

  1. Search for "Salesforce" and click "Set Up"
  2. Click "Authorize" to authenticate via OAuth
  3. Choose the schema name (default: salesforce)
  4. Select tables to sync: Account, Contact, Opportunity, Lead, Task, Campaign
  5. Set sync frequency to 15 minutes
  6. Click "Save & Test," then start the initial sync

Example: PostgreSQL (Application Database)

  1. Search for "PostgreSQL" and click "Set Up"
  2. Enter connection details: host, port, database, username (read-only), password
  3. Fivetran uses logical replication (WAL) for incremental updates
  4. Select tables and set sync frequency
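
Before the connector can use logical replication, the source database must allow it. A minimal sketch of the source-side preparation, assuming a self-managed PostgreSQL instance (the user name, password, and schema are illustrative; Fivetran's connector docs cover managed services such as RDS):

```sql
-- Enable logical decoding (requires a server restart)
ALTER SYSTEM SET wal_level = logical;

-- Read-only replication user for Fivetran (placeholder name and password)
CREATE USER fivetran_reader WITH PASSWORD 'strong_password' REPLICATION;
GRANT USAGE ON SCHEMA public TO fivetran_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fivetran_reader;

-- Depending on the connector configuration, a publication may also be needed
CREATE PUBLICATION fivetran_pub FOR ALL TABLES;
```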

The initial sync for a Salesforce org with 500,000 records typically takes 2-4 hours. Subsequent syncs transfer only changed records and complete in minutes.

Step 3: Verify Raw Data in Snowflake

After the initial sync completes, query the raw data in Snowflake:

SELECT * FROM raw_data.salesforce.opportunity LIMIT 10;

Fivetran creates schemas matching the source system name and tables matching the source table names. It adds metadata columns: _fivetran_synced (timestamp) and _fivetran_deleted (boolean for soft deletes).
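
These metadata columns make it easy to sanity-check a connector. A quick look at sync recency and soft deletes (Snowflake SQL; the table name assumes the Salesforce schema from Step 2):

```sql
-- Most recent sync time and soft-delete counts for one table
SELECT
    MAX(_fivetran_synced)       AS last_synced_at,
    COUNT_IF(_fivetran_deleted) AS soft_deleted_rows,
    COUNT(*)                    AS total_rows
FROM raw_data.salesforce.opportunity;
```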

Step 4: Initialize a dbt Project

If using dbt Core locally:

pip install dbt-core dbt-snowflake
dbt init analytics_project

Configure ~/.dbt/profiles.yml with Snowflake credentials. Create a separate Snowflake user and warehouse for dbt:

analytics_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xy12345.us-east-1
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      warehouse: dbt_wh
      schema: dbt_dev
      threads: 4

If using dbt Cloud, create a new project in the dbt Cloud UI and connect it to the Snowflake warehouse and a Git repository.

Step 5: Write Staging Models

Create staging models that clean and standardize the raw data. One staging model per source table:

-- models/staging/salesforce/stg_salesforce__opportunities.sql
WITH source AS (
    SELECT * FROM {{ source('salesforce', 'opportunity') }}
    WHERE NOT _fivetran_deleted
),
renamed AS (
    SELECT
        id AS opportunity_id,
        account_id,
        name AS opportunity_name,
        stage_name AS stage,
        amount,
        close_date,
        created_date,
        is_won,
        is_closed,
        owner_id AS sales_rep_id
    FROM source
)
SELECT * FROM renamed

Define sources in models/staging/salesforce/src_salesforce.yml:

version: 2

sources:
  - name: salesforce
    database: raw_data
    schema: salesforce
    tables:
      - name: opportunity
      - name: account
      - name: contact
Step 6: Write Mart Models

Create analytics-ready models that join and aggregate staged data:

-- models/marts/sales/fct_monthly_revenue.sql
SELECT
    DATE_TRUNC('month', o.close_date) AS month,
    a.industry,
    COUNT(DISTINCT o.opportunity_id) AS deals_closed,
    SUM(o.amount) AS total_revenue,
    AVG(o.amount) AS avg_deal_size
FROM {{ ref('stg_salesforce__opportunities') }} o
LEFT JOIN {{ ref('stg_salesforce__accounts') }} a
    ON o.account_id = a.account_id
WHERE o.is_won = TRUE
GROUP BY 1, 2
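
dbt materializes models as views by default. A common pattern is to keep staging models as lightweight views and materialize marts as tables for BI query performance; a sketch for dbt_project.yml (the directory keys assume the layout used above):

```yaml
# dbt_project.yml (excerpt)
models:
  analytics_project:
    staging:
      +materialized: view
    marts:
      +materialized: table
```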

Step 7: Run and Test

Execute the full pipeline:

dbt build

This runs all models in dependency order and executes all tests. Review the output for failures. Add custom tests for business logic validation:

-- tests/assert_positive_revenue.sql
SELECT * FROM {{ ref('fct_monthly_revenue') }}
WHERE total_revenue < 0
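
A singular test like this fails the build whenever it returns rows. If a rule should only warn, dbt supports a severity config at the top of the test file; an illustrative warn-only variant of the same check:

```sql
-- tests/assert_positive_revenue_warn.sql (illustrative variant)
{{ config(severity = 'warn') }}

SELECT * FROM {{ ref('fct_monthly_revenue') }}
WHERE total_revenue < 0
```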

Step 8: Schedule with dbt Cloud

In dbt Cloud, create a Job:

  1. Navigate to Jobs > Create Job
  2. Set the command to dbt build
  3. Schedule: Run every 6 hours (or trigger after Fivetran sync via the Fivetran-dbt integration)
  4. Enable notifications for failures

Fivetran offers a native dbt Cloud integration: after each Fivetran sync completes, it triggers the dbt Cloud job automatically. This ensures transformations always run on fresh data.
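
As a complement to sync-triggered jobs, dbt's source freshness checks can catch a stalled connector before transformations run, using _fivetran_synced as the loaded-at field (the thresholds here are illustrative):

```yaml
# models/staging/salesforce/src_salesforce.yml (excerpt)
sources:
  - name: salesforce
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: {count: 1, period: hour}
      error_after: {count: 12, period: hour}
```

Running dbt source freshness as the first step of a job fails fast when a source is stale.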

Cost Estimates

For a mid-size deployment syncing 5 sources with 10 million monthly active rows:

Component                                  Monthly Cost
Fivetran (Standard, 10M MAR)               ~$500-800
Snowflake (XSMALL warehouse, 8 hrs/day)    ~$200-400
dbt Cloud (Team plan)                      $100
Total                                      ~$800-1,300/month

Editor's Note: We built this exact stack (Fivetran + Snowflake + dbt Cloud) for a 50-person B2B SaaS company syncing Salesforce, Stripe, PostgreSQL (application DB), and Google Analytics. Initial setup took 2 weeks. Fivetran handled 15 million MAR at $650/month, Snowflake cost $280/month on an XSMALL warehouse running 10 hours/day, and dbt Cloud added $100/month, for a total of $1,030/month.

Before this stack, the data team spent 15 hours per week maintaining custom Python ETL scripts. After migration, maintenance dropped to 2 hours/week, mostly updating dbt models for new business requirements. The main caveat: Fivetran MAR pricing is unpredictable when source data has a high update frequency. Stripe transaction tables generated 3x more MAR than expected because every status change counts as an active row.

By Rafal Fila
