How to use dbt in a Python environment

dbt is a useful tool for data warehouses: you can use it to create a data mart or several data marts. You can find all the details in the official dbt documentation.

I have used it a few times, so I can show you how to create dbt models and dbt configs in your own project. You can do it step by step like below:

1 – Create a profiles.yml file for your dbt profile and specify your DB connection information, etc. (a minimal sketch follows this list).
2 – Create a data_model folder as the project directory.
3 – Create a .yml file as the main project .yml file; you will use it as the project file (see the second sketch after this list).
4 – Create your own dbt runner file, such as dbt_runner.py, and set your execution configs in it (see the sketch after the folder layout further below).
5 – Create a models folder; you will put your models in that folder.
6 – Create a schema or model for yourself and put an xxxx.schema.yml file into that folder.
6.1 – Put some table constraints in it, like below:
bietl_patch:
  constraints:
    unique:
      - somthng_id
    not_null:
      - somthng_id
      - xxx_id
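
For reference, here is a minimal sketch of what profiles.yml (step 1) could look like. I'm assuming a Redshift warehouse here; the profile name, host, credentials, and schema are placeholders you would replace with your own:

bietl_datamarts:            # hypothetical profile name; must match the profile in the project .yml
  target: dev
  outputs:
    dev:
      type: redshift        # assuming a Redshift warehouse
      host: your-cluster.xxxxxx.eu-west-1.redshift.amazonaws.com
      port: 5439
      user: dbt_user
      password: your_password_here   # better kept in an environment variable
      dbname: analytics
      schema: datamart
      threads: 4

The main project .yml file (step 3) can stay small; the project name below is an assumption based on the folder layout further down:

name: bietl_datamarts
version: '1.0.0'
profile: bietl_datamarts    # must match the profile name in profiles.yml
# model .sql files are read from the models/ folder by default

Depending on your dbt version, this file may also need extra keys (for example config-version), so check the docs for the version you run.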

In the end, you will have a folder structure and schema like below:

# dbt profile. Specify your DB connection information etc.
profiles.yml on the root directory
bietl_data_model folder
bietl_datamarts.yml file
dbt_runner.py Python file

bietl_data_model
> models (the specification of your bietl models)
  > bietl_datamarts.schema.yml
  > the .sql files for your models
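
The dbt_runner.py from step 4 can be a thin wrapper around the dbt CLI. Below is a minimal sketch, assuming dbt is installed in the same Python environment and the directory names match the layout above; the function name and default paths are my own placeholders:

# dbt_runner.py -- minimal sketch of a runner around the dbt CLI
import subprocess

def run_dbt(project_dir="bietl_data_model", profiles_dir="."):
    """Run `dbt run` against the given project and profiles directories."""
    cmd = [
        "dbt", "run",
        "--project-dir", project_dir,    # folder containing the project .yml and models/
        "--profiles-dir", profiles_dir,  # folder containing profiles.yml
    ]
    # check=True makes the script fail loudly if dbt exits with a non-zero status
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_dbt()

A runner like this is also easy to call later from an Airflow task.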

I'm executing this dbt project from Airflow DAGs, but I haven't covered that here; maybe in the next post.

Automate updates of data marts to BigQuery

Hey Everyone,

For some reason I'm using AWS Redshift and an S3 bucket, so the implementation is a twice-daily batch ingestion of data from S3 to Cloud Storage.

1 – Create a DAG in Airflow

2 – Execute an UNLOAD query in Redshift to export the data to S3 with no header (see the UNLOAD sketch after this list)

3 – Open the Google Cloud interface and use the Cloud Storage Transfer Service

Cloud Storage Transfer Service supports one-time transfers or recurring transfers. It provides advanced filters based on file creation dates, filename filters, and the times of day you prefer to import data. It also supports the deletion of the source data after it’s been copied.

3.1 – Create a GCS bucket for storing data in Cloud Storage

3.2 – Select the source: Amazon S3. Provide the S3 bucket name, an access key, and a secret key for a user authorized on the S3 bucket to list, read, and delete objects.

3.3 – Select the destination: provide the GCS bucket name and the synchronization options (overwrite, delete from the destination, delete from the source)

3.4 – Set the scheduled time and a description for the task

4 – Now it's time to create a table for querying; open the Google BigQuery interface

4.1 – Create a dataset in the BigQuery interface

4.2 – Create a table in BigQuery with the steps below

4.2.1 – Create a table from Google Cloud Storage

4.2.2 – Select the file from the GCS bucket you created

4.2.3 – Select file format

4.2.4 – Specify the destination table information; the table type must be External (so the table picks up the data appended daily from S3 to GCS)

4.2.5 – Don't specify the schema; we will provide it later.
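
As a reference for step 2, the Redshift UNLOAD to S3 could look like the sketch below. The table, bucket path, and IAM role are placeholders, and the exact options depend on your setup; note that UNLOAD writes no header row by default, which is what we want here:

UNLOAD ('SELECT * FROM datamart.bietl_patch')
TO 's3://your-s3-bucket/exports/bietl_patch_'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-unload-role'
DELIMITER ','
ALLOWOVERWRITE
PARALLEL OFF;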

Now you have a table that serves the data landing in the GCS bucket from S3, but you have to specify the headers and column types before you can select your data in BQ.

You can select from your table in BQ with a command like the one below:

SELECT * FROM `dataset.bi_1234__test.table` LIMIT 10;

On the screen below, you can see the schema menu, where you can add fields with the types of your columns.