A training project for building Databricks pipelines.
CI (GitHub Actions) runs static checks and tests, then deploys notebooks and jobs to the Databricks server.
Note: this project is still a work in progress.
The approximate dependencies between the Databricks Jobs:
flowchart LR
classDef raw fill:#949494
classDef bronze fill:#CD7F32
classDef silver fill:#e0e0e0
r1(Raw Job 1):::raw
r2(Raw Job 2):::raw
b1(Bronze Job 1):::bronze
b2(Bronze Job 2):::bronze
s1(Silver Job):::silver
r1 --> b1
r2 --> b2
b1 & b2 --> s1
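The dependency graph above can also be written down as data and checked programmatically. The following is a minimal sketch, not code from this repository: the job identifiers are placeholders that mirror the diagram labels, and it uses the standard-library `graphlib` module (Python 3.9+) to derive one valid run order.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on, mirroring the diagram.
# The identifiers are placeholders, not real Databricks job names.
JOB_DEPENDENCIES = {
    "raw_job_1": set(),
    "raw_job_2": set(),
    "bronze_job_1": {"raw_job_1"},
    "bronze_job_2": {"raw_job_2"},
    "silver_job": {"bronze_job_1", "bronze_job_2"},
}


def execution_order(dependencies):
    """Return one valid run order in which every job follows its upstream jobs."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(JOB_DEPENDENCIES)
print(order)
```

Several orders are valid, since the two raw-to-bronze branches are independent; the only fixed point is that the silver job runs last.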
- Set up Azure Databricks and create an access token for your account.
- Prepare a `.env` file from `.env_template`: run `cp .env_template .env` and fill in your secrets.
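Before running anything against Databricks, it can help to confirm that the `.env` file is actually filled in. The sketch below is an illustration only: the variable names `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are assumptions, so check `.env_template` for the keys this project really uses.

```python
from pathlib import Path

# Assumed variable names; check .env_template for the keys the project actually uses.
REQUIRED_KEYS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values)
    if missing:
        print("Missing or empty in .env:", ", ".join(missing))
    else:
        print("All required values are set.")
```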
The GitHub Actions CI/CD flow is defined under .github/workflows:
---
title: CI flow
---
flowchart LR
subgraph pr[Pull request flow]
direction TB
A1[Install Python and dependencies] -->
B1[Static checks] -->
C1[TODO: Unit tests] -->
D1[TODO: Upload test results]
end
subgraph deploy[Merge to master flow]
direction TB
A2[Upload notebooks to Databricks] -->
B2[Build and upload Python lib to Databricks] -->
C2[Create Databricks Jobs]
end
pr --> deploy
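To make the "Upload notebooks to Databricks" step more concrete, here is a hedged sketch of one way it could be done from Python, using the Databricks Workspace API endpoint `POST /api/2.0/workspace/import`. This is not necessarily how the workflows in this repository do it (they may use the Databricks CLI instead); the function names and the example workspace path are hypothetical.

```python
import base64
import json
import urllib.request


def build_import_payload(workspace_path, source_code):
    """Build the JSON body for the Workspace API import call."""
    return {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        # The API expects the notebook source as base64-encoded text.
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }


def upload_notebook(host, token, workspace_path, source_code):
    """Send one notebook to the workspace and return the HTTP status code."""
    body = json.dumps(build_import_payload(workspace_path, source_code))
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A deploy step could call `upload_notebook(host, token, "/Shared/example_notebook", source)` for each notebook file, with `host` and `token` taken from CI secrets rather than hard-coded.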