Databricks Sample - Terraform IaC for Azure Databricks and Asset Bundle Deployment via CI/CD #911

Open
wants to merge 26 commits into
base: main

Conversation

DilmurodMak

Pull Request Overview

This PR updates and enhances the Databricks deployment process using Terraform and Asset Bundle Deployment via GitHub Actions. It simplifies deployment across multiple environments.

Key Highlights

  • Folder Structure:
    • Sample code is organized under single_tech_samples/databricks/databricks_terraform
    • Includes directories for:
      • Infra: Terraform code for Azure resource deployment.
      • tests: Scripts and workflows for Databricks testing.
      • utils: Helper scripts like generate-databricks-workflows.sh.
      • workflows: Pre-generated Databricks workflows to run tests in Databricks workspace.
  • CI/CD Pipelines:
    • Linting Pipeline: Validates Python notebooks and workflows.
    • Sandbox Deployment: Validates and deploys Databricks assets, executes test workflows.
    • Development Deployment: Deploys assets to the development environment after a successful sandbox deployment and runs the same tests in the development environment.
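
The sandbox-then-development flow listed above might be sketched as the shell helper below. It assumes the Databricks CLI is installed and that `databricks.yml` defines bundle targets named `sandbox` and `dev`. Those target names, the `test_workflow` job key, and the `DRY_RUN` switch are illustrative assumptions and are not taken from the PR.

```shell
# Sketch only: target names and the job key are assumptions, not from the PR.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Validate, deploy, and run a test job for one bundle target.
deploy_and_test() {
  bundle_target="$1"
  job_key="$2"
  run databricks bundle validate -t "$bundle_target" || return 1
  run databricks bundle deploy -t "$bundle_target" || return 1
  run databricks bundle run -t "$bundle_target" "$job_key" || return 1
}

# Deploy to development only if the sandbox deployment and tests succeed.
promote() {
  deploy_and_test sandbox "$1" && deploy_and_test dev "$1"
}
```

Calling `promote test_workflow` in dry-run mode prints the six commands in order; setting `DRY_RUN=0` would execute them against the configured workspaces.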

Testing Steps

The sample code covers deployment from the sandbox environment to the development environment.

  1. Create a new branch and submit a PR to main.
  2. Verify the following:
    • Linting Pipeline runs and validates code.
    • Sandbox Deployment Pipeline validates and tests workflows during the PR.
  3. Merge the PR to main.
  4. Observe:
    • Sandbox assets are deployed, and test workflows are executed successfully.
    • Development deployment triggers upon successful sandbox completion.

@ydaponte
Collaborator

ydaponte commented Dec 4, 2024

@DilmurodMak - one of the validations is failing, can you take a look?

@ydaponte ydaponte added the single-tech: azure-databricks Related to Azure Databricks single-tech sample label Dec 4, 2024
@DilmurodMak
Author

DilmurodMak commented Dec 4, 2024

> @DilmurodMak - one of the validations is failing, can you take a look?

@ydaponte, the pipelines are templates: they require that the Databricks workspaces exist and that their URLs are set in the databricks.yml file. Therefore the validation is failing. If we do not want to trigger it in PRs, we can include it as a template reference in the single solution doc instead.
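
One way to keep a template pipeline from failing when no workspace is configured is a guard step. The sketch below is an illustration, not part of the PR: it assumes workspace URLs appear in `databricks.yml` as `host: https://...` entries, and the function name `has_workspace_host` is hypothetical.

```shell
# has_workspace_host FILE
# Succeeds if FILE contains at least one "host: https://..." entry
# (optionally double-quoted); fails for placeholders or missing hosts.
has_workspace_host() {
  grep -Eq '^[[:space:]]*host:[[:space:]]*"?https://[^[:space:]"]+' "$1"
}
```

A workflow step could call `has_workspace_host databricks.yml` and skip the deployment steps when it fails, so the template stays in the repository without breaking PR validation.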

Collaborator

@ydaponte ydaponte left a comment

Leaving some comments that need to be addressed before we can merge into main. Some best practices and alignment with the overall repo still need to be applied, for example creating a devcontainer for the sample. Thanks for the great work so far!


This repository contains databricks deployment using terraform and databricks asset bundle deployment using Github Actions.

## Prerequisites
Collaborator

Suggested change
## Prerequisites

Removing the Prerequisites section from here, as you can consolidate it later in the README. Also, the second prerequisite jumps into another README, which I think interrupts the flow.

Author

The first README is at the root. It is intended to describe the overall repo.
The second README is in the Infra folder. It is intended to give instructions on how to deploy all the resources using Terraform.

I will make that clearer.


## Prerequisites

- Azure Subscription
Collaborator

Suggested change
- Azure Subscription

## Prerequisites

- Azure Subscription
- Deploy Azure Resources using Terraform Code. See [./Infra/README.MD](./Infra/README.md)
Collaborator

Suggested change
- Deploy Azure Resources using Terraform Code. See [./Infra/README.MD](./Infra/README.md)

This repeats the later prerequisite: "Ensure Sandbox and Development Resources are deployed in Azure using Terraform code in ./Infra folder." I suggest removing it from here.


### Pre-requisites
- Clone the repository
- Install Terraform CLI if not installed already [Terraform Installation](https://learn.hashicorp.com/tutorials/terraform/install-cli)
Collaborator

You can have these prerequisites installed automatically when the devcontainer launches.

cd Infra/deployment
```

2. Login to Azure
Collaborator

Logging in to Azure should probably be the first step, `az account set` the second, and "Change directory to ..." the third.
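
The ordering suggested in this comment could look like the sketch below. The `az login` and `az account set --subscription` commands are standard Azure CLI calls, and `Infra/deployment` comes from the README excerpt. The `prepare_deployment` function, the `DRY_RUN` switch, and the subscription placeholder are assumptions added for illustration.

```shell
# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# prepare_deployment SUBSCRIPTION_ID
# 1. log in, 2. select the subscription, 3. move to the Terraform folder.
prepare_deployment() {
  run az login || return 1
  run az account set --subscription "$1" || return 1
  run cd Infra/deployment || return 1
}
```

Running `prepare_deployment "<subscription-id>"` in dry-run mode only prints the three steps in order; with `DRY_RUN=0` it executes them in the current shell.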

- **`adb-workspace`** - Deploys Databricks workspace
- **`metastore-and-users`** - Creates the Databricks connector, creates the storage account, gives the connector storage access rights, creates the metastore and assigns the workspace to it, and finally retrieves all users, groups, and service principals from Azure AD.
- **`adb-unity-catalog`** - Gives Databricks access rights to the connector, creates containers in the storage account, and creates external locations for the containers. Creates the Unity Catalog and grants permissions to user groups. Finally, creates the **`bronze` `silver` `gold`** schemas under the catalog and gives the required permissions to the user groups.

Collaborator

If the script fails with an error (lack of permissions or something else), is the script idempotent? Meaning, can we re-run it and have it continue where it left off? Can we make a note about that?

Author

Yes, the script can be re-run and will continue where it left off. When it runs, it references the state file of each Terraform module and, based on that state, continues from where it stopped.

I will make a note of that.
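
The re-run behaviour discussed here can be illustrated with a small sketch. It assumes each module listed in the README excerpt (`adb-workspace`, `metastore-and-users`, `adb-unity-catalog`) lives in its own directory with its own state. The `apply_all` function and the `DRY_RUN` switch are hypothetical and not the PR's actual deployment script.

```shell
# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Apply the modules in dependency order and stop at the first failure.
# On a re-run, Terraform compares each module's configuration with its
# recorded state, so modules that already succeeded produce no changes.
apply_all() {
  for module in adb-workspace metastore-and-users adb-unity-catalog; do
    run terraform -chdir="$module" init -input=false || return 1
    run terraform -chdir="$module" apply -auto-approve -input=false || return 1
  done
}
```

Because the loop returns on the first failing command, fixing the cause (for example, missing permissions) and calling `apply_all` again resumes at the first module that still has pending changes.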

Author

There is a note on that at the end of the Infra README.

Collaborator

Instead of the png, can you please commit the drawio.svg version file of this diagram? We are starting to create a standard in the repo for that.

Author

These were originally images; I will try to recreate them in draw.io if necessary.

Collaborator

Is there an editable version of this diagram that can be committed instead?

@DilmurodMak DilmurodMak requested a review from ydaponte December 16, 2024 18:22

## Samples

- [IaC deployment of Azure Databricks](./databricks_ci_cd/README.md) - This sample demonstrates how to deploy an Azure Databricks environment using ARM templates.

🚫 [linkspector] reported by reviewdog 🐶
Cannot reach ./databricks_ci_cd/README.md. Status: 404 Cannot find: ./databricks_ci_cd/README.md
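
A broken relative link like this one can be caught locally before pushing. The sketch below is a simplified stand-in for the linkspector check, not its implementation: the `check_links` function is hypothetical, it only inspects Markdown links of the form `](./path)`, and it does not handle paths containing spaces.

```shell
# check_links FILE
# Prints every relative "./" link target in FILE that does not exist on disk
# (resolved against FILE's directory) and returns non-zero if any is missing.
check_links() {
  md_file="$1"
  md_dir=$(dirname "$md_file")
  rc=0
  # Extract the targets of ](./...) links and strip any #anchor suffix.
  for link_target in $(grep -o '](\./[^)]*)' "$md_file" \
      | sed -e 's/^](//' -e 's/)$//' -e 's/#.*$//'); do
    if [ ! -e "$md_dir/$link_target" ]; then
      echo "missing: $link_target"
      rc=1
    fi
  done
  return $rc
}
```

Running `check_links README.md` from the sample folder would list each target that cannot be found, which is the same class of error reported in this comment.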

Labels
single-tech: azure-databricks Related to Azure Databricks single-tech sample
2 participants