Databricks

Databricks is a data and AI platform built on Apache Spark, often used for large-scale data analytics, data engineering, and machine learning. Lumi AI can connect to Databricks via direct internet-based access or through a Lumi AI Data Gateway for environments requiring a secured connection.

Supported Methods

The methods Lumi AI can use to connect to your Databricks environment.

Supported Limits

The following limits can be configured to moderate how Lumi AI users access and use this system:

Available Parameters

These are the core connection properties for the source system; every query Lumi AI runs is directed to the system they identify.

* Required parameters

Note: The Gateway parameter is common to all systems that support it and is available only when Gateway is the selected connection method.

Host*

The domain (or subdomain) of your Databricks workspace.

Example

  • dbc-1234567.cloud.databricks.com
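If a full workspace URL is pasted into Host, it can be reduced to the bare hostname shown in the example. This is a minimal sketch that assumes the field expects only the hostname, with no https:// scheme, trailing slash, or query string; `normalize_host` is a hypothetical helper, not part of Lumi AI.

```python
from urllib.parse import urlparse


def normalize_host(raw: str) -> str:
    """Reduce a pasted workspace URL to a bare hostname (assumed Host format)."""
    value = raw.strip()
    # urlparse only populates .hostname when a scheme is present, so add one if missing.
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).hostname
    if not host:
        raise ValueError(f"Could not extract a hostname from {raw!r}")
    return host


print(normalize_host("https://dbc-1234567.cloud.databricks.com/"))  # dbc-1234567.cloud.databricks.com
```

Note that `urlparse(...).hostname` also lowercases the name, which is harmless for DNS hostnames.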

HTTP Path*

The HTTP Path for your Databricks SQL Warehouse (or all-purpose cluster). Typically found in the Databricks console under “Connection details” for your SQL Warehouse or cluster.

Example

  • /sql/1.0/warehouses/abcd-1234-efgh-5678
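A quick way to catch a wrong HTTP Path is to check its shape before saving the connection. This sketch assumes the path forms Databricks commonly displays: /sql/1.0/warehouses/<id> for a SQL Warehouse (/sql/1.0/endpoints/<id> for older SQL Endpoints) and sql/protocolv1/o/<workspace-id>/<cluster-id> for an all-purpose cluster. `classify_http_path` is hypothetical, and the patterns may need loosening if your workspace shows a different format.

```python
import re

# Assumed path shapes, based on what Databricks typically shows under "Connection details".
_WAREHOUSE = re.compile(r"^/?sql/1\.0/(?:warehouses|endpoints)/[A-Za-z0-9-]+$")
_CLUSTER = re.compile(r"^/?sql/protocolv1/o/\d+/[A-Za-z0-9-]+$")


def classify_http_path(path: str) -> str:
    """Return 'warehouse' or 'cluster'; raise ValueError for an unrecognized path."""
    p = path.strip()
    if _WAREHOUSE.match(p):
        return "warehouse"
    if _CLUSTER.match(p):
        return "cluster"
    raise ValueError(f"Unrecognized Databricks HTTP Path: {path!r}")


print(classify_http_path("/sql/1.0/warehouses/abcd-1234-efgh-5678"))  # warehouse
```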

Token*

A Databricks Personal Access Token (PAT) or other valid token with appropriate permissions for querying.

Special Considerations

  • Keep the token secure; treat it as a password.

  • If the token is created with an expiration date, track that date and rotate the token before it expires.
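One way to keep the token out of scripts and config files is to read it from an environment variable, and to record its expiry date so rotation is not missed. The variable name DATABRICKS_TOKEN, the 14-day warning window, and both helper functions are illustrative assumptions, not part of Lumi AI.

```python
import os
from datetime import date, timedelta


def load_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Read the token from an environment variable instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set or empty")
    return token


def needs_rotation(expires_on: date, today: date, warn_days: int = 14) -> bool:
    """Return True once today falls within warn_days of the expiry date, or after it."""
    return today >= expires_on - timedelta(days=warn_days)


# Hypothetical expiry date recorded when the token was created.
print(needs_rotation(date(2025, 3, 1), today=date(2025, 2, 20)))  # True: 9 days left
```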

Catalog*

The default catalog (schema container) you want to query in Databricks.

Examples

  • main

  • hive_metastore
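To confirm the four parameters work together before entering them in Lumi AI, you can run a trivial query with the open-source databricks-sql-connector package (pip install databricks-sql-connector). This sketch assumes Lumi AI's Host, HTTP Path, Token, and Catalog fields correspond to the connector's server_hostname, http_path, access_token, and catalog arguments; `build_connect_kwargs` and `check_connection` are hypothetical helpers.

```python
def build_connect_kwargs(host: str, http_path: str, token: str, catalog: str) -> dict:
    """Map the Lumi AI connection fields onto databricks-sql-connector argument names."""
    fields = {"Host": host, "HTTP Path": http_path, "Token": token, "Catalog": catalog}
    missing = [label for label, value in fields.items() if not value]
    if missing:
        raise ValueError("Missing required parameter(s): " + ", ".join(missing))
    return {
        "server_hostname": host,
        "http_path": http_path,
        "access_token": token,
        "catalog": catalog,
    }


def check_connection(kwargs: dict) -> None:
    """Run a trivial query; needs databricks-sql-connector installed and network access."""
    # Imported here so build_connect_kwargs can be used without the package installed.
    from databricks import sql

    with sql.connect(**kwargs) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT current_catalog()")
            print(cursor.fetchone())
```

Run with your own values, `check_connection(build_connect_kwargs(...))` should print a row containing the default catalog; an authentication error at this step points to the token rather than the host or path.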

System Permissions & Configuration

  • Databricks Workspace & Cluster Access

    • The user/token must have permission to run queries on the specified Databricks SQL Warehouse or cluster.

    • If you are using Unity Catalog, ensure the user or group associated with the token has the relevant permissions on the target catalogs, schemas, and tables.

  • Firewall & Network Configuration

    • If using Direct connections, ensure your Databricks workspace can accept traffic from Lumi AI (either via an allow-list or by configuring public access).

    • If using Gateway, ensure that the gateway host can reach Databricks and that your workspace or cluster firewalls/security groups permit inbound traffic from the gateway.

  • Token Validity

    • Databricks tokens are usually issued with a fixed lifetime. Make sure your token is current and replace it before it expires to avoid failed connections.
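For Unity Catalog, read access for querying typically means USE CATALOG on the catalog, plus USE SCHEMA and SELECT on each schema Lumi AI should read. The sketch below only generates those GRANT statements for an admin or object owner to review and run; the principal name and schema list are hypothetical inputs, and names are assumed not to contain backticks. It applies to Unity Catalog only, since the legacy hive_metastore uses a different privilege model, and permission to use the SQL Warehouse itself (CAN USE) is managed in the warehouse's permission settings rather than through GRANT.

```python
from typing import List


def uc_read_grants(principal: str, catalog: str, schemas: List[str]) -> List[str]:
    """Build Unity Catalog GRANT statements giving `principal` read access.

    Assumes no name contains a backtick; every name is backtick-quoted so that
    principals such as group names with hyphens are handled.
    """
    who = f"`{principal}`"
    statements = [f"GRANT USE CATALOG ON CATALOG `{catalog}` TO {who};"]
    for schema in schemas:
        target = f"`{catalog}`.`{schema}`"
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {target} TO {who};")
        # SELECT granted on a schema applies to the tables and views inside it.
        statements.append(f"GRANT SELECT ON SCHEMA {target} TO {who};")
    return statements


for stmt in uc_read_grants("lumi-readers", "main", ["sales"]):
    print(stmt)
```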

Special Notes

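  • Case Sensitivity & Quoted Identifiers

    • Databricks generally treats object names as case-insensitive, even when they are quoted with backticks; quoting is needed mainly for names that contain spaces or special characters. Consistent naming and avoiding special characters can prevent confusion.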

  • Cluster vs. SQL Warehouse

    • If you plan to run queries on an all-purpose cluster, ensure it is configured to allow SQL connections and the HTTP Path is correct.

    • For production/reporting workloads, a Databricks SQL Warehouse (formerly “SQL Endpoint”) is often recommended for stable query performance.

  • Token vs. Username/Password

    • In most Databricks setups, a personal access token replaces the typical username/password authentication. Make sure you store and handle this token securely.
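Where a name cannot avoid spaces or special characters, Databricks SQL expects each part of it wrapped in backticks, with any backtick inside the name doubled. Because Databricks generally matches object names case-insensitively, a lower-case comparison is a reasonable way to look up a table in a listing. `quote_identifier`, `qualified_name`, and `find_table` are hypothetical helpers sketched under those assumptions.

```python
from typing import List, Optional


def quote_identifier(name: str) -> str:
    """Wrap one name part in backticks, doubling any backtick it contains."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified, backtick-quoted catalog.schema.table reference."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def find_table(wanted: str, available: List[str]) -> Optional[str]:
    """Return the listed name matching `wanted` case-insensitively, or None."""
    target = wanted.lower()
    for name in available:
        if name.lower() == target:
            return name
    return None


print(qualified_name("main", "sales", "Order Items"))  # `main`.`sales`.`Order Items`
```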

Common Issues

  • Invalid Host or HTTP Path

    • Double-check that you are using the correct workspace hostname and the HTTP Path shown in your SQL Warehouse's or cluster's connection details.

  • Expired Token

    • If you encounter repeated authentication errors (HTTP 401 Unauthorized), verify that your Databricks token has not expired.

  • Insufficient Privileges

    • Ensure that the token’s associated user/group has permission to run queries on the desired cluster or SQL Warehouse and to read the target tables.

  • Network Blocking or Firewall Issues

    • If direct connections fail, confirm that your firewall rules or security groups allow traffic from Lumi AI (or from the Gateway) to reach Databricks.

    • For private connectivity, such as private endpoints in Azure Databricks or AWS PrivateLink, additional VNet or VPC configuration may be necessary.

  • Misconfigured Gateway

    • If you are using the Gateway method, ensure that the gateway is properly registered with Lumi AI and that it can reach the Databricks workspace host on the required port (443 for HTTPS).
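To separate network problems from credential problems, you can test whether a plain TCP connection to the workspace host on port 443 succeeds from the machine in question (the gateway host, for the Gateway method). This sketch uses only Python's standard library; `can_reach` is a hypothetical helper, and a True result confirms only that the port is reachable, not that TLS, the token, or the HTTP Path are valid.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves DNS and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts are all OSError subclasses.
        return False
```

For example, `can_reach("dbc-1234567.cloud.databricks.com")` run on the gateway host should return True; False suggests a DNS, firewall, or routing issue rather than a token problem.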
