# Databricks

Databricks is a data and AI platform built on Apache Spark, often used for large-scale data analytics, data engineering, and machine learning. Lumi AI can connect to Databricks via direct internet-based access or through a Lumi AI Data Gateway for environments requiring a secured connection.

## Supported Methods

These are the methods Lumi AI can use to connect to your source system and deliver the required services:

* [x] **Direct**\
  *This is the default option. Queries run directly against the source with no mediation. Ensure the source system allows internet-based access, or whitelist Lumi AI (see the rest of this page for specifics).*
* [x] **Gateway**\
  *This separate module is our tool for connecting to data sources secured inside your network.*

## Supported Limits

The following are limits that can be configured for the system to moderate access and usage from users in Lumi AI:

* [ ] **Cost Limit**\
  *Before a query runs, if the system supports cost estimation, the system-specific compute cost (or a surrogate) estimate is compared against an admin-set, organization-level cost limit for systems of this type (if one is configured). If the estimate exceeds the limit, the query will not run; the workflow will either attempt an optimization or notify the user.*
* [ ] **Duration Limit**\
  *As an alternative to cost, queries are stopped if the system supports a duration/timeout limit and one is configured at the organization level (across systems).*
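The duration-limit behavior described above can be sketched in Python. This is a simplified, hypothetical stand-in (the names `run_with_timeout` and the thread-pool approach are illustrative assumptions, not Lumi AI's actual implementation):

```python
import concurrent.futures
import time

def run_with_timeout(query_fn, timeout_s):
    """Run query_fn, raising TimeoutError if it exceeds timeout_s.

    A simplified stand-in for an organization-level duration limit:
    the query runs in a worker thread, and the caller stops waiting
    once the limit is reached.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(query_fn)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            future.cancel()
            raise TimeoutError(f"query exceeded {timeout_s}s duration limit")
```

A fast query returns its result normally; a query that outlives the limit surfaces a `TimeoutError` the caller can report to the user.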

## Available Parameters

These are the essential connection properties that direct all queries to your source system.

*\* Required parameters*

{% hint style="info" %}
Note: The Gateway parameter is common to all systems (when supported) and is only available when **Gateway** is the selected connection method.
{% endhint %}

<details>

<summary>Host*</summary>

The domain (or subdomain) of your Databricks workspace.

**Example**

* dbc-1234567.cloud.databricks.com

</details>

<details>

<summary>HTTP Path*</summary>

The HTTP Path for your Databricks SQL Warehouse (or all-purpose cluster). Typically found in the Databricks console under “Connection details” for your SQL Warehouse or cluster.

**Example**

* /sql/1.0/warehouses/abcd-1234-efgh-5678

</details>

<details>

<summary>Token*</summary>

A Databricks Personal Access Token (PAT) or other valid token with appropriate permissions for querying.

**Special Considerations**

* Must be kept secure; treat as a password.
* If creating with expiry, please track and be aware of necessary rotations.

</details>

<details>

<summary>Catalog*</summary>

The default catalog (schema container) you want to query in Databricks.

**Examples**

* main
* hive\_metastore

</details>
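A quick sanity check on the four parameters above can catch common mistakes (a scheme pasted into the host, a missing leading slash in the HTTP Path) before attempting a connection. This is a hypothetical helper; the patterns are assumptions based on the example values shown above, not an official Databricks specification:

```python
import re

def validate_databricks_params(host, http_path, token, catalog):
    """Return a list of human-readable problems; empty means the
    parameters look plausible (it does not prove they are correct)."""
    errors = []
    # Host should be a bare domain, e.g. dbc-1234567.cloud.databricks.com,
    # with no scheme ("https://") or trailing path.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9.-]*", host):
        errors.append("Host should be a domain only, without a scheme or path")
    # HTTP Path for a SQL Warehouse usually looks like /sql/1.0/warehouses/<id>
    if not http_path.startswith("/"):
        errors.append("HTTP Path should start with '/'")
    if not token:
        errors.append("Token is required")
    if not catalog:
        errors.append("Catalog is required")
    return errors
```

For example, pasting the full workspace URL (including `https://`) into the Host field is a frequent cause of connection failures that this kind of check surfaces early.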

## System Permissions & Configuration

* **Databricks Workspace & Cluster Access**
  * The user/token must have permission to run queries on the specified Databricks SQL Warehouse or cluster.
  * If you are using Unity Catalog, ensure the token and associated user/group has the relevant permissions on the target catalogs, schemas, and tables.
* **Firewall & Network Configuration**
  * If using **Direct** connections, ensure your Databricks workspace can accept traffic from Lumi AI (either via an allow-list or by configuring public access).
  * If using **Gateway**, ensure the gateway host can connect to Databricks, and your cluster or workspace firewalls/security groups permit inbound traffic from the gateway.
* **Token Validity**
  * Databricks tokens typically expire after a set period. Ensure your token is current and refresh it before expiration to avoid failed connections.

## Special Notes

* **Case Sensitivity & Quoted Identifiers**\
  Databricks generally treats object names as case-insensitive unless quoted. Consistent naming and avoiding special characters can prevent confusion.
* **Cluster vs. SQL Warehouse**
  * If you plan to run queries on an all-purpose cluster, ensure it is configured to allow SQL connections and the `HTTP Path` is correct.
  * For production/reporting workloads, a Databricks SQL Warehouse (formerly “SQL Endpoint”) is often recommended for stable query performance.
* **Token vs. Username/Password**\
  In most Databricks setups, a personal access token replaces the typical username/password authentication. Make sure you store and handle this token securely.

## Common Issues

* **Invalid Host or HTTP Path**
  * Double-check you are using the correct workspace domain and the correct path from Databricks SQL Warehouses or cluster settings.
* **Expired Token**
  * If you encounter repeated authentication errors (`HTTP 401 Unauthorized`), verify that your Databricks token has not expired.
* **Insufficient Privileges**
  * Ensure that the token’s associated user/group has permission to run queries on the desired cluster or SQL Warehouse and to read the target tables.
* **Network Blocking or Firewall Issues**
  * If direct connections fail, confirm your firewall or security groups allow inbound traffic from Lumi AI or the Gateway to reach Databricks.
  * For private endpoints in Azure Databricks or AWS PrivateLink, additional VNet or VPC configurations may be necessary.
* **Misconfigured Gateway**
  * If using a gateway method, ensure the gateway is properly registered with Lumi AI, and that it can reach the Databricks domain on the required ports (443 for HTTPS).
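When diagnosing network blocking versus authentication errors, a basic TCP reachability check from the connecting host (the gateway machine, for a Gateway setup) can help. This is a generic sketch using only the Python standard library, not a Lumi AI tool:

```python
import socket

def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within
    the timeout. Success means the network path and firewall allow
    traffic; it says nothing about credentials or permissions."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_reach("dbc-1234567.cloud.databricks.com")` returns `False`, focus on firewalls, security groups, or private-endpoint routing; if it returns `True` but queries still fail, the problem is more likely the token, HTTP Path, or permissions.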


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.lumi-ai.com/product-features/source-system-integrations/databricks.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
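Since the question travels in a query string, it should be percent-encoded. A minimal sketch of building the request URL with the Python standard library (the example question is illustrative):

```python
from urllib.parse import urlencode

# Base URL of this page, as given above.
base = "https://docs.lumi-ai.com/product-features/source-system-integrations/databricks.md"

question = "Which ports does the Databricks gateway connection require?"

# urlencode percent-encodes the question so it is safe in a query string.
url = f"{base}?{urlencode({'ask': question})}"
```

The resulting URL can then be fetched with any HTTP client via a plain GET request.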
