Top SQL Queries Every SAP Databricks Admins Should Keep in your Back Pocket

Estimated read time 14 min read
All the SQL queries provided are intended for use with SAP Databricks System Tables. Central to this is the fact that System Tables are centrally stored and governed by Unity Catalog.
 
A solid SQL toolkit can make an SAP Databricks admin unstoppable. Here are the queries you’ll rely on the most—fast, essential, and built for real‑world scenarios.
A reliable set of SQL queries that help you diagnose issues, validate access, and keep your environment running smoothly. This blog highlights the must‑know queries that every admin should bookmark or pin it.

Important Note :

Before executing the SQL queries below, please ensure that you have the required permissions and access to the relevant tables. If not, kindly contact your SAP Databricks Administrator in your organization to request the necessary privileges.

Resource Monitoring – DB SQL Warehouses that are actively running and for how long

This query helps understanding which warehouses are currently active and their running time in hours.

SELECT
we.workspace_id,
we.warehouse_id,
we.event_time,
TIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS running_hours,
we.cluster_count
FROM
system.compute.warehouse_events we
WHERE
we.event_type = ‘RUNNING’
AND NOT EXISTS (
SELECT 1
FROM compute.warehouse_events we2
WHERE we2.warehouse_id = we.warehouse_id
AND we2.event_time > we.event_time
)

Resource Monitoring – Warehouses that are upscaled longer than expected

This query identifies warehouses that have scaled up (increased in cluster count) and have remained in that state longer than usual.

SELECT
we.workspace_id,
we.warehouse_id,
we.event_time,
TIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS upscaled_hours,
we.cluster_count
FROM
system.compute.warehouse_events we
WHERE
we.event_type = ‘SCALED_UP’
AND we.cluster_count >= 2
AND NOT EXISTS (
SELECT 1
FROM compute.warehouse_events we2
WHERE we2.warehouse_id = we.warehouse_id
AND (
(we2.event_type = ‘SCALED_DOWN’) OR
(we2.event_type = ‘SCALED_UP’ AND we2.cluster_count < 2)
)
AND we2.event_time > we.event_time
)

Join the list_prices table with the usage table

SELECT
SUM(usage.usage_quantity * list_prices.pricing.effective_list.default)
as `Total Dollar Cost`
FROM system.billing.usage
JOIN system.billing.list_prices ON list_prices.sku_name = usage.sku_name
WHERE identity_metadata.run_as = {{‘sample identity’}}
AND usage.usage_end_time >= list_prices.price_start_time
AND (list_prices.price_end_time IS NULL OR usage.usage_end_time < list_prices.price_end_time)
AND usage.usage_date BETWEEN “2025-04-01” AND “2025-04-15”

What is the daily spend on the different SAP Databricks products?

This query identifies the consumption generated in DBUs and $DBUs at list price differentiated by SKU on a daily basis.

SELECT
u.usage_date,
u.sku_name,
SUM(u.usage_quantity) AS dbus,
FIRST(lp.pricing.default) AS list_price,
SUM(u.usage_quantity) * FIRST(lp.pricing.default) AS dollar_dbus_list
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp ON u.sku_name = lp.sku_name
AND u.usage_start_time >= lp.price_start_time
AND (
u.usage_end_time <= lp.price_end_time
or lp.price_end_time is null
)
GROUP BY u.usage_date, u.sku_name
ORDER BY u.usage_date, u.sku_name DESC

What is the spend on the different Databricks products per month per workspace?

This query identifies the consumption generated in DBUs and $DBUs at list price on a monthly basis by every workspace.

SELECT
LEFT(u.usage_date, 7) AS year_month,
u.workspace_id,
u.sku_name,
SUM(u.usage_quantity) AS dbus,
FIRST(lp.pricing.default) AS list_price,
SUM(u.usage_quantity) * FIRST(lp.pricing.default) AS dollar_dbus_list
FROM
system.billing.usage u
INNER JOIN system.billing.list_prices lp ON u.sku_name = lp.sku_name
AND u.usage_start_time >= lp.price_start_time
AND ( u.usage_end_time <= lp.price_end_time OR lp.price_end_time is null)
GROUP BY u.workspace_id, LEFT(u.usage_date, 7), u.sku_name
ORDER BY year_month, u.workspace_id DESC

Suspects for query inefficiency 

The query below shows the potentially most inefficient queries from the last 30 days (based on shuffle).

SELECT
workspace_id,
warehouse_id,
statement_id,
statement_text,
SUM(shuffle_read_bytes) AS shuffle_read_bytes
FROM system.query.history
WHERE start_time between DATE_SUB(CURRENT_DATE, 30) AND CURRENT_DATE
GROUP BY workspace_id, warehouse_id, statement_id, statement_text
HAVING shuffle_read_bytes > 0
ORDER BY shuffle_read_bytes DESC;

Identify expensive notebooks

This query returns a list of notebooks and how many DBUs each notebook consumed, in descending order by DBU consumption

SELECT
usage_metadata.notebook_id,
usage_metadata. notebook_path,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
usage_metadata.notebook_id is not null
and billing_origin_product = ‘INTERACTIVE’
and product_features.is_serverless
and usage_unit = ‘DBU’
and usage_date >= DATEADD(day, -30, current_date)
GROUP BY
1,2
ORDER BY
total_dbu DESC

Identify expensive jobs

This query returns a list of jobs and how many DBUs each job consumed, in descending order by DBU consumption:

SELECT
usage_metadata.job_id,
usage_metadata.job_name,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
usage_metadata.job_id is not null
and usage_unit = ‘DBU’
and usage_date >= DATEADD(day, -30, current_date)
and sku_name like ‘%JOBS_SERVERLESS_COMPUTE%’
GROUP BY
1,2
ORDER BY
total_dbu DESC

Report on DBUs consumed by a particular user

This query returns a list of notebooks and jobs run by a particular user or service principal, and the number of DBUs consumed by each workload. Be sure to insert a valid user email address using the parameters.

SELECT
usage_metadata.job_id,
usage_metadata.job_name,
usage_metadata.notebook_id,
usage_metadata. notebook_path,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
identity_metadata.run_as = format_string(‘%s@%s.%s’, :user_name, :domain_name, :top_level_domain)
and usage_date >= DATEADD(day, -30, current_date)
GROUP BY
1,2,3,4
ORDER BY
total_dbu DESC

Report on DBUs consumed by workloads that share a custom tag

This query returns a list of jobs that share the same custom tag, and the number of DBUs consumed by each workload:

SELECT
usage_metadata.job_id,
usage_metadata.job_name,
usage_metadata.notebook_id,
usage_metadata. notebook_path,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
custom_tags.<key> = ‘<value>’
and billing_origin_product in (‘JOBS’,’INTERACTIVE’)
and product_features.is_serverless
and usage_unit = ‘DBU’
and usage_date >= DATEADD(day, -30, current_date)
GROUP BY
1,2,3,4
ORDER BY
total_dbu DESC

What are the top 10 tables accessed by user yogananda.muthaiah@sap.com?

SELECT
IFNULL(request_params.full_name_arg, ‘Non-specific’) AS `TABLE ACCESSED`
FROM system.access.audit
WHERE user_identity.email = ‘yogananda.muthaiah@sap.com’
AND action_name IN (‘commandSubmit’,’getTable’)
GROUP BY IFNULL(request_params.full_name_arg, ‘Non-specific’)
ORDER BY COUNT(*) DESC
LIMIT 10

Who are the top 10 users who access table catalog.schema.table?

This query identifies which are the top 10 users accessing table catalog.schema.table.

SELECT
COUNT(*),
user_identity.email
FROM
system.access.audit
WHERE
action_name IN (‘commandSubmit’, ‘getTable’)
AND request_params.full_name_arg = ‘catalog.schema.table’
GROUP BY user_identity.email
ORDER BY COUNT(*) DESC
LIMIT 10

What has this user accessed in the last 24 hours?

This query identifies which tables user yogananda.muthaiah@sap.com accessed in the last 24 hours

SELECT
IFNULL(request_params.full_name_arg, ‘Non-specific’) AS `table_accessed`
FROM system.access.audit
WHERE user_identity.email = ‘yogananda.muthaiah@sap.com’
AND action_name IN (‘commandSubmit’,’getTable’)
AND event_time >= NOW() – ‘1 day’::INTERVAL
GROUP BY IFNULL(request_params.full_name_arg, ‘Non-specific’)
ORDER BY COUNT(*) DESC

 

 

​ All the SQL queries provided are intended for use with SAP Databricks System Tables. Central to this is the fact that System Tables are centrally stored and governed by Unity Catalog. A solid SQL toolkit can make an SAP Databricks admin unstoppable. Here are the queries you’ll rely on the most—fast, essential, and built for real‑world scenarios.A reliable set of SQL queries that help you diagnose issues, validate access, and keep your environment running smoothly. This blog highlights the must‑know queries that every admin should bookmark or pin it.Important Note :Before executing the SQL queries below, please ensure that you have the required permissions and access to the relevant tables. If not, kindly contact your SAP Databricks Administrator in your organization to request the necessary privileges.Resource Monitoring – DB SQL Warehouses that are actively running and for how longThis query helps understanding which warehouses are currently active and their running time in hours. SELECT
we.workspace_id,
we.warehouse_id,
we.event_time,
TIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS running_hours,
we.cluster_count
FROM
system.compute.warehouse_events we
WHERE
we.event_type = ‘RUNNING’
AND NOT EXISTS (
SELECT 1
FROM compute.warehouse_events we2
WHERE we2.warehouse_id = we.warehouse_id
AND we2.event_time > we.event_time
)Resource Monitoring – Warehouses that are upscaled longer than expectedThis query identifies warehouses that have scaled up (increased in cluster count) and have remained in that state longer than usual.SELECT
we.workspace_id,
we.warehouse_id,
we.event_time,
TIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS upscaled_hours,
we.cluster_count
FROM
system.compute.warehouse_events we
WHERE
we.event_type = ‘SCALED_UP’
AND we.cluster_count >= 2
AND NOT EXISTS (
SELECT 1
FROM compute.warehouse_events we2
WHERE we2.warehouse_id = we.warehouse_id
AND (
(we2.event_type = ‘SCALED_DOWN’) OR
(we2.event_type = ‘SCALED_UP’ AND we2.cluster_count < 2)
)
AND we2.event_time > we.event_time
)Join the list_prices table with the usage tableSELECT
SUM(usage.usage_quantity * list_prices.pricing.effective_list.default)
as `Total Dollar Cost`
FROM system.billing.usage
JOIN system.billing.list_prices ON list_prices.sku_name = usage.sku_name
WHERE identity_metadata.run_as = {{‘sample identity’}}
AND usage.usage_end_time >= list_prices.price_start_time
AND (list_prices.price_end_time IS NULL OR usage.usage_end_time < list_prices.price_end_time)
AND usage.usage_date BETWEEN “2025-04-01” AND “2025-04-15″What is the daily spend on the different SAP Databricks products?This query identifies the consumption generated in DBUs and $DBUs at list price differentiated by SKU on a daily basis.SELECT
u.usage_date,
u.sku_name,
SUM(u.usage_quantity) AS dbus,
FIRST(lp.pricing.default) AS list_price,
SUM(u.usage_quantity) * FIRST(lp.pricing.default) AS dollar_dbus_list
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp ON u.sku_name = lp.sku_name
AND u.usage_start_time >= lp.price_start_time
AND (
u.usage_end_time <= lp.price_end_time
or lp.price_end_time is null
)
GROUP BY u.usage_date, u.sku_name
ORDER BY u.usage_date, u.sku_name DESCWhat is the spend on the different Databricks products per month per workspace?This query identifies the consumption generated in DBUs and $DBUs at list price on a monthly basis by every workspace.SELECT
LEFT(u.usage_date, 7) AS year_month,
u.workspace_id,
u.sku_name,
SUM(u.usage_quantity) AS dbus,
FIRST(lp.pricing.default) AS list_price,
SUM(u.usage_quantity) * FIRST(lp.pricing.default) AS dollar_dbus_list
FROM
system.billing.usage u
INNER JOIN system.billing.list_prices lp ON u.sku_name = lp.sku_name
AND u.usage_start_time >= lp.price_start_time
AND ( u.usage_end_time <= lp.price_end_time OR lp.price_end_time is null)
GROUP BY u.workspace_id, LEFT(u.usage_date, 7), u.sku_name
ORDER BY year_month, u.workspace_id DESCSuspects for query inefficiency The query below shows the potentially most inefficient queries from the last 30 days (based on shuffle).SELECT
workspace_id,
warehouse_id,
statement_id,
statement_text,
SUM(shuffle_read_bytes) AS shuffle_read_bytes
FROM system.query.history
WHERE start_time between DATE_SUB(CURRENT_DATE, 30) AND CURRENT_DATE
GROUP BY workspace_id, warehouse_id, statement_id, statement_text
HAVING shuffle_read_bytes > 0
ORDER BY shuffle_read_bytes DESC;Identify expensive notebooksThis query returns a list of notebooks and how many DBUs each notebook consumed, in descending order by DBU consumptionSELECT
usage_metadata.notebook_id,
usage_metadata. notebook_path,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
usage_metadata.notebook_id is not null
and billing_origin_product = ‘INTERACTIVE’
and product_features.is_serverless
and usage_unit = ‘DBU’
and usage_date >= DATEADD(day, -30, current_date)
GROUP BY
1,2
ORDER BY
total_dbu DESCIdentify expensive jobsThis query returns a list of jobs and how many DBUs each job consumed, in descending order by DBU consumption:SELECT
usage_metadata.job_id,
usage_metadata.job_name,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
usage_metadata.job_id is not null
and usage_unit = ‘DBU’
and usage_date >= DATEADD(day, -30, current_date)
and sku_name like ‘%JOBS_SERVERLESS_COMPUTE%’
GROUP BY
1,2
ORDER BY
total_dbu DESCReport on DBUs consumed by a particular userThis query returns a list of notebooks and jobs run by a particular user or service principal, and the number of DBUs consumed by each workload. Be sure to insert a valid user email address using the parameters.SELECT
usage_metadata.job_id,
usage_metadata.job_name,
usage_metadata.notebook_id,
usage_metadata. notebook_path,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
identity_metadata.run_as = format_string(‘%s@%s.%s’, :user_name, :domain_name, :top_level_domain)
and usage_date >= DATEADD(day, -30, current_date)
GROUP BY
1,2,3,4
ORDER BY
total_dbu DESCReport on DBUs consumed by workloads that share a custom tagThis query returns a list of jobs that share the same custom tag, and the number of DBUs consumed by each workload:SELECT
usage_metadata.job_id,
usage_metadata.job_name,
usage_metadata.notebook_id,
usage_metadata. notebook_path,
SUM(usage_quantity) as total_dbu
FROM
system.billing.usage
WHERE
custom_tags.<key> = ‘<value>’
and billing_origin_product in (‘JOBS’,’INTERACTIVE’)
and product_features.is_serverless
and usage_unit = ‘DBU’
and usage_date >= DATEADD(day, -30, current_date)
GROUP BY
1,2,3,4
ORDER BY
total_dbu DESCWhat are the top 10 tables accessed by user yogananda.muthaiah@sap.com?SELECT
IFNULL(request_params.full_name_arg, ‘Non-specific’) AS `TABLE ACCESSED`
FROM system.access.audit
WHERE user_identity.email = ‘yogananda.muthaiah@sap.com’
AND action_name IN (‘commandSubmit’,’getTable’)
GROUP BY IFNULL(request_params.full_name_arg, ‘Non-specific’)
ORDER BY COUNT(*) DESC
LIMIT 10Who are the top 10 users who access table catalog.schema.table?This query identifies which are the top 10 users accessing table catalog.schema.table.SELECT
COUNT(*),
user_identity.email
FROM
system.access.audit
WHERE
action_name IN (‘commandSubmit’, ‘getTable’)
AND request_params.full_name_arg = ‘catalog.schema.table’
GROUP BY user_identity.email
ORDER BY COUNT(*) DESC
LIMIT 10What has this user accessed in the last 24 hours?This query identifies which tables user yogananda.muthaiah@sap.com accessed in the last 24 hoursSELECT
IFNULL(request_params.full_name_arg, ‘Non-specific’) AS `table_accessed`
FROM system.access.audit
WHERE user_identity.email = ‘yogananda.muthaiah@sap.com’
AND action_name IN (‘commandSubmit’,’getTable’)
AND event_time >= NOW() – ‘1 day’::INTERVAL
GROUP BY IFNULL(request_params.full_name_arg, ‘Non-specific’)
ORDER BY COUNT(*) DESC    Read More Technology Blog Posts by SAP articles 

#SAP

#SAPTechnologyblog

You May Also Like

More From Author