Motivation
Have you ever encountered a simple, highly selective query that should return quickly, but instead causes slowdowns and high wait times in an SAP on Db2 for Linux, UNIX, and Windows (LUW) environment?
This blog post explores a scenario where highly selective SQL statements that use intra-partition parallelism (IPP) can suffer from severe overhead, particularly in the form of latch waits. You’ll learn to identify this problematic pattern, understand its root cause, and discover ways to resolve the issue. I’ll give you an example from an SAP Landscape Transformation (SLT) environment, but the principle applies to any system running similar high-frequency, selective lookups.
Background
Intra-partition parallelism (IPP) is beneficial for complex analytical queries that scan millions of rows. This feature allows the database to break a single SQL query into smaller pieces and execute them simultaneously using multiple parallel subagents.
But how do these agents combine their results?
This is where Table Queues (TQs) come in. A TQ is a shared memory area that acts like a temporary pipeline. Subagents scan the data and push rows that match the query’s conditions into the TQ. The coordinator agent pulls the rows from the TQ to assemble the final result. Since multiple agents access this shared pipeline simultaneously, this process requires coordination. Latches provide this coordination, ensuring only one agent modifies the TQ at a time.
Latches are very fast, short-term locks that Db2 uses to protect its own internal memory structures. Do not confuse this with a database lock, whose purpose is to protect user data in tables. When a task needs to read or write to an internal shared structure, it takes a latch, performs its work, and releases it very quickly. If many tasks try to get the same latch at the same time, they form a queue, and the waiting time can add up.
Symptoms
The system suffered from unstable performance and severe slowdowns. The monitoring views in the DBA Cockpit (transaction DBACOCKPIT) revealed the following symptoms:
High Latch Wait Time (transaction DBACOCKPIT -> Performance -> Time Spent Analysis)
Over 85% of the database’s time was spent waiting on latches.
Specific Latch Contention (transaction DBACOCKPIT -> Diagnostics -> Extended Latch Waits)
The following three latches were by far the most expensive ones:
- sqkfSessionLockManager and sqkfRecvLockManager: protect the TQ.
- SQLB_BPD__bpdLatch_SX: protects the buffer pool’s metadata. High contention means many parallel agents are competing to access data pages from memory, creating a bottleneck in the query’s data reading phase.
Problematic Queries (transaction DBACOCKPIT -> Performance -> Top SQL Statements – sorted by Total Extended Latch Wait Time)
The SQL statements responsible for the vast majority of the wait time followed a distinct pattern:
- They used a highly selective WHERE clause to find a specific record.
- They included FETCH FIRST 1 ROW ONLY (or LIMIT 1) to retrieve only that record.
- Their access plans used parallelism and TQs.
This pattern is common in job-polling or status-tracking applications. In our specific case, it was an SAP Landscape Transformation (SLT) system constantly polling its control tables (IUUC_RS_STATUS, IUUC_RS_ORDER) to find the latest status record, marked by the flag IS_LAST = ‘X’.
Root Cause
The application is designed to perform a simple and quick lookup and return one specific row. However, the database access plan told a different story:
Statement:
The IUUC_RS_STATUS table is a log used by SLT to track the replication status of a specific source table. Every time the status changes (e.g., replication starts, a load finishes, or an error occurs), SLT inserts a new row into this table with a timestamp. The IS_LAST = ‘X’ flag is used to mark the single, most recent, and currently active status record. This is why SLT constantly queries for the row with IS_LAST = ‘X’ and LIMIT 1.
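The exact statement is not reproduced here, but based on this pattern it looks roughly like the following sketch. The SELECT list and the MT_ID and TABLE_NAME predicates are assumptions, inferred from the index columns discussed later in this post:

```sql
-- Illustrative sketch only, not the exact SLT statement:
SELECT *
  FROM IUUC_RS_STATUS
 WHERE MT_ID = ?           -- mass transfer ID (assumed predicate)
   AND TABLE_NAME = ?      -- replicated source table (assumed predicate)
   AND IS_LAST = 'X'       -- marks the current status record
 FETCH FIRST 1 ROW ONLY
```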
Access Plan:
Used Index:
The index IUUC_RS_STATUS-0 on the table IUUC_RS_STATUS did not include the IS_LAST column. Since the index wasn’t a good fit, the Db2 optimizer made a decision: “I can’t find this row with a single index lookup. Instead, I’ll use multiple parallel agents (intra-partition parallelism) to scan the index faster and find it.”
This led to the following normal and expected process:
1. Db2 establishes the coordinator and subagents and opens the necessary TQ for the parallel sections.
2. Each subagent scans a portion of the index, looking for rows that match the WHERE clause.
3. When a subagent finds a matching row, it tries to push it into the shared TQ.
4. The coordinator agent pulls the first row from the TQ and stops because of LIMIT 1.
However, even though the LIMIT 1 clause allows the coordinator to terminate after retrieving the first qualifying row, the overhead of starting the parallel subagents and coordinating the TQ with latches outweighed the benefits for a task as simple as finding a single row. While the parallel read performance provided by intra-partition parallelism is advantageous for large analytical queries, it is not beneficial in this context.
In short, for these highly selective lookups, parallelization was counterproductive. The subagents ended up spending more time waiting for latches than performing actual work, making the query, which was expected to be quick, slow and unpredictable.
Fix
The goal of the fix is to shrink the estimated size of the intermediate result so far that the optimizer recognizes that parallel processing is not beneficial. This example shows that parallelization is not always advantageous: it can make the environment more complex and potentially lead to optimizer misjudgments.
The recommended fix is to add a suitable index that covers all columns of the WHERE clause. In this example, the IS_LAST column was missing from the existing index. The new index changed the access plan fundamentally:
New index:
IUUC_RS_STATUS on (MT_ID, TABLE_NAME, IS_LAST)
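As a minimal sketch, the DDL for such an index could look like this. The index name is a placeholder; in an SAP system you would typically create the index through the ABAP Dictionary or the DBA Cockpit rather than with native SQL, so that the application layer knows about it:

```sql
-- Illustrative DDL; the index name is a placeholder:
CREATE INDEX "IUUC_RS_STATUS~Z01"
    ON IUUC_RS_STATUS (MT_ID, TABLE_NAME, IS_LAST)
    ALLOW REVERSE SCANS;
```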
The effect of this change is that the predicate becomes fully covered by the index columns. This allows Db2 to perform a simple, single-threaded index probe and stop at the first match. There is no longer a need for TQs or IPP, and the latch hotspot disappears. The SQL access plan shows a significant decrease in estimated cost, from 2.335E+03 timerons (about 2,335) before the optimization to 1.379E+01 timerons (about 13.8) after it, a reduction by a factor of approximately 170.
Alternative solutions
Use optimization guidelines
- These are special instructions (hints) that you can add directly to an SQL statement. They give the database targeted suggestions on how to run a query.
- To disable parallelism for a single query, you can use the guideline <DEGREE VALUE='1'/>. This guarantees that the database will run the statement serially (see the sketch after this list).
- With these guidelines, you can also influence other behaviors, such as forcing a specific index. This can also prevent parallelism, but only as a side effect, and it isn’t guaranteed.
- For more information, see SAP Note 868888 – DB6: Use of DB2 Optimization Guidelines.
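As a sketch, such a guideline is embedded in a comment at the end of the statement; the query below reuses the illustrative polling statement from above:

```sql
-- Sketch: statement-level guideline forcing serial execution (degree 1)
SELECT *
  FROM IUUC_RS_STATUS
 WHERE MT_ID = ? AND TABLE_NAME = ? AND IS_LAST = 'X'
 FETCH FIRST 1 ROW ONLY
/* <OPTGUIDELINES><DEGREE VALUE='1'/></OPTGUIDELINES> */
```

In ABAP, such guidelines are typically supplied via database hints (the %_HINTS DB6 addition), as described in the SAP Note referenced above.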
Disable parallelism globally
- Set the DB CFG parameter DFT_DEGREE to 1 (sets the default degree of parallelism for all statements to 1).
- Alternatively, set the DBM CFG parameter INTRA_PARALLEL to NO (disables intra-query parallelism at the instance level).
- All queries across all sessions will then be executed without parallelism, with no need to modify individual statements. However, this affects all workloads, including those that benefit from parallelism.
- Changing DFT_DEGREE or INTRA_PARALLEL requires a database deactivation/activation (see the command sketch after this list).
- For more information, see Parallel processing for applications and SAP Note 2047006 – DB6: Use of Db2 SMP Parallelism (INTRA_PARALLEL=YES).
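On the command line, the two configuration changes look roughly like the following sketch, where <SID> is a placeholder for your database name:

```sh
# Set the default degree of parallelism to 1 for one database:
db2 UPDATE DB CFG FOR <SID> USING DFT_DEGREE 1

# Or disable intra-query parallelism for the entire instance:
db2 UPDATE DBM CFG USING INTRA_PARALLEL NO
```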
Verification and Monitoring
Inspect the execution plan of affected statements:
In the DBA Cockpit (transaction DBACOCKPIT – SQL Cache), check the EXPLAIN plan, or query MON_GET_PKG_CACHE_STMT directly. Look for TQs and the query degree. After adding the new index or disabling parallelism, the plan should be a straightforward index probe with no TQs.
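A minimal monitoring sketch, assuming the extended latch wait monitor elements (such as TOTAL_EXTENDED_LATCH_WAIT_TIME) are available in your Db2 release:

```sql
-- Sketch: top statements in the package cache by extended latch wait time
SELECT SUBSTR(STMT_TEXT, 1, 80)       AS STMT_PREVIEW,
       TOTAL_EXTENDED_LATCH_WAIT_TIME AS LATCH_WAIT_MS,
       QUERY_COST_ESTIMATE            AS COST_TIMERONS
  FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
 ORDER BY TOTAL_EXTENDED_LATCH_WAIT_TIME DESC
 FETCH FIRST 10 ROWS ONLY;
```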
Compare waits before/after:
To observe the total extended latch wait time, inspect the DBA Cockpit (transaction DBACOCKPIT – Time Spent Analysis). For latch types and agent behavior, use the DBA Cockpit (transaction DBACOCKPIT – Extended Latch Waits) or db2pd -latches and db2pd -agents.
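For example, a sketch of the db2pd calls (<SID> is a placeholder for your database name):

```sh
# Show current latch holders and waiters:
db2pd -latches

# Show agents, including subagents working for a coordinator:
db2pd -db <SID> -agents
```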
Conclusion
Intra-partition parallelism (IPP) can significantly boost the performance of analytical and batch workloads. However, it may also result in costly access plans and increased latch wait times for highly selective queries that lack proper index support, such as those common in SAP Landscape Transformation (SLT) setups. Keep in mind that this issue is not exclusive to SLT queries and can occur in many other scenarios. For such selective queries, the optimizer may choose IPP with latch-protected Table Queues (TQs), and the startup and coordination cost of the TQs can then dominate the processing time, resulting in substantial delays.
By creating a targeted index that includes the relevant columns (e.g., the IS_LAST column for SLT logging tables), Db2 can perform a single-threaded index probe and return the desired row very quickly. This eliminates the latch hotspot without compromising overall system parallelism. While disabling parallelism is an alternative solution, creating the appropriate index is the more efficient and targeted approach.
Have you used intra-partition parallelism in your system? Share your thoughts and experiences in the comments.