Automated Test Suite: RHEL HA Solution for SAP S/4HANA ENSA 2 using Ansible

Introduction

Testing High Availability (HA) clusters that manage SAP S/4HANA application servers is critical for ensuring their reliability and resilience. Carrying out these tests during the pre-go-live stage helps catch many issues before they can reach production.

While manual verification of an HA cluster setup is possible, these tests can also be automated using Ansible. Developing custom roles and playbooks allows for a modular, reusable, and scalable design to automate each test case. This approach is particularly effective for tests like simulating a node failure to verify failover and resource migration.

Manual testing becomes tedious, time-consuming, and error-prone, especially in HA cluster environments where multiple nodes must be constantly observed during every test.

What is the Automated Test suite?

This Automated Test suite boosts quality assurance efficiency and test coverage, and it accelerates bug reproduction, verification, and post-maintenance validation, leading to production-ready RHEL HA cluster environments for ENSA 2 in less time.

The Automated Test suite requires a dedicated, out-of-cluster RHEL Ansible Control node to execute and monitor tests—including failovers and node crashes—on the SAP HA cluster. Using Ansible makes the tests transparent and easy to understand, and its built-in plugins allow saving logs for future review and audits.
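For example, persisting the run logs mentioned above can be done with Ansible's built-in logging. A minimal ansible.cfg sketch on the control node (the log path is an example of ours, not a path mandated by the test suite):

```ini
# Hypothetical ansible.cfg fragment on the Ansible control node:
# persist all playbook output for later review and audits.
[defaults]
log_path = /var/log/ansible/sap_ha_cluster_qa.log
```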

Which tests are performed?

This automation test suite also covers tests for the SAP HA Interface for SAP application servers. Each test carries out specific tasks to verify the cluster configuration and its current state, conducts the actual test, verifies the output, and compares it with the expected results.

The following table outlines what the test suite actually verifies on the systems being tested:

| Test Name | Test Title | Minimum no. of HA Nodes |
| --- | --- | --- |
| Test01 | Name and version of HA software | 2 |
| Test02 | HA configuration showing no errors | 2 |
| Test03 | Shared Library (HA-Interface) loads without any errors | 2 |
| Test04 | Manual move of ASCS works correctly with lock data | 2 |
| Test05 | Irrecoverable outage of the Enqueue Server (ES) 2 is handled correctly | 2 |
| Test06 | Outage of the Enqueue Replicator (ER) 2 is handled correctly | >2 |
| Test07 | ASCS moves correctly under load without specifying the destination node (Skipped in the current version) | >2 |
| Test08 | ASCS moves correctly in case of hardware or OS failure (Note: Node crash execution) | >2 |
| Test09 | Recoverable outage of the Message Server is handled correctly (if the SAP Profile Parameter "Restart_Program" is used for the Message Server) (Note: Backup and Auto-config) | >2 |
| Test10 | Irrecoverable outage of the Message Server is handled correctly (if the SAP Profile Parameter "Start_Program" is used for the Message Server) (Note: Backup and Auto-config) | >2 |

Please note that test07 is skipped in the current version and may be implemented in a future release. Additional test cases are also planned to enhance coverage.

Prerequisites

1. A cluster configuration with a minimum of 2 nodes and ASCS and ERS resources is required to start with the initial tests; 3 nodes are recommended.
Refer to the following document for the guidelines to configure the cluster that can be tested with the Ansible playbooks described in this blog: Configuring HA clusters to manage SAP NetWeaver or SAP S/4HANA Application server instances using the RHEL HA Add-On | Red Hat Enterprise Linux for SAP Solutions (Only ENSA2). Your pacemaker cluster should look like the following:

[root@s4node01: ~]# pcs cluster status
Cluster Status:
…..
Node List:
  * Online: [ s4node01 s4node02 s4node03 ]

PCSD Status:
  s4node01: Online
  s4node03: Online
  s4node02: Online

[root@s4node01: ~]# pcs resource status
……
  * Resource Group: s4h_ASCS20_group:
    * s4h_lvm_ascs20    (ocf:heartbeat:LVM-activate):     Started s4node03
    * s4h_fs_ascs20    (ocf:heartbeat:Filesystem):     Started s4node03
    * s4h_vip_ascs20    (ocf:heartbeat:IPaddr2):     Started s4node03
    * s4h_ascs20    (ocf:heartbeat:SAPInstance):     Started s4node03
  * Resource Group: s4h_ERS29_group:
    * s4h_lvm_ers29    (ocf:heartbeat:LVM-activate):     Started s4node01
    * s4h_fs_ers29    (ocf:heartbeat:Filesystem):     Started s4node01
    * s4h_vip_ers29    (ocf:heartbeat:IPaddr2):     Started s4node01
    * s4h_ers29    (ocf:heartbeat:SAPInstance):     Started s4node01

This is also a precondition before running any test.
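As a rough illustration, the "all nodes online" precondition could be verified with a small shell sketch like the one below. It parses a captured status line (a hard-coded sample here) rather than calling pcs itself, so the node names are assumptions taken from the example cluster above:

```shell
# Sketch of a pre-flight check: confirm every expected node appears in
# the "Online:" line of `pcs cluster status`. A sample line stands in
# for live output (on a real node: pcs cluster status | grep Online).
online_line='  * Online: [ s4node01 s4node02 s4node03 ]'
expected_nodes="s4node01 s4node02 s4node03"
precheck_result=ok
for node in $expected_nodes; do
  if ! printf '%s\n' "$online_line" | grep -qw "$node"; then
    echo "PRECHECK FAILED: $node is not online"
    precheck_result=fail
  fi
done
echo "precheck: $precheck_result"
```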

2. Ensure that the SAP HA Interface for SAP ABAP application server instances is configured as mentioned here: How to enable the SAP HA Interface for SAP ABAP application server instances managed by the RHEL HA Add-On? – Red Hat Customer Portal.

3. Ensure that the sap.sap_operations collection, which provides the host_info and pcs_status_info modules, is installed on the Ansible control node.

Getting started

1. Clone the community.sap_ha_cluster_qa repository and change into that directory:

# git clone https://github.com/sap-linuxlab/community.sap_ha_cluster_qa.git
# cd community.sap_ha_cluster_qa

2. Verify that the ansible.cfg, inventory, and playbook files match your specific Ansible environment.

3. Make sure that the inventory file contains at least the hostnames of all the reachable cluster nodes that you want to test:

For example:

# cat tests/inventory/x86_64.yml


---
all:
  children:
    s4hana-3n:
      hosts:
        s4node01:
        s4node02:
        s4node03:

4. Ensure that the nodes are reachable via Ansible's ping module, for example:

# ansible all -m ping -i tests/inventory/x86_64.yml
s4node01 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
s4node02 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
s4node03 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

 

How to run the tests

1. While in the community.sap_ha_cluster_qa directory, run the playbook for test01 as follows to verify the name and version of the HA software:

# ansible-playbook -i tests/inventory/x86_64.yml ansible_collections/sap/cluster_qa/playbooks/test01.yml -v

PLAY [Playbook to run test01 test case on ASCS and ERS instances] ***************************************************************************

TASK [Collect necessary gather_facts] ***************************************************************************
ok: [s4node01]
ok: [s4node03]
ok: [s4node02]

……

TASK [sap.cluster_qa.test01 : Print test results completing test01 test case for current instance] ***************************************************************************
ok: [s4node02] => {
    "msg": {
        "changed": false,
        "failed": false,
        "ha_get_failoverconfig_info": {
            "HAActive": true,
            "HAActiveNode": "s4node02",
            "HADocumentation": "https://github.com/ClusterLabs/sap_cluster_connector",
            "HANodes": "",
            "HAProductVersion": "Pacemaker",
            "HASAPInterfaceVersion": "sap_cluster_connector"
        }
    }
}

……

TASK [sap.cluster_qa.test01 : Print test results completing test01 test case for current instance] ***************************************************************************
ok: [s4node03] => {
    "msg": {
        "changed": false,
        "failed": false,
        "ha_get_failoverconfig_info": {
            "HAActive": true,
            "HAActiveNode": "s4node03",
            "HADocumentation": "https://github.com/ClusterLabs/sap_cluster_connector",
            "HANodes": "",
            "HAProductVersion": "Pacemaker",
            "HASAPInterfaceVersion": "sap_cluster_connector"
        }
    }
}

2. Similarly, you can run the next test case by replacing the playbook name. In this case, run test02.yml to verify that the HA configuration shows no errors:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test02.yml -v

………
TASK [sap.cluster_qa.test02 : Print the results completing TEST02 test case for ERS node] ***************************************************************************
skipping: [s4node02] => {}
skipping: [s4node03] => {}
ok: [s4node01] => {
    "msg": {
        "changed": false,
        "failed": false,
        "ha_check_config_info": [
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "0 ABAP instances detected",
                "description": "Redundant ABAP instance configuration",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "All Enqueue server separated from application server",
                "description": "Enqueue separation",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "All MessageServer separated from application server",
                "description": "MessageServer separation",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-STATE",
                "comment": "SCS instance status ok",
                "description": "SCS instance running",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "SAPInstance includes is-ers patch",
                "description": "SAPInstance RA sufficient version (s4ascs_S4H_20)",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "Enqueue replication enabled",
                "description": "Enqueue replication (s4ascs_S4H_20)",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-STATE",
                "comment": "Enqueue replication active",
                "description": "Enqueue replication state (s4ascs_S4H_20)",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "SAPInstance includes is-ers patch",
                "description": "SAPInstance RA sufficient version (s4ers_S4H_29)",
                "state": "SAPControl-HA-SUCCESS"
            }
        ]
    }
}

PLAY RECAP ***************************************************************************
s4node01                   : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  
s4node02                   : ok=30   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0  
s4node03                   : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   

3. Similarly, you can run each test case by substituting the corresponding playbook, as shown below for test03. Test03 verifies that the shared library (HA-Interface) is loaded without any errors.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test03.yml -v
…….
TASK [sap.cluster_qa.test03 : Printing SAP HA trace logs for ERS instance] ***************************************************************************
ok: [s4node01] => {
    "msg": [
        "SAP HA Trace: Thu Jan 22 15:59:09 2026",
        "SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Thu Jan 22 15:59:09 2026",
        "SAP HA Trace: === SAP_HA_StartCluster ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa ...",
        "SAP HA Trace: SAP_HA_StartCluster: FOUND PENDING ACTION -> SAP_HA_START_IN_PROGRESS",
        "SAP HA Trace: Thu Jan 22 15:59:09 2026",
        "SAP HA Trace: --- SAP_HA_StartCluster Exit-Code: SAP_HA_START_IN_PROGRESS ---",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: === SAP_HA_FindSAPInstance ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector lsr ...",
        "SAP HA Trace: searchClusterFile: S4H:29 found",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: === SAP_HA_StopCluster ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa ...",
        "SAP HA Trace: SAP_HA_StopCluster: FOUND PENDING ACTION -> SAP_HA_STOP_IN_PROGRESS",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: --- SAP_HA_StopCluster Exit-Code: SAP_HA_STOP_IN_PROGRESS ---",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: === SAP_HA_FindSAPInstance ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector lsr ...",
        "SAP HA Trace: searchClusterFile: S4H:29 found",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: === SAP_HA_StartCluster ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa ...",
        "SAP HA Trace: SAP_HA_StartCluster: FOUND PENDING ACTION -> SAP_HA_START_IN_PROGRESS",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: --- SAP_HA_StartCluster Exit-Code: SAP_HA_START_IN_PROGRESS ---",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: === SAP_HA_CheckConfig ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector hcc ...",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: --- SAP_HA_CheckConfig Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: === SAP_HA_FreeConfigCheck ===",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: --- SAP_HA_FreeConfigCheck Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: === SAP_HA_CheckConfig ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector hcc ...",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: --- SAP_HA_CheckConfig Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: === SAP_HA_FreeConfigCheck ===",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: --- SAP_HA_FreeConfigCheck Exit-Code: SAP_HA_OK ---"
    ]
}
skipping: [s4node02] => {}
skipping: [s4node03] => {}

PLAY RECAP ***************************************************************************
s4node01                   : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  
s4node02                   : ok=30   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0  
s4node03                   : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  
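Conceptually, the verification in test03 boils down to scanning the SAP HA trace for unexpected exit codes. A simplified sketch of that idea (the sample trace lines and the set of accepted codes are assumptions, not the suite's real implementation):

```shell
# Sketch of the trace-check idea: flag any "Exit-Code:" line that is
# neither SAP_HA_OK nor an *_IN_PROGRESS code. Two sample lines stand
# in for the real sapstartsrv HA trace output.
trace='SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---
SAP HA Trace: --- SAP_HA_StartCluster Exit-Code: SAP_HA_START_IN_PROGRESS ---'
bad=$(printf '%s\n' "$trace" | grep 'Exit-Code:' \
  | grep -v -e 'SAP_HA_OK' -e '_IN_PROGRESS' | wc -l)
echo "unexpected exit codes: $bad"
```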

4. Test04 verifies that a manual ASCS move works correctly:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test04.yml -v
……

TASK [sap.cluster_qa.test04 : Asserting the locks of ASCS and ERS after move completing the TEST04 Test Case] ***************************************************************************

ok: [s4node01] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node02] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node03] => {
    "changed": false,
    "msg": "All assertions passed"
}

PLAY RECAP ***************************************************************************
s4node01                   : ok=69   changed=1    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0
s4node02                   : ok=66   changed=0    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0
s4node03                   : ok=70   changed=2    unreachable=0    failed=0    skipped=6    rescued=0    ignored=0

Lock table data is compared and asserted three times in this test run: ASCS and ERS before the move, ASCS before and after the move, and ASCS and ERS after the move.
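The idea behind the lock-data assertion can be pictured as a plain comparison of two lock table snapshots. In the sketch below the capture step (done via enqueue tooling on a real system) is omitted and the sample entries are made up:

```shell
# Sketch of the lock-table assertion: the snapshots taken before and
# after the ASCS move must be identical. Sample entries stand in for
# real enqueue lock data.
locks_before='DIAG E USR04 KEY1
DIAG E USR04 KEY2'
locks_after='DIAG E USR04 KEY1
DIAG E USR04 KEY2'
if [ "$locks_before" = "$locks_after" ]; then
  lock_check=pass
else
  lock_check=fail
fi
echo "lock table comparison: $lock_check"
```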

5. Test05 verifies that an irrecoverable outage of the Enqueue Server (ES) 2 is handled correctly:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test05.yml -v
……

TASK [sap.cluster_qa.test05 : Verifying ASCS not on the same node] ***************************************************************************

ok: [s4node01] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node02] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node03] => {
    "changed": false,
    "msg": "All assertions passed"
}

PLAY RECAP ***************************************************************************
s4node01                   : ok=38   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
s4node02                   : ok=38   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
s4node03                   : ok=36   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

 
6. Test06 verifies that an outage of the Enqueue Replicator (ER) 2 is handled correctly:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test06.yml -v
……
TASK [sap.cluster_qa.test06 : Verifying ERS not on the same node as ASCS] ***************************************************************************
ok: [s4node01] => {
    "changed": false,
    "msg": "ERS successfully moved to s4node01, different from ASCS node s4node02"
}
ok: [s4node02] => {
    "changed": false,
    "msg": "ERS successfully moved to s4node01, different from ASCS node s4node02"
}
ok: [s4node03] => {
    "changed": false,
    "msg": "ERS successfully moved to s4node01, different from ASCS node s4node02"
}

PLAY RECAP ***************************************************************************
s4node01                   : ok=53   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
s4node02                   : ok=49   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  
s4node03                   : ok=49   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

7. Test07 (skipped in the current version)

8. Test08 verifies that the ASCS moves correctly in case of a hardware or OS failure. Please note that in this test the primary ASCS node is crashed using the "echo c > /proc/sysrq-trigger" command.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test08.yml -v
……

TASK [sap.cluster_qa.test08 : Verifying ASCS not on the same node] ***************************************************************************

ok: [s4node01] => {
    "changed": false,
    "msg": "ASCS successfully moved from s4node03 to s4node02"
}

PLAY RECAP ***************************************************************************
s4node01                   : ok=40   changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
s4node02                   : ok=33   changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
s4node03                   : ok=18   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
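For orientation, picking the crash target in a test like test08 (the node currently running the ASCS instance) can be sketched by parsing pcs resource status output. The sample line below is taken from the example cluster shown earlier; the parsing approach is ours, not necessarily the suite's:

```shell
# Sketch: extract the node that currently runs the ASCS resource from a
# `pcs resource status` line; the last whitespace-separated field is
# the node name.
pcs_line='    * s4h_ascs20    (ocf:heartbeat:SAPInstance):     Started s4node03'
ascs_node=$(printf '%s\n' "$pcs_line" | awk '{print $NF}')
echo "ASCS currently on: $ascs_node"
```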

9. Test09: A recoverable outage of the Message Server is handled correctly (if the SAP profile parameter "Restart_Program" is used for the Message Server). Note that this test checks for the "Restart_Program" parameter in the instance profile. If it is not found, it is inserted, and the instances are restarted before the actual test is performed.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test09.yml -v
……
TASK [sap.cluster_qa.test09 : Display test summary] ***************************************************************************
ok: [s4node01] => {
    "msg": [
        "===============================================",
        "              TEST09 SUMMARY",
        "===============================================",
        "Message Server killed: 6 times",
        "Initial ASCS location: s4node02",
        "Final ASCS location: s4node02",
        "Initial ERS location: s4node01",
        "Final ERS location: s4node01",
        "HA Action taken: ASCS Restart on same node",
        "ASCS/ERS separation maintained: YES",
        "==============================================="
    ]
}

PLAY RECAP *********************************************************************
s4node01                   : ok=105  changed=1    unreachable=0    failed=0    skipped=77   rescued=0    ignored=0  
s4node02                   : ok=123  changed=6    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0  
s4node03                   : ok=81   changed=0    unreachable=0    failed=0    skipped=64   rescued=0    ignored=0

Please note that this test may take approximately 7 to 10 minutes, since the message server is killed repeatedly, up to 6 times or until the HA software intervenes to perform a failover.
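The repeated-kill behaviour described above can be pictured with a small loop sketch. Here kill_ms and failover_detected are hypothetical stubs standing in for the real process kill and cluster status checks; the loop bound of 6 matches the description in the text:

```shell
# Sketch of the test09 loop: kill the message server up to 6 times,
# stopping early if the HA software has already intervened. Both
# helper functions are stubs, not the suite's real implementation.
kill_ms() { :; }                          # stand-in for killing the ms process
failover_detected() { [ "$1" -ge 6 ]; }   # stand-in for a cluster status check
kills=0
while [ "$kills" -lt 6 ]; do
  kill_ms
  kills=$((kills + 1))
  if failover_detected "$kills"; then
    break
  fi
done
echo "Message Server killed: $kills times"
```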

10. Test10: An irrecoverable outage of the Message Server is handled correctly (if the SAP profile parameter "Start_Program" is used for the Message Server). Note that this test checks for the "Start_Program" parameter in the instance profile. If it is not found, it is inserted, and the instances are restarted before the actual test is performed.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test10.yml -v
……

TASK [sap.cluster_qa.test10 : Display test summary] ***************************************************************************

ok: [s4hana17] => {
    "msg": [
        "===============================================",
        "              TEST10 SUMMARY",
        "===============================================",
        "Test: Irrecoverable Message Server outage",
        "Profile Parameter: Start_Program (not Restart_Program)",
        "Message Server killed: YES",
        "Initial ASCS location: s4hana19",
        "Final ASCS location: s4hana18",
        "Initial ERS location: s4hana17",
        "Final ERS location: s4hana17",
        "HA Action taken: ASCS Failover",
        "ASCS/ERS separation maintained: YES",
        "==============================================="
    ]
}

PLAY RECAP ***************************************************************************
s4hana17                   : ok=136  changed=4    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0
s4hana18                   : ok=119  changed=0    unreachable=0    failed=0    skipped=6    rescued=0    ignored=0
s4hana19                   : ok=124  changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Refer to the README.md file of each test role for more details about the tests; it can be found at ansible_collections/sap/cluster_qa/roles/<test-name>/README.md.

Author: Amir Memon

 

​ IntroductionTesting of High Availability Clusters used for managing SAP S/4HANA application servers is critical for ensuring their reliability and resilience. Many issues can be avoided in the first place during the pre-go-live stage by carrying out certain tests.While manual verification of an HA cluster setup is possible, these tests can also be automated using Ansible. Developing custom roles and playbooks allows for a modular, reusable, and scalable design to automate each test case. This approach is particularly effective for tests like simulating a node failure to verify failover and resource migration.Manual testing becomes tedious, time-consuming, and error-prone, especially in HA cluster environments where multiple nodes must be constantly observed during every test.What is the Automated Test suite?This Automated Test suite boosts quality assurance efficiency, test coverage, and accelerates bug reproduction/verification and post-maintenance, leading to faster production-ready RHEL HA cluster environments for ENSA 2.The Automated Test suite requires a dedicated, out-of-cluster RHEL Ansible Control node to execute and monitor tests—including failovers and node crashes—on the SAP HA cluster. Using Ansible makes the tests transparent and easy to understand, and its built-in plugins allow saving logs for future review and audits.Which tests are performed?This automation test suite also covers the tests for the SAP HA Interface for SAP application servers. Each test carries out specific tasks to verify the cluster configuration and current state, conducts the actual test, verifies the output, and compares it with the expected results. The following table outlines what the test suite actually verifies on the systems being tested:Test NameTest TitleMinimum no. 
of HA NodesTest01Name and version of HA software2Test02HA configuration showing no errors2Test03Shared Library (HA-Interface) loads without any errors2Test04Manual move of ASCS works correctly with lock data2Test05Irrecoverable outage of the Enqueue Server (ES) 2 is handled correctly2Test06Outage of the Enqueue Replicator (ER) 2 is handled correctly>2Test07ASCS moves correctly under load without specifying the destination node (Skipped in the current version)>2Test08ASCS moves correctly in case of hardware or OS failure (Note: Node crash execution)>2Test09Recoverable outage of the Message Server is handled correctly (if the SAP Profile Parameter “Restart_Program” is used for the Message Server) (Note: Backup and Auto-config)>2Test10Irrecoverable outage of the Message Server is handled correctly (if the SAP Profile Parameter “Start_Program” is used for the Message Server) (Note: Backup and Auto-config)>2Please note that test07 is skipped in the current version and may be implemented and updated in future releases. Additional test cases are planned for inclusion in future releases to enhance coverage.Prerequisites1. A minimum of 2 node cluster configuration with ASCS and ERS resources to start with the initial tests, but 3 nodes are recommended. Refer to the following document for the guidelines to configure the cluster that can be tested with the Ansible playbooks described in this blog: Configuring HA clusters to manage SAP NetWeaver or SAP S/4HANA Application server instances using the RHEL HA Add-On | Red Hat Enterprise Linux for SAP Solutions (Only ENSA2). 
Your pacemaker cluster should look like the following:[root@s4node01: ~]# pcs cluster statusCluster Status:…..Node List:  * Online: [ s4node01 s4node02 s4node03 ]PCSD Status:  s4node01: Online  s4node03: Online  s4node02: Online[root@s4node01: ~]# pcs resource status……  * Resource Group: s4h_ASCS20_group:    * s4h_lvm_ascs20    (ocf:heartbeat:LVM-activate):     Started s4node03    * s4h_fs_ascs20    (ocf:heartbeat:Filesystem):     Started s4node03    * s4h_vip_ascs20    (ocf:heartbeat:IPaddr2):     Started s4node03    * s4h_ascs20    (ocf:heartbeat:SAPInstance):     Started s4node03  * Resource Group: s4h_ERS29_group:    * s4h_lvm_ers29    (ocf:heartbeat:LVM-activate):     Started s4node01    * s4h_fs_ers29    (ocf:heartbeat:Filesystem):     Started s4node01    * s4h_vip_ers29    (ocf:heartbeat:IPaddr2):     Started s4node01    * s4h_ers29    (ocf:heartbeat:SAPInstance):     Started s4node01This is also a precondition before running any test.2. Ensure that the SAP HA Interface for SAP ABAP application server instances is configured as mentioned here: How to enable the SAP HA Interface for SAP ABAP application server instances managed by the RHEL HA Add-On? – Red Hat Customer Portal.3. Ensure that the sap.sap_operations collection for host_info and pcs_status_info module is installed on the ansible control node.Getting started1. Clone the community.sap_ha_cluster_qa repository and enter into that directory# git clone https://github.com/sap-linuxlab/community.sap_ha_cluster_qa.git# cd community.sap_ha_cluster_qa.git2. Verify that the ansible.cfg, inventory and playbook files match your specific Ansible environment3. Make sure that the inventory file contains at least the hostnames of all the reachable cluster nodes that you want to test:For example:# cat tests/inventory/x86_64.yml—all:  children:    s4hana-3n:      hosts:        s4node01:        s4node02:        s4node03: 4. 
Ensure that the nodes are reachable via the Ansible ping module, for example:

# ansible all -m ping -i tests/inventory/x86_64.yml
s4node01 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
s4node02 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
s4node03 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

How to run the tests

1. While in the community.sap_ha_cluster_qa directory, run the playbook as follows for test01 to verify the name and version of the HA software:

# ansible-playbook -i tests/inventory/x86_64.yml ansible_collections/sap/cluster_qa/playbooks/test01.yml -v

PLAY [Playbook to run test01 test case on ASCS and ERS instances] ************************
TASK [Collect necessary gather_facts] ************************
ok: [s4node01]
ok: [s4node03]
ok: [s4node02]
……
TASK [sap.cluster_qa.test01 : Print test results completing test01 test case for current instance] ************************
ok: [s4node02] => {
    "msg": {
        "changed": false,
        "failed": false,
        "ha_get_failoverconfig_info": {
            "HAActive": true,
            "HAActiveNode": "s4node02",
            "HADocumentation": "https://github.com/ClusterLabs/sap_cluster_connector",
            "HANodes": "",
            "HAProductVersion": "Pacemaker",
            "HASAPInterfaceVersion": "sap_cluster_connector"
        }
    }
}
……
TASK [sap.cluster_qa.test01 : Print test results completing test01 test case for current instance] ************************
ok: [s4node03] => {
    "msg": {
        "changed": false,
        "failed": false,
        "ha_get_failoverconfig_info": {
            "HAActive": true,
            "HAActiveNode": "s4node03",
            "HADocumentation": "https://github.com/ClusterLabs/sap_cluster_connector",
            "HANodes": "",
            "HAProductVersion": "Pacemaker",
            "HASAPInterfaceVersion": "sap_cluster_connector"
        }
    }
}

2. Similarly, you can run the next test case by replacing the playbook, in this case test02.yml, to verify that the HA configuration shows no errors:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test02.yml -v
………
TASK [sap.cluster_qa.test02 : Print the results completing TEST02 test case for ERS node] ************************
skipping: [s4node02] => {}
skipping: [s4node03] => {}
ok: [s4node01] => {
    "msg": {
        "changed": false,
        "failed": false,
        "ha_check_config_info": [
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "0 ABAP instances detected",
                "description": "Redundant ABAP instance configuration",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "All Enqueue server separated from application server",
                "description": "Enqueue separation",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "All MessageServer separated from application server",
                "description": "MessageServer separation",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-STATE",
                "comment": "SCS instance status ok",
                "description": "SCS instance running",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "SAPInstance includes is-ers patch",
                "description": "SAPInstance RA sufficient version (s4ascs_S4H_20)",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "Enqueue replication enabled",
                "description": "Enqueue replication (s4ascs_S4H_20)",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-STATE",
                "comment": "Enqueue replication active",
                "description": "Enqueue replication state (s4ascs_S4H_20)",
                "state": "SAPControl-HA-SUCCESS"
            },
            {
                "category": "SAPControl-SAP-CONFIGURATION",
                "comment": "SAPInstance includes is-ers patch",
                "description": "SAPInstance RA sufficient version (s4ers_S4H_29)",
                "state": "SAPControl-HA-SUCCESS"
            }
        ]
    }
}
PLAY RECAP ************************
s4node01  : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
s4node02  : ok=30   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0
s4node03  : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

3. Similarly, you can run each remaining test case by substituting the corresponding playbook, as shown below for test03. Test03 verifies that the shared library (HA-Interface) is loaded without any errors.
# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test03.yml -v
…….
TASK [sap.cluster_qa.test03 : Printing SAP HA trace logs for ERS instance] ************************
ok: [s4node01] => {
    "msg": [
        "SAP HA Trace: Thu Jan 22 15:59:09 2026",
        "SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Thu Jan 22 15:59:09 2026",
        "SAP HA Trace: === SAP_HA_StartCluster ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa …",
        "SAP HA Trace: SAP_HA_StartCluster: FOUND PENDING ACTION -> SAP_HA_START_IN_PROGRESS",
        "SAP HA Trace: Thu Jan 22 15:59:09 2026",
        "SAP HA Trace: --- SAP_HA_StartCluster Exit-Code: SAP_HA_START_IN_PROGRESS ---",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: === SAP_HA_FindSAPInstance ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector lsr …",
        "SAP HA Trace: searchClusterFile: S4H:29 found",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: === SAP_HA_StopCluster ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa …",
        "SAP HA Trace: SAP_HA_StopCluster: FOUND PENDING ACTION -> SAP_HA_STOP_IN_PROGRESS",
        "SAP HA Trace: Mon Jan 26 13:35:13 2026",
        "SAP HA Trace: --- SAP_HA_StopCluster Exit-Code: SAP_HA_STOP_IN_PROGRESS ---",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: === SAP_HA_FindSAPInstance ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector lsr …",
        "SAP HA Trace: searchClusterFile: S4H:29 found",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: === SAP_HA_StartCluster ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa …",
        "SAP HA Trace: SAP_HA_StartCluster: FOUND PENDING ACTION -> SAP_HA_START_IN_PROGRESS",
        "SAP HA Trace: Mon Jan 26 13:35:17 2026",
        "SAP HA Trace: --- SAP_HA_StartCluster Exit-Code: SAP_HA_START_IN_PROGRESS ---",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: === SAP_HA_CheckConfig ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector hcc …",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: --- SAP_HA_CheckConfig Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: === SAP_HA_FreeConfigCheck ===",
        "SAP HA Trace: Mon Jan 26 13:36:20 2026",
        "SAP HA Trace: --- SAP_HA_FreeConfigCheck Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: === SAP_HA_CheckConfig ===",
        "SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector hcc …",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: --- SAP_HA_CheckConfig Exit-Code: SAP_HA_OK ---",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: === SAP_HA_FreeConfigCheck ===",
        "SAP HA Trace: Mon Jan 26 13:36:21 2026",
        "SAP HA Trace: --- SAP_HA_FreeConfigCheck Exit-Code: SAP_HA_OK ---"
    ]
}
skipping: [s4node02] => {}
skipping: [s4node03] => {}
PLAY RECAP ************************
s4node01  : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
s4node02  : ok=30   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0
s4node03  : ok=32   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

4. Test04 verifies a manual ASCS move:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test04.yml -v
……
TASK [sap.cluster_qa.test04 : Asserting the locks of ASCS and ERS after move completing the TEST04 Test Case] ************************
ok: [s4node01] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node02] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node03] => {
    "changed": false,
    "msg": "All assertions passed"
}
PLAY RECAP ************************
s4node01  : ok=69   changed=1    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0
s4node02  : ok=66   changed=0    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0
s4node03  : ok=70   changed=2    unreachable=0    failed=0    skipped=6    rescued=0    ignored=0

Lock table data is compared and asserted three times in this test run: ASCS and ERS before the move, ASCS before and after the move, and ASCS and ERS after the move.

5.
Test05 verifies that an irrecoverable outage of the Enqueue Server 2 (ES) is handled correctly:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test05.yml -v
……
TASK [sap.cluster_qa.test05 : Verifying ASCS not on the same node] ************************
ok: [s4node01] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node02] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [s4node03] => {
    "changed": false,
    "msg": "All assertions passed"
}
PLAY RECAP ************************
s4node01  : ok=38   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
s4node02  : ok=38   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
s4node03  : ok=36   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

6. Test06 verifies that an outage of the Enqueue Replicator 2 is handled correctly:

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test06.yml -v
……
TASK [sap.cluster_qa.test06 : Verifying ERS not on the same node as ASCS] ************************
ok: [s4node01] => {
    "changed": false,
    "msg": "ERS successfully moved to s4node01, different from ASCS node s4node02"
}
ok: [s4node02] => {
    "changed": false,
    "msg": "ERS successfully moved to s4node01, different from ASCS node s4node02"
}
ok: [s4node03] => {
    "changed": false,
    "msg": "ERS successfully moved to s4node01, different from ASCS node s4node02"
}
PLAY RECAP ************************
s4node01  : ok=53   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
s4node02  : ok=49   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
s4node03  : ok=49   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

7. Test07 is skipped in the current version.

8. Test08 verifies that the ASCS moves correctly in case of a hardware or OS failure.
Please note that in this test the primary ASCS node will be crashed using the "echo c > /proc/sysrq-trigger" command.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test08.yml -v
……
TASK [sap.cluster_qa.test08 : Verifying ASCS not on the same node] ************************
ok: [s4node01] => {
    "changed": false,
    "msg": "ASCS successfully moved from s4node03 to s4node02"
}
PLAY RECAP ************************
s4node01  : ok=40   changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
s4node02  : ok=33   changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
s4node03  : ok=18   changed=1    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

9. Test09 verifies that a recoverable outage of the Message Server is handled correctly (if the SAP profile parameter "Restart_Program" is used for the Message Server). Note that this test checks for the "Restart_Program" parameter in the instance profile; if it is not found, it is inserted and the instances are restarted before the actual test is performed.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test09.yml -v
……
TASK [sap.cluster_qa.test09 : Display test summary] ************************
ok: [s4node01] => {
    "msg": [
        "===============================================",
        "              TEST09 SUMMARY",
        "===============================================",
        "Message Server killed: 6 times",
        "Initial ASCS location: s4node02",
        "Final ASCS location: s4node02",
        "Initial ERS location: s4node01",
        "Final ERS location: s4node01",
        "HA Action taken: ASCS Restart on same node",
        "ASCS/ERS separation maintained: YES",
        "==============================================="
    ]
}
PLAY RECAP ************************
s4node01  : ok=105  changed=1    unreachable=0    failed=0    skipped=77   rescued=0    ignored=0
s4node02  : ok=123  changed=6    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0
s4node03  : ok=81   changed=0    unreachable=0    failed=0    skipped=64   rescued=0    ignored=0

Please note this test may take approximately 7 to 10 minutes, since the Message Server is killed repeatedly, up to 6 times or until the HA software intervenes to perform a failover.

10. Test10 verifies that an irrecoverable outage of the Message Server is handled correctly (if the SAP profile parameter "Start_Program" is used for the Message Server instead of "Restart_Program"). Note that this test checks for the "Start_Program" parameter in the instance profile; if it is not found, it is inserted and the instances are restarted before the actual test is performed.

# ansible-playbook -i tests/inventory/x86_64.yml ./ansible_collections/sap/cluster_qa/playbooks/test10.yml -v
……
TASK [sap.cluster_qa.test10 : Display test summary] ************************
ok: [s4hana17] => {
    "msg": [
        "===============================================",
        "              TEST10 SUMMARY",
        "===============================================",
        "Test: Irrecoverable Message Server outage",
        "Profile Parameter: Start_Program (not Restart_Program)",
        "Message Server killed: YES",
        "Initial ASCS location: s4hana19",
        "Final ASCS location: s4hana18",
        "Initial ERS location: s4hana17",
        "Final ERS location: s4hana17",
        "HA Action taken: ASCS Failover",
        "ASCS/ERS separation maintained: YES",
        "==============================================="
    ]
}
PLAY RECAP ************************
s4hana17  : ok=136  changed=4    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0
s4hana18  : ok=119  changed=0    unreachable=0    failed=0    skipped=6    rescued=0    ignored=0
s4hana19  : ok=124  changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Refer to the README.md file of each test role for more details about the tests. The README.md for each test role can be found at the following location in the same directory: ansible_collections/sap/cluster_qa/roles/<test-name>/README.md

Authors: Amir Memon
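For readers who want to spot-check what test01 and test02 assert on without running the playbooks, the same information is exposed by SAP's sapcontrol HA interface functions, HAGetFailoverConfig and HACheckConfig. The sketch below is illustrative only: it assumes it is run as the SAP admin user on the ASCS node, and the instance number 20 is taken from the s4ascs_S4H_20 instance shown in the output above; adjust it for your landscape. The guard lets the sketch degrade gracefully on hosts where sapcontrol is not installed.

```shell
#!/bin/sh
# Manual spot-check of the data that test01/test02 assert on, via sapcontrol.
# Assumptions: run as the SAP admin user on the ASCS node; instance number 20
# corresponds to the s4ascs_S4H_20 instance shown in the sample output.
if command -v sapcontrol >/dev/null 2>&1; then
    sapcontrol -nr 20 -function HAGetFailoverConfig   # data checked by test01
    sapcontrol -nr 20 -function HACheckConfig         # data checked by test02
else
    echo "sapcontrol not found; commands shown for illustration only"
fi
```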
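To run the remaining test cases unattended and keep an audit trail, a small wrapper script can drive the same ansible-playbook commands in sequence. ANSIBLE_LOG_PATH is a standard Ansible environment variable that appends all playbook output to a log file, which matches the logging-for-audits use case mentioned in the introduction. This is a hypothetical sketch, not part of the test suite: it only prints the commands it would run (remove the leading echo to execute them), assumes the inventory and collection paths shown above, and skips test07, which the current version does not include.

```shell
#!/bin/sh
# Illustrative wrapper (not part of the suite): drive all available test
# playbooks in order. Assumption: executed from the
# community.sap_ha_cluster_qa directory with the paths used above.

# Append all Ansible output to a timestamped log file for later review/audits.
export ANSIBLE_LOG_PATH="cluster_qa_$(date +%Y%m%d_%H%M%S).log"

# test07 is skipped in the current version of the suite.
for n in 01 02 03 04 05 06 08 09 10; do
    # Remove the leading "echo" to actually execute each playbook.
    echo ansible-playbook -i tests/inventory/x86_64.yml \
        "./ansible_collections/sap/cluster_qa/playbooks/test${n}.yml" -v
done
```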
