PCS

> > As we have clusvcadm -U <resource_group> and clusvcadm -Z <resource_group>

> > to freeze and unfreeze a resource in CMAN, I would really appreciate it if

> > someone could give some pointers for freezing/unfreezing a resource in

> > Pacemaker (pcs) as well.

> >

> > Thanks,

> > Jaspal Singla

>

> Hi,

>

> The equivalent in pacemaker is "managed" and "unmanaged" resources.

>

> The usage depends on what tools you are using. For pcs, it's "pcs

> resource unmanage <resource_name>" to freeze, and "manage" to unfreeze.

> At a lower level, it's setting the is-managed meta-attribute of the

> resource.

>

> It's also possible to set the maintenance-mode cluster property to

> "freeze" all resources.
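Putting the reply's suggestions together, here is a minimal sketch. The commands are standard pcs; the resource name WebSite is illustrative.

```shell
# Freeze a single resource (the cluster stops managing it)
pcs resource unmanage WebSite

# Unfreeze it again
pcs resource manage WebSite

# Lower-level equivalent: toggle the is-managed meta-attribute directly
pcs resource meta WebSite is-managed=false
pcs resource meta WebSite is-managed=true

# Or "freeze" every resource at once via the cluster property
pcs property set maintenance-mode=true
pcs property set maintenance-mode=false
```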

6.8. Move Resources Manually

There are always times when an administrator needs to override the cluster and force resources to move to a specific location. In this example, we will force the WebSite to move to pcmk-1.

We will use the pcs resource move command to create a temporary constraint with a score of INFINITY. While we could update our existing constraint, using move allows us to easily get rid of the temporary constraint later. If desired, we could even give the constraint a lifetime, so it would expire automatically — but we don't do that in this example.
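As an aside, a move with an automatically expiring constraint would look like the following sketch; the PT1H ISO 8601 duration (one hour) is illustrative.

```shell
# Move WebSite to pcmk-1 with a constraint that expires after one hour
pcs resource move WebSite pcmk-1 lifetime=PT1H
```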

[root@pcmk-1 ~]# pcs resource move WebSite pcmk-1

[root@pcmk-1 ~]# pcs constraint

Location Constraints:

Resource: WebSite

Enabled on: pcmk-1 (score:50)

Enabled on: pcmk-1 (score:INFINITY) (role: Started)

Ordering Constraints:

start ClusterIP then start WebSite (kind:Mandatory)

Colocation Constraints:

WebSite with ClusterIP (score:INFINITY)

Ticket Constraints:

[root@pcmk-1 ~]# pcs status

Cluster name: mycluster

Stack: corosync

Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

Last updated: Mon Sep 10 17:28:55 2018

Last change: Mon Sep 10 17:28:27 2018 by root via crm_resource on pcmk-1

2 nodes configured

2 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1

WebSite (ocf::heartbeat:apache): Started pcmk-1

Daemon Status:

corosync: active/disabled

pacemaker: active/disabled

pcsd: active/enabled

Once we’ve finished whatever activity required us to move the resources to pcmk-1 (in our case nothing), we can then allow the cluster to resume normal operation by removing the new constraint. Due to our first location constraint and our default stickiness, the resources will remain on pcmk-1.

We will use the pcs resource clear command, which removes all temporary constraints previously created by pcs resource move or pcs resource ban.

[root@pcmk-1 ~]# pcs resource clear WebSite

[root@pcmk-1 ~]# pcs constraint

Location Constraints:

Resource: WebSite

Enabled on: pcmk-1 (score:50)

Ordering Constraints:

start ClusterIP then start WebSite (kind:Mandatory)

Colocation Constraints:

WebSite with ClusterIP (score:INFINITY)

Ticket Constraints:

Note that the INFINITY location constraint is now gone. If we check the cluster status, we can also see that (as expected) the resources are still active on pcmk-1.

[root@pcmk-1 ~]# pcs status

Cluster name: mycluster

Stack: corosync

Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum

Last updated: Mon Sep 10 17:31:47 2018

Last change: Mon Sep 10 17:31:04 2018 by root via crm_resource on pcmk-1

2 nodes configured

2 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1

WebSite (ocf::heartbeat:apache): Started pcmk-1

Daemon Status:

corosync: active/disabled

pacemaker: active/disabled

pcsd: active/enabled

To remove the constraint with the score of 50, we would first get the constraint’s ID using pcs constraint --full, then remove it with pcs constraint remove and the ID. We won’t show those steps here, but feel free to try it on your own, with the help of the pcs man page if necessary.

=================================

Remove Constraints

1 – List the currently configured constraints using the pcs constraint list --full command.

pcs constraint list --full

Example Output

Location Constraints:

Resource: test_sg

Disabled on: server01-cpn (score:-INFINITY) (role: Started) (id:cli-ban-test_sg-on-server01-cpn)

Ordering Constraints:

Colocation Constraints:

Ticket Constraints:

2 – Note the id string, then remove the constraint using the pcs constraint remove id command.

pcs constraint remove cli-ban-test_sg-on-server01-cpn

3 – Verify the constraints have been removed using the pcs constraint list --full command.

pcs constraint list --full

Example Output

Location Constraints:

Ordering Constraints:

Colocation Constraints:

Ticket Constraints:

==========================================

Pacemaker and pcs on Linux Example: Fencing

STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it protects your data from being corrupted by rogue nodes or concurrent access.

For example, if a node's network interface is down but it still has the filesystem mounted, you can't simply start mounting the filesystem on other nodes. Using STONITH, you can make sure the node is truly offline and safely let other nodes access the data.

STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.

In the following examples, I'll create three IBM RSA STONITH agents, one each for nodeA, nodeB, and nodeC, so that each node has a fencing device the other nodes can use to bring it down when needed.

Available STONITH (Fencing) Agents

# pcs stonith list

fence_apc - Fence agent for APC over telnet/ssh

fence_apc_snmp - Fence agent for APC over SNMP

fence_bladecenter - Fence agent for IBM BladeCenter

...

fence_rsa - Fence agent for IBM RSA

...

You can also add a filter to the end of the command, for example:

# pcs stonith list rsa

fence_rsa - Fence agent for IBM RSA

In the following examples, all fencing devices will use fence_rsa.

Setup properties for STONITH

# pcs property set no-quorum-policy=ignore

# pcs property set stonith-enabled=true

# pcs property set stonith-action=poweroff # default is reboot

Note: Setting the stonith action to off is not always a good option. In this example case, the resource is a filesystem, and the filesystem device has redundant access paths. If a resource-access fault caused the node to be fenced, it is better to leave the node off for further investigation, instead of rebooting it in an attempt to fix the problem.

Creating a Fencing Device

# pcs stonith create stonith-rsa-nodeA fence_rsa action=off ipaddr="nodeA_rsa" login=<user> passwd=<pass> pcmk_host_list=nodeA secure=true

# pcs stonith show

stonith-rsa-nodeA (stonith:fence_rsa): Stopped

Displaying Fencing Devices

We repeat the same steps for nodeB and nodeC, giving us three fence devices. The stonith resources will start on their own.

# pcs stonith show

stonith-rsa-nodeA (stonith:fence_rsa): Started

stonith-rsa-nodeB (stonith:fence_rsa): Started

stonith-rsa-nodeC (stonith:fence_rsa): Started
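The three create commands can also be scripted; a minimal sketch, with placeholder credentials (substitute the real RSA login details):

```shell
# Placeholder credentials; replace with the real RSA account
RSA_USER=admin
RSA_PASS=secret

# Create one fence_rsa device per node
for node in nodeA nodeB nodeC; do
  pcs stonith create "stonith-rsa-${node}" fence_rsa \
    action=off ipaddr="${node}_rsa" login="${RSA_USER}" passwd="${RSA_PASS}" \
    pcmk_host_list="${node}" secure=true
done
```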

Managing Nodes with Fence Devices

# pcs stonith fence nodeC

Node: nodeC fenced

# pcs stonith confirm nodeC

Node: nodeC confirmed fenced

By default, the fence action brings the node off and then back on. If you want to take the node offline only, use the --off option.

Note: the confirm command can also be used directly to fence a node off.
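For example, assuming the nodeC device from above:

```shell
# Power nodeC off, skipping the default off-then-on cycle
pcs stonith fence nodeC --off
```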

Modifying Fencing Devices

You may have noticed that many options were used during fence device creation. Actually, all of them can be modified and updated:

pcs stonith update stonith_id [stonith_device_options]
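For instance, updating the device created earlier might look like this; the new password and delay values are illustrative.

```shell
# Change the password and add a 15-second fencing delay in place,
# with no need to delete and recreate the device
pcs stonith update stonith-rsa-nodeA passwd=newsecret delay=15
```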

Displaying Device-Specific Fencing Options

If you want to know a fence agent's options, you can find them in the following ways.

You can list an agent's options from its command-line mode:

# /usr/sbin/fence_rsa -h

For more detail about checking and debugging a fence device from the command line, see Fence agent for IBM RSA.

Or you can list the fence agent's options with pcs:

# pcs stonith describe fence_rsa

Stonith options for: fence_rsa

action (required): Fencing Action

ipaddr (required): IP Address or Hostname

login (required): Login Name

passwd: Login password or passphrase

passwd_script: Script to retrieve password

cmd_prompt: Force command prompt

secure: SSH connection

identity_file: Identity file for ssh

ipport: TCP port to use for connection with device

ssh_options: SSH options to use

verbose: Verbose mode

debug: Write debug information to given file

version: Display version information and exit

help: Display help and exit

power_timeout: Test X seconds for status change after ON/OFF

shell_timeout: Wait X seconds for cmd prompt after issuing command

login_timeout: Wait X seconds for cmd prompt after login

power_wait: Wait X seconds after issuing ON/OFF

delay: Wait X seconds before fencing is started

retry_on: Count of attempts to retry power on

stonith-timeout: How long to wait for the STONITH action to complete per a stonith device.

priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.

pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names.

pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).

pcmk_host_check: How to determine which machines are controlled by the device.

Deleting Fencing Devices

pcs stonith delete stonith_id

Configuring Fencing Levels

pcs stonith level add level node devices
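As a concrete sketch, assuming a second, hypothetical PDU device stonith-apc-nodeA exists alongside the RSA device:

```shell
# Try the RSA device first; fall back to the PDU if level 1 fails
pcs stonith level add 1 nodeA stonith-rsa-nodeA
pcs stonith level add 2 nodeA stonith-apc-nodeA

# Review the configured levels
pcs stonith level
```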

Additional Fencing Configuration Options

Each option is listed below as field (type, default): description.

pcmk_host_argument (string, default: port): An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific parameter that indicates the machine to be fenced. A value of none tells the cluster not to supply any additional parameters.

pcmk_reboot_action (string, default: reboot): An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the reboot action.

pcmk_reboot_timeout (time, default: 60s): An alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for reboot actions.

pcmk_reboot_retries (integer, default: 2): The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections; operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up.

pcmk_off_action (string, default: off): An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the off action.

pcmk_off_timeout (time, default: 60s): An alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for off actions.

pcmk_off_retries (integer, default: 2): The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections; operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up.

pcmk_list_action (string, default: list): An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the list action.

pcmk_list_timeout (time, default: 60s): An alternate timeout to use for list actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for list actions.

pcmk_list_retries (integer, default: 2): The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections; operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up.

pcmk_monitor_action (string, default: monitor): An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the monitor action.

pcmk_monitor_timeout (time, default: 60s): An alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for monitor actions.

pcmk_monitor_retries (integer, default: 2): The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections; operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up.

pcmk_status_action (string, default: status): An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the status action.

pcmk_status_timeout (time, default: 60s): An alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for status actions.

pcmk_status_retries (integer, default: 2): The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections; operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up.
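These options are set like any other device option, for example via pcs stonith update; the timeout and retry values here are illustrative.

```shell
# Give a slow device more time and more attempts for reboot actions
pcs stonith update stonith-rsa-nodeA pcmk_reboot_timeout=120s pcmk_reboot_retries=3
```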

=============================

PCS is the name of the command-line interface for Red Hat Cluster. It allows you to manage Pacemaker (the open-source cluster resource manager) and Corosync (the cluster membership and messaging layer). Please visit ClusterLabs.org for more information.

pcs stonith

One of the more interesting things (although very standard for cluster setups) in setting up a cluster is STONITH. It's a fencing mechanism, with an impressive number of agents supplied by default.

Fencing takes care of monitoring your cluster closely, with a view to fencing (blocking) a faulty node as soon as the node is confirmed to be malfunctioning.

Specifically, if there's a timeout or an error with starting a service, your pcs stonith setup may dictate that this condition is bad enough for the server (or VM) to be rebooted or completely shut down.

STONITH meaning

STONITH is actually an acronym, and it means Shoot The Other Node In The Head. Quite a violent name for a high availability feature, don't you think?

================================

Pacemaker Enable Maintenance Mode or Freeze Cluster

Enable Maintenance Mode

1 – Run the pcs property set maintenance-mode=true command to place the cluster into maintenance mode.

pcs property set maintenance-mode=true

2 – Next run the pcs property command to verify that it displays maintenance-mode: true, which means the cluster is in maintenance mode.

pcs property

Example Output

Cluster Properties:

cluster-infrastructure: cman

dc-version: 1.1.15-5.el6-e174ec8

have-watchdog: false

last-lrm-refresh: 1527095308

maintenance-mode: true

no-quorum-policy: freeze

3 – Next run the pcs status --full command and you will see an alert at the top of the status output showing the cluster is in maintenance mode.

pcs status --full

Example Output

Cluster name: TEST_CLUSTER

Stack: cman

Current DC: server01-cpn (version 1.1.15-5.el6-e174ec8) - partition with quorum

Last updated: Fri Jun 1 09:25:24 2018

Last change: Fri Jun 1 09:20:51 2018 by root via cibadmin on server01-cpn

*** Resource management is DISABLED ***

The cluster will not attempt to start, stop or recover services

2 nodes and 44 resources configured

Disable Maintenance Mode

1 – Run the pcs property set maintenance-mode=false command to take the cluster out of maintenance mode.

pcs property set maintenance-mode=false

2 – Next run the pcs property command to verify that it no longer displays maintenance-mode: true, which means the cluster is out of maintenance mode.

pcs property

Example Output

Cluster Properties:

cluster-infrastructure: cman

dc-version: 1.1.15-5.el6-e174ec8

have-watchdog: false

last-lrm-refresh: 1527095308

no-quorum-policy: freeze

=======================
