PCS
> > We have clusvcadm -Z <resource_group> to freeze and clusvcadm -U
> > <resource_group> to unfreeze a resource in CMAN. I would really appreciate
> > it if someone could give some pointers for freezing/unfreezing a resource
> > in Pacemaker (pcs) as well.
> >
> > Thanks,
> > Jaspal Singla
>
> Hi,
>
> The equivalent in pacemaker is "managed" and "unmanaged" resources.
>
> The usage depends on what tools you are using. For pcs, it's "pcs
> resource unmanage <resource_name>" to freeze, and "manage" to unfreeze.
> At a lower level, it's setting the is-managed meta-attribute of the
> resource.
>
> It's also possible to set the maintenance-mode cluster property to
> "freeze" all resources.
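As a quick sketch of the answer above (assuming a resource named WebSite, the one used later in these notes):

```shell
# Freeze: tell the cluster to stop managing the resource
pcs resource unmanage WebSite

# Unfreeze: resume management
pcs resource manage WebSite

# Lower-level equivalent: set the is-managed meta-attribute directly
pcs resource meta WebSite is-managed=false
pcs resource meta WebSite is-managed=true

# Or "freeze" all resources at once via the maintenance-mode cluster property
pcs property set maintenance-mode=true
pcs property set maintenance-mode=false
```

Note that maintenance-mode is broader than unmanaging a single resource: it affects every resource in the cluster.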
6.8. Move Resources Manually
There are always times when an administrator needs to override the cluster and force resources to move to a specific location. In this example, we will force the WebSite to move to pcmk-1.
We will use the pcs resource move command to create a temporary constraint with a score of INFINITY. While we could update our existing constraint, using move allows us to easily get rid of the temporary constraint later. If desired, we could even give the constraint a lifetime, so it would expire automatically, but we don't do that in this example.
[root@pcmk-1 ~]# pcs resource move WebSite pcmk-1
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
  Resource: WebSite
    Enabled on: pcmk-1 (score:50)
    Enabled on: pcmk-1 (score:INFINITY) (role: Started)
Ordering Constraints:
  start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
  WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 17:28:55 2018
Last change: Mon Sep 10 17:28:27 2018 by root via crm_resource on pcmk-1
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):  Started pcmk-1
 WebSite    (ocf::heartbeat:apache):   Started pcmk-1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Once we’ve finished whatever activity required us to move the resources to pcmk-1 (in our case nothing), we can then allow the cluster to resume normal operation by removing the new constraint. Due to our first location constraint and our default stickiness, the resources will remain on pcmk-1.
We will use the pcs resource clear command, which removes all temporary constraints previously created by pcs resource move or pcs resource ban.
[root@pcmk-1 ~]# pcs resource clear WebSite
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
  Resource: WebSite
    Enabled on: pcmk-1 (score:50)
Ordering Constraints:
  start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
  WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
Note that the INFINITY location constraint is now gone. If we check the cluster status, we can also see that (as expected) the resources are still active on pcmk-1.
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 17:31:47 2018
Last change: Mon Sep 10 17:31:04 2018 by root via crm_resource on pcmk-1
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):  Started pcmk-1
 WebSite    (ocf::heartbeat:apache):   Started pcmk-1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
To remove the constraint with the score of 50, we would first get the constraint’s ID using pcs constraint --full, then remove it with pcs constraint remove and the ID. We won’t show those steps here, but feel free to try it on your own, with the help of the pcs man page if necessary.
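Sketched out, the procedure looks like this; the constraint ID shown is hypothetical, and the real one comes from the --full listing:

```shell
# Show all constraints together with their IDs
pcs constraint --full

# Remove the score-50 constraint by its ID (this ID is made up for illustration)
pcs constraint remove location-WebSite-pcmk-1-50
```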
=================================
Remove Constraints
1 – List the currently configured constraints using the pcs constraint list --full command.
pcs constraint list --full
Example Output
Location Constraints:
  Resource: test_sg
    Disabled on: server01-cpn (score:-INFINITY) (role: Started) (id:cli-ban-test_sg-on-server01-cpn)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
2 – Note the id string, then remove the constraint using the pcs constraint remove <id> command.
pcs constraint remove cli-ban-test_sg-on-server01-cpn
3 – Verify the constraints have been removed using the pcs constraint list --full command.
pcs constraint list --full
Example Output
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
==========================================
pacemaker and pcs on Linux example, Fencing
STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it protects your data from being corrupted by rogue nodes or concurrent access.
For example, if a node's network interface is down but it still has the filesystem mounted, you can't simply mount the filesystem on the other nodes. Using STONITH, you can make sure the node is truly offline and safely let another node access the data.
STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.
In the following examples, I'll create three IBM RSA STONITH agents, one each for nodeA, nodeB, and nodeC, so that each node has a fencing device the other nodes can use to bring it down when needed.
Available STONITH (Fencing) Agents
# pcs stonith list
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC over SNMP
fence_bladecenter - Fence agent for IBM BladeCenter
...
fence_rsa - Fence agent for IBM RSA
...
You can also add a filter to the end of the command, for example:
# pcs stonith list rsa
fence_rsa - Fence agent for IBM RSA
In the following examples, all fencing devices will use fence_rsa.
Setup properties for STONITH
# pcs property set no-quorum-policy=ignore
# pcs property set stonith-enabled=true
# pcs property set stonith-action=poweroff # default is reboot
Note: Setting the stonith action to off is not always a good option. In this example the resource is a filesystem, and the filesystem device has redundant access paths. If a resource-access fault caused the node to be fenced, it is better to leave the node off for further investigation instead of rebooting it and hoping the problem clears.
Creating a Fencing Device
# pcs stonith create stonith-rsa-nodeA fence_rsa action=off ipaddr="nodeA_rsa" login=<user> passwd=<pass> pcmk_host_list=nodeA secure=true
# pcs stonith show
stonith-rsa-nodeA (stonith:fence_rsa): Stopped
Displaying Fencing Devices
We repeat the same steps for nodeB and nodeC, and then we have three fence devices. The stonith resources start automatically.
# pcs stonith show
stonith-rsa-nodeA (stonith:fence_rsa): Started
stonith-rsa-nodeB (stonith:fence_rsa): Started
stonith-rsa-nodeC (stonith:fence_rsa): Started
Managing Nodes with Fence Devices
# pcs stonith fence nodeC
Node: nodeC fenced
# pcs stonith confirm nodeC
Node: nodeC confirmed fenced
By default, the fence action brings the node off and then on again (a reboot). If you want to take the node offline only, use the --off option.
Note: the confirm command does not itself power the node down; it tells the cluster the node is already off, so only use it when you are sure the node really is off.
Modifying Fencing Devices
You may have noticed that many options were used during fence device creation. Actually, all of them can be modified and updated:
pcs stonith update stonith_id [stonith_device_options]
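For instance, to change an option on the device created earlier (the delay value here is purely illustrative):

```shell
# Wait 15 seconds before fencing starts via the nodeA device
pcs stonith update stonith-rsa-nodeA delay=15

# Verify the device's current configuration
pcs stonith show stonith-rsa-nodeA
```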
Displaying Device-Specific Fencing Options
If you want to know a fence agent's options, there are a couple of ways.
You can find its options via its command-line mode:
# /usr/sbin/fence_rsa -h
For more detail on how to check and debug a fence device from the command line, see Fence agent for IBM RSA.
Or you can list the fence agent's options with pcs:
# pcs stonith describe fence_rsa
Stonith options for: fence_rsa
action (required): Fencing Action
ipaddr (required): IP Address or Hostname
login (required): Login Name
passwd: Login password or passphrase
passwd_script: Script to retrieve password
cmd_prompt: Force command prompt
secure: SSH connection
identity_file: Identity file for ssh
ipport: TCP port to use for connection with device
ssh_options: SSH options to use
verbose: Verbose mode
debug: Write debug information to given file
version: Display version information and exit
help: Display help and exit
power_timeout: Test X seconds for status change after ON/OFF
shell_timeout: Wait X seconds for cmd prompt after issuing command
login_timeout: Wait X seconds for cmd prompt after login
power_wait: Wait X seconds after issuing ON/OFF
delay: Wait X seconds before fencing is started
retry_on: Count of attempts to retry power on
stonith-timeout: How long to wait for the STONITH action to complete per a stonith device.
priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.
pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names.
pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
pcmk_host_check: How to determine which machines are controlled by the device.
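To illustrate pcmk_host_map from the list above (the device name, address, and credentials below are hypothetical):

```shell
# Map cluster node names to the plug numbers the power switch knows them by
pcs stonith create stonith-apc fence_apc_snmp ipaddr=apc01 login=apc passwd=apc \
    pcmk_host_map="nodeA:1;nodeB:2;nodeC:3"
```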
Deleting Fencing Devices
pcs stonith delete stonith_id
Configuring Fencing Levels
pcs stonith level add level node devices
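A sketch of the syntax above, under the assumption that nodeA also has a second, hypothetical power-switch device to fall back on:

```shell
# Level 1: try the RSA device for nodeA first
pcs stonith level add 1 nodeA stonith-rsa-nodeA
# Level 2: if that fails, fall back to a (hypothetical) APC switch device
pcs stonith level add 2 nodeA stonith-apc-nodeA

# Review the configured fencing levels
pcs stonith level
```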
Additional Fencing Configuration Options
pcmk_host_argument (string, default: port)
  An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific parameter that should indicate the machine to be fenced. A value of none can be used to tell the cluster not to supply any additional parameters.
pcmk_reboot_action (string, default: reboot)
  An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the reboot action.
pcmk_reboot_timeout (time, default: 60s)
  Specify an alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for reboot actions.
pcmk_reboot_retries (integer, default: 2)
  The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections, and operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up.
pcmk_off_action (string, default: off)
  An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the off action.
pcmk_off_timeout (time, default: 60s)
  Specify an alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for off actions.
pcmk_off_retries (integer, default: 2)
  The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections, and operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up.
pcmk_list_action (string, default: list)
  An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the list action.
pcmk_list_timeout (time, default: 60s)
  Specify an alternate timeout to use for list actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for list actions.
pcmk_list_retries (integer, default: 2)
  The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections, and operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up.
pcmk_monitor_action (string, default: monitor)
  An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the monitor action.
pcmk_monitor_timeout (time, default: 60s)
  Specify an alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for monitor actions.
pcmk_monitor_retries (integer, default: 2)
  The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections, and operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up.
pcmk_status_action (string, default: status)
  An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the status action.
pcmk_status_timeout (time, default: 60s)
  Specify an alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for status actions.
pcmk_status_retries (integer, default: 2)
  The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections, and operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up.
=============================
PCS is the command-line interface for the Red Hat cluster stack; it allows you to manage Pacemaker (the open-source cluster resource manager) and Corosync (the cluster membership and messaging layer). Please visit ClusterLabs.org for more information.
pcs stonith
One of the more interesting parts of setting up a cluster (although very standard for cluster setups) is STONITH. It's a fencing mechanism, with an impressive number of agents supplied by default.
Fencing takes care of monitoring your cluster closely, with a view to fencing (blocking) a faulty node as soon as the node is confirmed to be malfunctioning.
Specifically, if there's a timeout or an error with starting a service, your pcs stonith setup may dictate that this condition is bad enough for the server (or VM) to be rebooted or completely shut down.
STONITH meaning
STONITH is actually an acronym, and it means Shoot The Other Node In The Head. Quite a violent name for a high availability feature, don't you think?
================================
Pacemaker Enable Maintenance Mode or Freeze Cluster
Enable Maintenance Mode
1 – Run the pcs property set maintenance-mode=true command to place the cluster into maintenance mode.
pcs property set maintenance-mode=true
2 – Next run the pcs property command to verify that it displays maintenance-mode: true, which means the cluster is in maintenance mode.
pcs property
Example Output
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-5.el6-e174ec8
 have-watchdog: false
 last-lrm-refresh: 1527095308
 maintenance-mode: true
 no-quorum-policy: freeze
3 – Next run the pcs status --full command and you will see an alert at the top of the status output showing the cluster is in maintenance mode.
pcs status --full
Example Output
Cluster name: TEST_CLUSTER
Stack: cman
Current DC: server01-cpn (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Fri Jun 1 09:25:24 2018 Last change: Fri Jun 1 09:20:51 2018 by root via cibadmin on server01-cpn
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
2 nodes and 44 resources configured
Disable Maintenance Mode
1 – Run the pcs property set maintenance-mode=false command to take the cluster out of maintenance mode.
pcs property set maintenance-mode=false
2 – Next run the pcs property command to verify that it no longer displays maintenance-mode: true, which means the cluster is not in maintenance mode.
pcs property
Example Output
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-5.el6-e174ec8
 have-watchdog: false
 last-lrm-refresh: 1527095308
 no-quorum-policy: freeze
=======================