# PCS

> **As we have clusvcadm -U \<resource\_group> and clusvcadm -Z \<resource\_group> to freeze and unfreeze resources in CMAN, I would really appreciate it if someone could give some pointers for freezing/unfreezing a resource in Pacemaker (pcs) as well.**
>
> **Thanks,**
> **Jaspal Singla**

**Hi,**

**The equivalent in Pacemaker is "managed" and "unmanaged" resources.**

**The usage depends on what tools you are using. For pcs, it's "pcs resource unmanage \<resource\_name>" to freeze, and "manage" to unfreeze. At a lower level, it's setting the is-managed meta-attribute of the resource.**

**It's also possible to set the maintenance-mode cluster property to "freeze" all resources.**

**6.8. Move Resources Manually**

**There are always times when an administrator needs to override the cluster and force resources to move to a specific location. In this example, we will force the WebSite to move to pcmk-1.**

**We will use the pcs resource move command to create a temporary constraint with a score of INFINITY. While we could update our existing constraint, using move allows us to easily get rid of the temporary constraint later. If desired, we could even give the constraint a lifetime, so it would expire automatically, but we won't do that in this example.**

```
[root@pcmk-1 ~]# pcs resource move WebSite pcmk-1
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
  Resource: WebSite
    Enabled on: pcmk-1 (score:50)
    Enabled on: pcmk-1 (score:INFINITY) (role: Started)
Ordering Constraints:
  start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
  WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 17:28:55 2018
Last change: Mon Sep 10 17:28:27 2018 by root via crm_resource on pcmk-1

2 nodes configured
2 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-1
 WebSite        (ocf::heartbeat:apache):        Started pcmk-1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

**Once we’ve finished whatever activity required us to move the resources to pcmk-1 (in our case nothing), we can then allow the cluster to resume normal operation by removing the new constraint. Due to our first location constraint and our default stickiness, the resources will remain on pcmk-1.**

**We will use the pcs resource clear command, which removes all temporary constraints previously created by pcs resource move or pcs resource ban.**

```
[root@pcmk-1 ~]# pcs resource clear WebSite
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
  Resource: WebSite
    Enabled on: pcmk-1 (score:50)
Ordering Constraints:
  start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
  WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
```

**Note that the INFINITY location constraint is now gone. If we check the cluster status, we can also see that (as expected) the resources are still active on pcmk-1.**

```
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 17:31:47 2018
Last change: Mon Sep 10 17:31:04 2018 by root via crm_resource on pcmk-1

2 nodes configured
2 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-1
 WebSite        (ocf::heartbeat:apache):        Started pcmk-1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

**To remove the constraint with the score of 50, we would first get the constraint's ID using pcs constraint --full, then remove it with pcs constraint remove and the ID. We won't show those steps here, but feel free to try it on your own, with the help of the pcs man page if necessary.**
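Though the text leaves this as an exercise, here is one possible sketch of those two steps: pull the id out of `pcs constraint --full` output with standard text tools, then feed it to `pcs constraint remove`. The sample output is embedded in a variable so the sketch runs without a cluster, and the id string `location-WebSite-pcmk-1-50` is an assumed example, not taken from the transcript above.

```shell
#!/bin/sh
# Sample `pcs constraint --full` output captured in a variable; on a
# live cluster use:  constraints=$(pcs constraint --full)
# The id shown here is a plausible example, not a real one.
constraints='Location Constraints:
  Resource: WebSite
    Enabled on: pcmk-1 (score:50) (id:location-WebSite-pcmk-1-50)'

# Extract the text between "(id:" and ")" on the line with score:50.
id=$(printf '%s\n' "$constraints" | grep 'score:50' | sed 's/.*(id:\([^)]*\)).*/\1/')
echo "$id"
# pcs constraint remove "$id"   # would then remove it on the live cluster
```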

=================================

**Remove Constraints**

**1 – List the currently configured constraints using the pcs constraint list --full command.**

```
pcs constraint list --full
```

**Example Output**

```
Location Constraints:
  Resource: test_sg
    Disabled on: server01-cpn (score:-INFINITY) (role: Started) (id:cli-ban-test_sg-on-server01-cpn)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
```

**2 – Note the id string and use it with the pcs constraint remove id command to remove the constraint.**

```
pcs constraint remove cli-ban-test_sg-on-server01-cpn
```

**3 – Verify the constraints have been removed using the pcs constraint list --full command.**

```
pcs constraint list --full
```

**Example Output**

```
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
```

==========================================

**pacemaker and pcs on Linux example, Fencing**

**STONITH is an acronym for Shoot-The-Other-Node-In-The-Head, and it protects your data from being corrupted by rogue nodes or concurrent access.**

**For example, if a node's network interface is down but it still has the filesystem mounted, you can't simply mount the filesystem on other nodes. Using STONITH, you can make sure the node is truly offline and safely let another node access the data.**

**STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.**

**In the following examples, I'll create 3 IBM RSA STONITH agents for nodeA, nodeB, and nodeC, so that each node has a fencing device the other nodes can use to bring it down when needed.**

**Available STONITH (Fencing) Agents**

```
# pcs stonith list
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC over SNMP
fence_bladecenter - Fence agent for IBM BladeCenter
...
fence_rsa - Fence agent for IBM RSA
...
```

**You can also add a filter to the end of the command, for example:**

```
# pcs stonith list rsa
fence_rsa - Fence agent for IBM RSA
```

**In the following examples, all fencing devices will use fence\_rsa.**

**Setup properties for STONITH**

```
# pcs property set no-quorum-policy=ignore
# pcs property set stonith-enabled=true
# pcs property set stonith-action=poweroff     # default is reboot
```

**Note: Setting the stonith action to off is not always a good option. In this example, the resource is a filesystem, and the filesystem device has redundant access paths. If a resource-access fault causes a node to get fenced, it is better to leave the node off for further investigation instead of rebooting it to fix the problem.**

**Creating a Fencing Device**

```
# pcs stonith create stonith-rsa-nodeA fence_rsa action=off ipaddr="nodeA_rsa" login=<user> passwd=<pass> pcmk_host_list=nodeA secure=true
# pcs stonith show
 stonith-rsa-nodeA    (stonith:fence_rsa):    Stopped
```

**Displaying Fencing Devices**

**We repeat the same steps for nodeB and nodeC, then we have 3 fence devices. The stonith service will start itself.**

```
# pcs stonith show
 stonith-rsa-nodeA    (stonith:fence_rsa):    Started
 stonith-rsa-nodeB    (stonith:fence_rsa):    Started
 stonith-rsa-nodeC    (stonith:fence_rsa):    Started
```
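Since the three devices differ only in the node name, the repeated creation step can be scripted. This sketch only prints the pcs commands it would run; the credentials are placeholders, and the nodeX\_rsa hostnames follow the naming used above.

```shell
#!/bin/sh
# Print one `pcs stonith create` command per node, mirroring the manual
# step shown earlier. Drop the echo (and fill in real credentials) to
# actually create the devices.
make_fence_cmds() {
    for node in "$@"; do
        echo "pcs stonith create stonith-rsa-${node} fence_rsa action=off" \
             "ipaddr=${node}_rsa login=<user> passwd=<pass>" \
             "pcmk_host_list=${node} secure=true"
    done
}

make_fence_cmds nodeA nodeB nodeC
```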

**Managing Nodes with Fence Devices**

```
# pcs stonith fence nodeC
Node: nodeC fenced
# pcs stonith confirm nodeC
Node: nodeC confirmed fenced
```

**By default, the fence action brings the node offline and then back online. If you want to bring the node offline only, use the --off option.**

**Note: The confirm command can also be used directly to fence a node off.**

**Modifying Fencing Devices**

**You may have noticed that many options were used during fence device creation. Actually, all of them can be modified and updated:**

```
pcs stonith update stonith_id [stonith_device_options]
```

**Displaying Device-Specific Fencing Options**

**If you want to know a fence agent's options, you can use the following methods.**

**You can find its options via the agent's command-line mode:**

```
# /usr/sbin/fence_rsa -h
```

**For more detail about how to check and debug a fence device on the command line, see Fence agent for ibm rsa.**

**Or, you can list the fence agent options with pcs:**

```
# pcs stonith describe fence_rsa
Stonith options for: fence_rsa
  action (required): Fencing Action
  ipaddr (required): IP Address or Hostname
  login (required): Login Name
  passwd: Login password or passphrase
  passwd_script: Script to retrieve password
  cmd_prompt: Force command prompt
  secure: SSH connection
  identity_file: Identity file for ssh
  ipport: TCP port to use for connection with device
  ssh_options: SSH options to use
  verbose: Verbose mode
  debug: Write debug information to given file
  version: Display version information and exit
  help: Display help and exit
  power_timeout: Test X seconds for status change after ON/OFF
  shell_timeout: Wait X seconds for cmd prompt after issuing command
  login_timeout: Wait X seconds for cmd prompt after login
  power_wait: Wait X seconds after issuing ON/OFF
  delay: Wait X seconds before fencing is started
  retry_on: Count of attempts to retry power on
  stonith-timeout: How long to wait for the STONITH action to complete per a stonith device.
  priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.
  pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names.
  pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
  pcmk_host_check: How to determine which machines are controlled by the device.
```

**Deleting Fencing Devices**

```
pcs stonith delete stonith_id
```

**Configuring Fencing Levels**

```
pcs stonith level add level node devices
```
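As an illustration of the level syntax, the following sketch registers the RSA device created above as level 1 for nodeA, with a hypothetical second device (stonith-pdu-nodeA, not created in this document) as a level-2 fallback that is tried only if level 1 fails. The commands are printed for review rather than executed.

```shell
#!/bin/sh
# Build the fencing-level commands for nodeA: level 1 is tried first,
# level 2 only if every device at level 1 fails. stonith-pdu-nodeA is a
# hypothetical backup device used purely for illustration.
level_cmds="pcs stonith level add 1 nodeA stonith-rsa-nodeA
pcs stonith level add 2 nodeA stonith-pdu-nodeA"

# Printed for review; pipe to sh to actually apply on a live cluster.
printf '%s\n' "$level_cmds"
```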

**Additional Fencing Configuration Options**

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| pcmk\_host\_argument | string | port | An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific, parameter that should indicate the machine to be fenced. A value of none can be used to tell the cluster not to supply any additional parameters. |
| pcmk\_reboot\_action | string | reboot | An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the reboot action. |
| pcmk\_reboot\_timeout | time | 60s | Specify an alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for reboot actions. |
| pcmk\_reboot\_retries | integer | 2 | The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up. |
| pcmk\_off\_action | string | off | An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the off action. |
| pcmk\_off\_timeout | time | 60s | Specify an alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for off actions. |
| pcmk\_off\_retries | integer | 2 | The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up. |
| pcmk\_list\_action | string | list | An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the list action. |
| pcmk\_list\_timeout | time | 60s | Specify an alternate timeout to use for list actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for list actions. |
| pcmk\_list\_retries | integer | 2 | The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up. |
| pcmk\_monitor\_action | string | monitor | An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the monitor action. |
| pcmk\_monitor\_timeout | time | 60s | Specify an alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for monitor actions. |
| pcmk\_monitor\_retries | integer | 2 | The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up. |
| pcmk\_status\_action | string | status | An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the status action. |
| pcmk\_status\_timeout | time | 60s | Specify an alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for status actions. |
| pcmk\_status\_retries | integer | 2 | The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task, so Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up. |

=============================

**PCS is the name of the command-line interface for the RedHat Cluster stack. It allows you to manage Pacemaker (the open-source cluster resource manager) and Corosync (the cluster membership and messaging layer). Please visit ClusterLabs.org for more information.**

**pcs stonith**

**One of the more interesting things (although very standard for cluster setups) in setting up a cluster is STONITH. It's a fencing mechanism with an impressive number of fence agents supplied by default.**

**Fencing takes care of monitoring your cluster closely, with a view to fencing (blocking) a faulty node as soon as the node is confirmed to be malfunctioning.**

**Specifically, if there's a timeout or an error with starting a service, your pcs stonith setup may dictate that this condition is bad enough for the server (or VM) to be rebooted or completely shut down.**

**STONITH meaning**

**STONITH is actually an acronym: it means Shoot The Other Node In The Head. Quite a violent name for a high-availability feature, don't you think?**

================================

**Pacemaker Enable Maintenance Mode or Freeze Cluster**

**Enable Maintenance Mode**

**1 – Run the pcs property set maintenance-mode=true command to place the cluster into maintenance mode.**

```
pcs property set maintenance-mode=true
```

**2 – Next run the pcs property command to verify that it displays maintenance-mode: true, which means the cluster is in maintenance mode.**

```
pcs property
```

**Example Output**

```
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-5.el6-e174ec8
 have-watchdog: false
 last-lrm-refresh: 1527095308
 maintenance-mode: true
 no-quorum-policy: freeze
```
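That verification can be scripted by grepping the property listing for the flag. In this sketch, the sample output above is embedded in a variable so it runs without a cluster; on a live node you would use `props=$(pcs property)` instead.

```shell
#!/bin/sh
# Sample `pcs property` output; on a live cluster, replace this with:
#   props=$(pcs property)
props=' cluster-infrastructure: cman
 maintenance-mode: true
 no-quorum-policy: freeze'

# Report whether the maintenance-mode flag is set.
if printf '%s\n' "$props" | grep -q 'maintenance-mode: true'; then
    echo "cluster IS in maintenance mode"
else
    echo "cluster is NOT in maintenance mode"
fi
```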

**3 – Next run the pcs status --full command and you will see an alert at the top of the status output showing the cluster is in maintenance mode.**

```
pcs status --full
```

**Example Output**

```
Cluster name: TEST_CLUSTER
Stack: cman
Current DC: server01-cpn (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Fri Jun  1 09:25:24 2018          Last change: Fri Jun  1 09:20:51 2018 by root via cibadmin on server01-cpn

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

2 nodes and 44 resources configured
```

**Disable Maintenance Mode**

**1 – Run the pcs property set maintenance-mode=false command to take the cluster out of maintenance mode.**

```
pcs property set maintenance-mode=false
```

**2 – Next run the pcs property command to verify that it does not display maintenance-mode: true, which means the cluster is not in maintenance mode.**

```
pcs property
```

**Example Output**

```
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-5.el6-e174ec8
 have-watchdog: false
 last-lrm-refresh: 1527095308
 no-quorum-policy: freeze
```

=======================
