Configure Auto-Failover

This section describes how to add auto-failover capabilities to an HA cluster. It assumes that you have completed all of the tasks described in Setup.

Create and load cluster policy

You define the Master and Standbys in a cluster policy and load the policy on the Master.


Policy defines security rules and organizes entities such as users, hosts, and secrets in your database. Policy also establishes the rules for role-based access on resources. For more information, see Policy Management.

  1. Prerequisite if you are using the CLI to define the cluster policy:

    Start the Conjur CLI as a container local to the Master.

    1. When starting the container, be sure to use the -v option to map a local folder into the container. The local folder is where you create your policy files (see the example below).

    2. See Conjur CLI Setup for steps to download, start, authenticate, and use the CLI.
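
    For example, here is a minimal sketch of starting a long-running CLI container with a local policy folder mapped into it. The image name and tag, container name, and paths are examples only; see Conjur CLI Setup for the exact command for your environment.

    # Start a long-running Conjur CLI container and map ./policy
    # on the host to /policy inside the container (example values)
    $ docker run -d --name conjur-cli \
        -v "$(pwd)/policy:/policy" \
        --entrypoint sleep \
        cyberark/conjur-cli:5 infinity
    # Open a shell in the CLI container
    $ docker exec -it conjur-cli bash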

  2. Copy the following policy into a text file.

     
    ---
    - !policy
      id: conjur
      body:
        - !policy
          id: cluster/<my-cluster-name>
          annotations:
            ttl: <ttl-value>
          body:
          - !layer
          - &hosts
            - !host
              id: <master1.dns>
            - !host
              id: <master2.dns>
            - !host
              id: <master3.dns>
          - !grant
            role: !layer
            member: *hosts
  3. Change the following values in the policy:

    my-cluster-name

    The last component of the second policy id in the policy above. This is the name of your auto-failover cluster.

    ttl-value

    The value, in seconds, of the time-to-live (TTL) counter that is used for auto-failover.

    If unavailability of the Master in a cluster is resolved before this value expires, cluster operation resumes without failover. An expired TTL counter indicates failure of the Master, at which point an auto-failover event occurs.

    For more information, see Failure detection and promotion.

    Default: 15 seconds

    Recommended: 300 seconds (5 minutes). This longer value helps avoid false failover events caused by intermittent issues, such as a temporary network interruption, and results in a more stable cluster.

    • Any change to the TTL value must be made before you enroll the nodes into the cluster.

    • Changing the TTL value after the nodes are enrolled requires a redeployment of the cluster for the change to take effect.

    host<n> id

    The &hosts anchor lists the DNS names of the Master and each of the Standbys in the auto-failover cluster. If your planned auto-failover cluster has more than two Standbys, add them as well under the &hosts anchor.

    Do not change any other objects in the policy. For reference, here is an explanation of all of the objects in the policy.

    id: conjur

    A policy with the id conjur is required. It is loaded directly under the root policy.

    id: cluster/<my-cluster-name>

    A policy with the id cluster/my-cluster-name is required, where my-cluster-name is a name of your choice. This policy is declared as a subpolicy under the conjur policy. The resulting (required) policy structure is:

    root
      conjur
        cluster/my-cluster-name

    layer

    A layer is a collection of hosts. If unnamed, a layer assumes the name of the policy that defines it.

    &hosts

    The list of hosts identifies the Master and all of the Standbys that you intend to add to the auto-failover cluster. The host ids must be the resolvable DNS names for each node.

    grant

    The grant adds all of the hosts to the layer.
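
    For context, because these hosts are declared inside the conjur/cluster/<my-cluster-name> policy branch, Conjur stores them under policy-qualified ids. Assuming a Conjur account named myorg and the example node names used later in this section, the resulting host resources would look something like this:

    myorg:host:conjur/cluster/my-cluster-name/node1.example.com
    myorg:host:conjur/cluster/my-cluster-name/node2.example.com
    myorg:host:conjur/cluster/my-cluster-name/node3.example.com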

  4. Save the policy as a .yml file.
  5. Load this policy file under the Master root policy.
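
    For example, from the Conjur CLI container, a command like the following loads the policy into the root policy branch. The file name and path /policy/cluster-policy.yml are examples, and the exact policy load syntax can vary with your CLI version.

    # Load the cluster policy file into the root policy branch
    $ conjur policy load root /policy/cluster-policy.yml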

  6. Continue with Enroll Master and Standbys into the auto-failover cluster.

Enroll Master and Standbys into the auto-failover cluster

The Master and each Standby must be explicitly enrolled into the auto-failover cluster.

  1. Enroll the Master and each Standby into the auto-failover cluster by running the following command on each respective cluster node.

     

    The command for the Standbys is slightly different from the command for the Master. The command for enrolling Standbys includes an additional argument, -m, which identifies the Master.

     
    # For Master enrollment:
    $ docker exec <master-container-name> evoke cluster enroll -n <master-dns> <my-cluster-name>
    # For Standby enrollment:
    $ docker exec <standby-container-name> evoke cluster enroll -n <standby-dns> -m <master-dns> <my-cluster-name>

     

    master-dns / standby-dns

    The DNS-resolvable name of the cluster node you are enrolling.

    -m master-dns

    The DNS-resolvable name of the Master that the Standby is associated with.

    my-cluster-name

    The auto-failover cluster name as defined in the cluster policy. This value must match the last component of the policy id (my-cluster-name) in the cluster policy.

    For example:

     
    # Enroll the Master:
    $ docker exec dap evoke cluster enroll -n node1.example.com my-cluster-name
    # Enroll Standby1:
    $ docker exec dap evoke cluster enroll -n node2.example.com -m node1.example.com my-cluster-name
    # Enroll Standby2:
    $ docker exec dap evoke cluster enroll -n node3.example.com -m node1.example.com my-cluster-name
  2. Continue with Verify status of Master and Standbys.

Verify status of Master and Standbys

Run the following commands to verify the health and status of the Master and each of the Standbys in the auto-failover cluster.

  1. On each cluster node (Master and Standbys), run the following command to check its status in the cluster:

     
    $ curl localhost/health | jq .cluster

    The following response, shown here for the Master, is normal and indicates that the node is ready to participate in auto-failover:

     
    {
      "ok": true,
      "status": "running",
      "message": "acting as master"
    } 

    If a node fails to join the cluster because of some error, the message field in the cluster status response describes the reason for the failure.

  2. Optional: Continue with Configure DAP Follower.