Configure auto-failover

This section describes how to add auto-failover capabilities to a Conjur cluster. It assumes that you have completed all of the tasks described in Setup.

Prerequisites

  • In a Podman environment that uses Container Network Interface (CNI) networking, you must add a network-scoped alias for each container in the cluster and set the alias for every network that the container joins. To do this, add the --network-alias option to the podman run command that starts the container if your network is DNS-enabled. If it is not, you must also add the --add-host option to map the fully qualified domain name (FQDN) to the IP address in the container's /etc/hosts file. For full instructions, see Start the Conjur container; a minimal sketch follows below.
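    For illustration, a minimal sketch of both cases (hypothetical container, network, image, and DNS names; your real run command includes additional options, as described in Start the Conjur container):

     
    # DNS-enabled CNI network: a network-scoped alias is sufficient
    podman run -d --name conjur-leader \
      --network conjur-net \
      --network-alias node1.example.com \
      registry.example.com/conjur-appliance:latest

    # Network without DNS: additionally map the FQDN to the node IP in /etc/hosts
    podman run -d --name conjur-leader \
      --network conjur-net \
      --network-alias node1.example.com \
      --add-host node1.example.com:10.0.0.11 \
      registry.example.com/conjur-appliance:latest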

  • Install the Conjur CLI to run against the Conjur cluster you set up. The Conjur CLI runs as an application directly on your machine. For more information, see Set up the Conjur CLI; a short initialization sketch follows below.
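    As a quick orientation, a hedged sketch of pointing the CLI at the cluster (hypothetical URL and account name; exact flags depend on your CLI version):

     
    # Configure the CLI to talk to the Leader and authenticate as admin
    conjur init -u https://node1.example.com -a myorg
    conjur login -i admin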

Create and load cluster policy

You define the Leader and Standbys in a cluster policy and load the policy to the Leader.

 

Policy defines security rules and organizes entities such as users, hosts, and secrets in your database. Policy also establishes the rules for role-based access on resources. For more information, see Policy.

  1. Copy the following policy into a text file.

     
    ---
    - !policy
      id: conjur
      body:
      - !policy
        id: cluster/<my-cluster-name>
        annotations:
          ttl: <ttl-value>
        body:
        - !layer
        - &hosts
          - !host
            id: <leader_dns>
          - !host
            id: <standby1_dns>
          - !host
            id: <standby2_dns>
        - !grant
          role: !layer
          member: *hosts
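    For example, a filled-in sketch that writes the policy to a file for step 4, using the same hypothetical names (node1-node3.example.com, cluster-1) as the enrollment examples later in this section:

     
    # Write a concrete cluster policy to disk; the DNS names, cluster
    # name, and file name are illustrative placeholders
    cat > my-cluster-policy.yml <<'EOF'
    ---
    - !policy
      id: conjur
      body:
      - !policy
        id: cluster/cluster-1
        annotations:
          ttl: 300
        body:
        - !layer
        - &hosts
          - !host
            id: node1.example.com
          - !host
            id: node2.example.com
          - !host
            id: node3.example.com
        - !grant
          role: !layer
          member: *hosts
    EOF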
  2. Change the following values:

    my-cluster-name

    The last component of the second policy id in the policy above. Choose any name; you use the same value again when you enroll the nodes.

    ttl-value

    The value, in seconds, of the time-to-live (TTL) counter that is used for auto-failover.

    If unavailability of the Leader in a cluster is resolved before this value expires, cluster operation resumes without failover. An expired TTL counter indicates failure of the Leader, at which point an auto-failover event occurs.

    For more information, see Failure detection and promotion.

    Default: 300 seconds (5 minutes)

    If you need to change the TTL value after the nodes are enrolled, see Update cluster TTL.

    leader_dns, standby<n>_dns

    The &hosts label lists the DNS names of the Leader and each of the Standbys in the failover cluster. If your planned auto-failover cluster has more than two Standbys, add them under the &hosts label as well.

    Do not change any other objects in the policy. For reference, the following describes each object in the policy.

    id: conjur

    A policy with the id conjur is required. It is loaded directly under the root policy.

    id: cluster/<my-cluster-name>

    A policy with the id cluster/my-cluster-name is required, where my-cluster-name is a name of your choice. This policy is declared as a subpolicy under the conjur policy. The resulting (required) policy structure is:

     
    root
      conjur
        cluster/my-cluster-name

    layer

    A layer is a collection of hosts. If unnamed, a layer assumes the name of the policy that defines it.

    &hosts

    The list of hosts identifies the Leader and all of the Standbys that you intend to add to the auto-failover cluster. The host ids must be the resolvable DNS names of the nodes.

    grant

    The grant adds all of the hosts to the layer.

  3. Save the policy as a .yml file.

  4. Load the policy into the root policy branch:
     
    conjur policy load -b <policy-branch> -f <policy-file-name>.yml

    For example:

     
    conjur policy load -b root -f my-cluster-policy.yml

    Alternatively, use the Load a policy REST API, as in the sketch below.
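    A hedged sketch of the equivalent REST calls (assumes a Conjur account named myorg, the admin user's API key in $api_key, and the Leader reachable at node1.example.com; paths may vary by version):

     
    # Exchange the admin API key for a short-lived access token
    token=$(curl -sk -d "$api_key" \
      "https://node1.example.com/authn/myorg/admin/authenticate" \
      | base64 | tr -d '\n')

    # POST the policy file to the root policy branch
    curl -sk -X POST \
      -H "Authorization: Token token=\"$token\"" \
      --data-binary "@my-cluster-policy.yml" \
      "https://node1.example.com/policies/myorg/policy/root"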

  5. Continue with Enroll Leader and Standbys into the auto-failover cluster.

Enroll Leader and Standbys into the auto-failover cluster

The Leader and each Standby must be explicitly enrolled into the auto-failover cluster.

  1. Enroll the Leader and each Standby into the auto-failover cluster by running the following command on each respective cluster node.

     

    The command for the Standbys is slightly different from the command for the Leader: enrolling a Standby includes an additional argument, -m, which identifies the Leader.

     
    # For Leader enrollment: 
    $ docker exec <leader-container-name> evoke cluster enroll -n <leader-dns> <my-cluster-name>
    # For Standby enrollment: 
    $ docker exec <standby-container-name> evoke cluster enroll -n <standby-dns> -m <leader-dns> <my-cluster-name>

    leader-dns, standby-dns

    The DNS-resolvable name of the cluster node that you are enrolling.

    -m leader-dns

    The leader-dns is the DNS-resolvable name of the Leader that the Standby is associated with.

    my-cluster-name

    The auto-failover cluster name as defined in the cluster policy. This value must match the last component of the policy id (my-cluster-name) in the cluster policy.

    For example:

     
    # On the Leader node:
    $ docker exec mycontainer evoke cluster enroll -n node1.example.com cluster-1
    # On the Standby1 node:
    $ docker exec mycontainer evoke cluster enroll -n node2.example.com -m node1.example.com cluster-1
    # On the Standby2 node:
    $ docker exec mycontainer evoke cluster enroll -n node3.example.com -m node1.example.com cluster-1
  2. Continue with Verify status of Leader and Standbys.

Verify status of Leader and Standbys

Run the following commands to verify the health and status of the Leader and each of the Standbys in the auto-failover cluster.

  1. On the Leader node, run the following command to check its status in the cluster:

     
    $ curl localhost/health

    The following response is normal and indicates that the node is ready to participate in auto-failover:

     
    {
      "ok": true,
      "status": "running",
      "message": "acting as master"
    } 
  2. On each Standby node, run the following command to check its status in the cluster:

     
    $ curl localhost/health

    The following response is normal and indicates that the node is ready to participate in auto-failover:

     
    {
      "ok": true,
      "status": "standing_by",
      "message": null
    } 

If a node fails to join the cluster, the message field in the cluster status response describes the reason for the failure.
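A quick sketch for checking every node's health from a single machine (hypothetical DNS names; assumes the health endpoint is reachable over HTTPS and that jq is installed):

 
    for node in node1.example.com node2.example.com node3.example.com; do
      echo "== $node =="
      # Print only the fields relevant to cluster membership
      curl -sk "https://$node/health" | jq '{ok, status, message}'
    done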