Conjur architecture and deployment reference

In this topic, you will learn the basics of the Conjur Enterprise deployment: hardware and software requirements, the reference architecture, and best practices.

Definitions

Term

Description

Leader

The main Conjur node: A single Conjur Server instance that performs read/write operations. It is primarily used to update policies and secrets.

Standby

An inactive replica of the Leader. It can be promoted to Leader if the original Leader fails.

Conjur Follower

A read-only replica of the Leader. Followers allow secret reads at scale.

Followers are horizontally-scaling components that are typically configured behind a load balancer to handle all types of read requests, including authentication, permission checks, and secret fetches.

Conjur cluster

A group consisting of the Leader and its Standbys.

Auto-failover cluster

A subgroup of the Conjur cluster. Nodes share their states with one another and can automatically promote a Standby to become the Leader if the original Leader fails.

Manual failover cluster

A Conjur cluster where a Standby can only be promoted to Leader manually.

Disaster Recovery Standby (DR Standby)

A Standby located at a site outside the auto-failover cluster. It can only be promoted to Leader manually.

Synchronous Standby (Sync Standby)

When an operation is written to the Leader, the transaction is not completed until it is also performed on the synchronous Standby. This way, the synchronous Standby is always up to date with the Leader. There can be only one synchronous Standby.

Asynchronous Standby (Async Standby)

Asynchronous Standbys replicate from the Leader in an eventual-consistency mode, meaning that, depending on load, availability, and the size of changes, there might be a delay until the data finishes replicating.

Region

A physical location that hosts one or more data centers.

Availability zone (AZ)

A region can be divided into one or more availability zones. An AZ is one or more data centers with redundant power, networking, and connectivity.
Compared with a single data center, AZs enable you to operate production applications and databases with greater availability, fault tolerance, and scalability. All AZs in a region are expected to be interconnected with high-bandwidth, low-latency networking.

Hardware security module (HSM)

A computing device that safeguards and manages digital keys and performs encryption, decryption, digital signing, strong authentication, and other cryptographic functions.

Key Management Service (KMS)

An AWS service that supports creation and management of cryptographic keys and the control of their usage.

Security Information and Event Management (SIEM)

A system that gives enterprise security professionals both insight into and a track record of the activities within their IT environment. A SIEM can collect and aggregate log data generated throughout the organization’s technology infrastructure, from host systems and applications to network and security devices.

Architecture overview

A high availability Conjur Enterprise deployment is configured in a Leader-Standby-Follower architecture. This deployment contains the following components:

  • One active Leader
  • At least two Standbys
  • One or more Followers (we recommend at least two)

Replication

The Standbys continuously replicate the Conjur database from the active Leader using PostgreSQL streaming replication.

  • Synchronous replication ensures that there is always an up-to-date Standby database.
  • Asynchronous replication may lag behind the Leader; we recommend that you set one Standby as synchronous.
  • Followers are read-only replicas of the Leader, used for application authentication, authorization, and secrets retrieval. They are deployed close to target applications for low latency.
  • Leader-to-Follower replication is asynchronous. The Followers connect to the Leader through a load balancer. This avoids having to reconfigure the Followers whenever a Standby becomes the Leader.
 

Write operations cannot be made against a Follower; attempting a write operation against a Follower results in an error.
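The following Python sketch illustrates this read/write split through the documented Conjur REST API, using the requests library. The host names, account, login, variable ID, API key, and CA bundle path are placeholders for this example; adapt them to your environment.

```python
"""Illustrative read/write split: writes go to the Leader (through the cluster
load balancer), reads go to the Follower load balancer. Placeholders throughout."""
import base64
import requests

LEADER = "https://conjur-leader.example.com"      # cluster load balancer (writes)
FOLLOWER = "https://conjur-follower.example.com"  # Follower load balancer (reads)
ACCOUNT = "myorg"
LOGIN = "host%2Fmyapp"                 # URL-encoded identity, e.g. host/myapp
API_KEY = "REPLACE_ME"
VARIABLE = "prod%2Fdb%2Fpassword"      # URL-encoded variable ID
CA_BUNDLE = "conjur-ca.pem"            # CA certificate used to verify TLS

def auth_header(base_url: str) -> dict:
    """Authenticate and build the Authorization header Conjur expects."""
    resp = requests.post(
        f"{base_url}/authn/{ACCOUNT}/{LOGIN}/authenticate",
        data=API_KEY, verify=CA_BUNDLE,
    )
    resp.raise_for_status()
    token = base64.b64encode(resp.content).decode()
    return {"Authorization": f'Token token="{token}"'}

# Write (set a secret value): must target the Leader.
requests.post(
    f"{LEADER}/secrets/{ACCOUNT}/variable/{VARIABLE}",
    headers=auth_header(LEADER), data="s3cr3t", verify=CA_BUNDLE,
).raise_for_status()

# Read: served by any Follower behind the load balancer.
print(requests.get(
    f"{FOLLOWER}/secrets/{ACCOUNT}/variable/{VARIABLE}",
    headers=auth_header(FOLLOWER), verify=CA_BUNDLE,
).text)

# Sending the same POST to a Follower fails, because Followers are read-only.
```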

Conjur cluster

The Conjur cluster consists of the Leader and Standby nodes. You can set up a Conjur cluster to fail over automatically (auto-failover) or manually (manual failover).

The Leader and Standby nodes in an auto-failover cluster share their health state with each other using etcd.

The Leader is defined with a TTL (time to live) value. If the Leader remains unavailable beyond this period, Conjur uses the Raft consensus algorithm to select a Standby to become the new Leader. To avoid data loss, preference is given to the Standby whose database is most up to date.

 

The auto-failover cluster should always contain an odd number of nodes: one Leader and an even number of Standbys. For example, an auto-failover cluster can contain one Leader and two Standbys; any number of DR Standbys can be deployed outside the auto-failover cluster. An odd node count ensures that a clear majority of nodes remains available to elect a new Leader when a single node fails.
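To illustrate why adding an even node does not add resilience, the following short calculation (illustrative only) compares quorum size and fault tolerance for a few cluster sizes:

```python
# Quorum size and fault tolerance for example cluster sizes
# (Leader + Standbys; DR Standbys sit outside the auto-failover cluster).
for nodes in (3, 4, 5):
    quorum = nodes // 2 + 1      # majority needed to agree on a new Leader
    tolerated = nodes - quorum   # node failures the cluster can absorb
    print(f"{nodes} nodes -> quorum {quorum}, tolerates {tolerated} failure(s)")

# 3 nodes -> quorum 2, tolerates 1 failure(s)
# 4 nodes -> quorum 3, tolerates 1 failure(s)  (the fourth node adds nothing)
# 5 nodes -> quorum 3, tolerates 2 failure(s)
```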

The following flow diagram describes how an auto-failover cluster is set up.

For more information, see Configure auto-failover.

Disaster recovery

In your DR site, use DR Standbys. These instances are not part of the auto-failover cluster and must be promoted to Leader manually.

For more information, see Site disaster recovery walkthrough.

In a manual failover cluster, you perform the failover by promoting a Standby and rebasing all other Standbys in the cluster to the new Leader.

DR Standbys and Followers do not need to be rebased manually; they connect through the Conjur cluster load balancer, which routes them to the healthy Leader automatically.

The following flow diagram describes how a manual failover cluster is set up.

Disaster recovery

In a manual failover cluster, defining DR Standbys is optional, as regular Standbys already act as DR Standbys and are promoted manually.

Conjur Follower

Followers should run in close proximity to the applications that they serve. You can run multiple Followers in the same environment; for better scalability and availability, we recommend placing a load balancer in front of them to distribute traffic evenly.

Followers have the following characteristics:

  • Followers replicate from the Leader (on port 5432) and contain the same policies and secrets.
  • Followers write audit data and forward it to the Leader (on port 1999).
  • Followers are used to achieve high availability. Even if the Leader is temporarily unhealthy, Followers can continue to serve the clients and keep business going.
  • Followers communicate with the Leader through the cluster load balancer; they are always routed to the healthy Leader and automatically rebase if a Standby is promoted to Leader.

Best practices and recommendations

To optimize Conjur availability, we recommend the following:

  • Use multiple regions and multiple AZs. This ensures that a failing region or AZ does not affect Conjur availability.

  • The Leader and synchronous Standby should run in the same region but in different AZs. This ensures that at any given point, two Conjur nodes in different AZs are completely synchronized. They run in the same region to benefit from low network latency, because a transaction on the Leader is not complete until it is also committed on the synchronous Standby.

  • Asynchronous Standbys can be in another AZ. If another AZ is not available, we recommend running them in the same AZ as the synchronous Standby so that if the Leader AZ fails, there is a quorum to promote one of the other Standbys to Leader.

  • We recommend running asynchronous DR Standbys in a different region from the Conjur cluster. If a disaster causes the main region to become unavailable, a DR Standby can be manually promoted to Leader at the DR site.

  • We recommend running the Followers as close as possible to the applications they serve. This helps ensure maximum availability and minimal latency for the requests.

Deployment options

The Conjur cluster (Leader and Standbys) can be deployed as follows:

Deployment type

Description

As a container

Node runs as a single container.
Supported container runtimes: Docker and Podman.

Followers can be deployed as follows:

Deployment type

Description

As a container

Node runs as a single container.

Supported container runtimes: Docker and Podman.

Kubernetes

Node runs as a Pod inside OpenShift/Kubernetes.

Accessibility

Leader and Standby

Port

Accessible from

Description

22

Local machine for setup / management

Required for SSH access

 

This specific port is not required by Conjur. You can choose an alternative port.

443

Load balancer

TLS endpoint for Conjur UI and API

444

Load balancer

HTTP health endpoint: simplifies load balancer setup

1999

Load balancer (audit stream from Followers)

Audit events are streamed from the Follower to the Leader (using syslog-ng)

5432

Load balancer, other Standby nodes

Required for data replication from the Leader to Standbys and Followers (PostgreSQL)

Follower

Port

Accessible from

Description

22

Local machine for setup / management

Optional for SSH access

 

This specific port is not required by Conjur. You can choose an alternative port.

443

Load balancer

TLS endpoint for Conjur UI and API

444

Load balancer

HTTP health endpoint: simplifies load balancer setup
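As a quick way to verify that the ports listed above are reachable before configuring the load balancers, you can run a simple connectivity check such as the following Python sketch. The host names and the port-to-node mapping are example values only.

```python
# Illustrative connectivity check for the ports listed above.
import socket

CHECKS = {
    "conjur-leader.example.com": [443, 444, 1999, 5432],  # Leader / Standby
    "conjur-follower.example.com": [443, 444],            # Follower
}

for host, ports in CHECKS.items():
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=3):
                print(f"{host}:{port} reachable")
        except OSError as err:
            print(f"{host}:{port} NOT reachable ({err})")
```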

Communication between components

This section describes how the Conjur components communicate with each other.

Load balancer considerations

Conjur cluster load balancer

The Conjur cluster load balancer provides a well-known network endpoint that forwards requests to the Leader in a Conjur cluster. The load balancer constantly checks the health of the instances via the /health endpoint and, based on the result, routes traffic to the healthy Leader. The health check can be done via HTTPS on port 443 or HTTP on port 444.
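The following sketch models the kind of probe a load balancer performs against each node: an HTTP GET to /health, treating an HTTP 200 response as healthy. The host names are placeholders, and a real load balancer implements this check in its own configuration rather than in code.

```python
import requests

NODES = ["conjur-node-1.example.com", "conjur-node-2.example.com"]

for node in NODES:
    try:
        # Port 444 exposes the health endpoint over plain HTTP, which keeps
        # the health check independent of the TLS configuration on port 443.
        resp = requests.get(f"http://{node}:444/health", timeout=3)
        healthy = resp.status_code == 200
    except requests.RequestException:
        healthy = False
    print(f"{node}: {'route traffic' if healthy else 'out of rotation'}")
```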

 

The Conjur cluster load balancer must support the following capabilities:

  • Source IP address preservation for restriction and audit - The load balancer needs to preserve the source IP address of the incoming request.
  • Mutual TLS communication - Follower-Leader communication and Kubernetes authentication rely on Mutual TLS. Therefore, the load balancer must not terminate TLS itself; it must pass the connection through.
  • Health check port and protocol override - The Conjur cluster load balancer routes different protocols on different ports. Health is checked for all routes by querying the /health HTTP endpoint.

Follower load balancer

The Follower load balancer is used to balance API requests across two or more Followers in the same location. Whether the following additional capabilities are required depends on your use cases and on the load balancer you use:

  • Keeping the source IP address for IP address restriction and auditing - The load balancer must preserve the source IP address of the incoming request or be able to add an X-Forwarded-For header with the original source IP address of the request.

  • Mutual TLS communication - Follower-Leader communication and Kubernetes authentication rely on Mutual TLS. Therefore, the load balancer must not terminate TLS itself; it must pass the connection through.

 

The Follower load balancer needs to support the capabilities listed above only if you require them for the corresponding use cases.

Source IP address preservation

Non-transparent layer 4 and layer 7 proxies can supply the correct client IP address, provided that they are configured to meet the following requirements:

  • The first non-transparent proxy a client connects to is a layer 7 (HTTP) proxy.
  • All non-transparent proxies are included in the Conjur Trusted Proxies configuration.
  • All non-transparent proxies are configured to append the source IP address of the request to the X-Forwarded-For HTTP header before forwarding the request, as illustrated in the sketch after this list.
  • Clients can ONLY connect to the first proxy and are unable to bypass it.
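The following simplified model (not actual proxy configuration) shows the header handling these requirements describe: each non-transparent proxy appends the address it received the request from, so Conjur can walk the chain back through its Trusted Proxies to the original client.

```python
def forward_headers(incoming_headers: dict, peer_ip: str) -> dict:
    """Model of one proxy hop: append (never replace) the peer's IP address."""
    headers = dict(incoming_headers)
    existing = headers.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = f"{existing}, {peer_ip}" if existing else peer_ip
    return headers

# Client 10.0.0.5 -> proxy A (192.168.1.10) -> proxy B -> Conjur
hop1 = forward_headers({}, "10.0.0.5")          # proxy A appends the client IP
hop2 = forward_headers(hop1, "192.168.1.10")    # proxy B appends proxy A's IP
print(hop2["X-Forwarded-For"])                  # 10.0.0.5, 192.168.1.10
```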

Security considerations

By default, the Conjur server keys are kept inside the Conjur node in cleartext. To improve the security of these keys at rest, we recommend encrypting the server keys with a master key. The master key encrypts and decrypts the server keys and is intended to be provided at runtime, either manually or automatically, from a secure location such as an HSM or AWS KMS. When a Conjur node starts, it can automatically retrieve the master key from its protected store, use it to decrypt the server keys, and start the Conjur services in a healthy state.

AWS KMS

The following image depicts the relationship between the Conjur nodes and the AWS instances, where AWS KMS is used to secure the master key.
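As an illustration of this pattern with AWS KMS, the following Python sketch decrypts a locally stored, KMS-encrypted master key at startup using boto3. It shows the general envelope-encryption approach only; it is not Conjur's startup code, and the key file path and encryption context are placeholders.

```python
import boto3

kms = boto3.client("kms")

# Only a KMS-encrypted blob is stored on the node; the cleartext master key
# exists in memory only after KMS decrypts it at startup.
with open("/opt/conjur/master-key.enc", "rb") as f:
    encrypted_master_key = f.read()

response = kms.decrypt(
    CiphertextBlob=encrypted_master_key,
    EncryptionContext={"purpose": "conjur-server-keys"},  # placeholder context
)
master_key = response["Plaintext"]  # used to decrypt the Conjur server keys
```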

HSM

The following image depicts the relationship between the Conjur nodes and the HSM instances, where an HSM is used to secure the master key.

For more information, see Server key encryption methods.

Audit

Conjur keeps audit information on actions that are performed in the system. The audit is written to three destinations:

File name

Description

audit.json

Audits in JSON format; located in /var/log/conjur

audit.log

Audits in text format; located in /var/log/conjur

Audit DB

Provides easily accessible audit data for the Conjur UI

Audit records are collected from the Followers and sent to the Leader, which adds its own audit records to these destinations. The audit records collected on the Leader are local only and are not replicated to other Conjur nodes. Therefore, we recommend that you export them from the Leader to a centralized SIEM.
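The following sketch shows one way to stream the Leader's JSON audit log to a SIEM over syslog, assuming one JSON record per line in audit.json; the SIEM host and port are placeholders. In practice, most deployments use the SIEM's own collector or a syslog forwarder rather than a custom script; the sketch only illustrates the data flow.

```python
import logging
import logging.handlers
import os
import time

siem = logging.getLogger("conjur-audit-export")
siem.setLevel(logging.INFO)
siem.addHandler(logging.handlers.SysLogHandler(address=("siem.example.com", 514)))

with open("/var/log/conjur/audit.json", "r") as audit:
    audit.seek(0, os.SEEK_END)        # stream only new records
    while True:
        line = audit.readline()
        if line:
            siem.info(line.rstrip())  # forward one JSON audit record
        else:
            time.sleep(1)
```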