High Availability

Active/active and active/standby Phoenix client wiring across two HBase clusters, with graceful failover that lets writes drain server-side before the peer is promoted.

Phoenix High Availability (HA) lets a JDBC client transparently target a pair of HBase clusters that mirror the same Phoenix schema, so an operator-driven or fault-driven failover never requires the application to restart, reconnect, or rewrite URLs.

Phoenix 5.3.1 adds graceful failover — an intermediate ACTIVE_TO_STANDBY role that lets writes drain server-side before the peer is promoted — plus support for HBase's MASTER and RPC connection registries (PHOENIX-7493, PHOENIX-7495, PHOENIX-7586).

Concepts

An HA group is a named tuple of two HBase clusters and an HA policy, shared by every client that participates. The current role of each cluster lives in a JSON record in ZooKeeper, replicated to both clusters' ZK ensembles and watched by the client; role changes are picked up automatically.

Cluster roles

Role | Clients can connect? | Meaning
ACTIVE | yes | Cluster is serving live reads and writes.
STANDBY | yes | Cluster is reachable but not the current primary. FAILOVER clients refuse to bind; PARALLEL clients still bind.
ACTIVE_TO_STANDBY | yes | Transitional state during graceful failover. FAILOVER connections are closed; PARALLEL clients still bind. Writes may be rejected (see Graceful failover below).
OFFLINE | no | Cluster is intentionally taken out of rotation.
UNKNOWN | no | Role has not been initialized or the record could not be read.
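The role table can be read as a small lookup. The sketch below encodes it as a Java enum; ClusterRoleSketch and its methods are hypothetical names made up for illustration and are not Phoenix's own types.

```java
import java.util.Locale;

// Hypothetical model of the role table above; not a Phoenix class.
enum ClusterRoleSketch {
    ACTIVE(true),
    STANDBY(true),
    ACTIVE_TO_STANDBY(true),
    OFFLINE(false),
    UNKNOWN(false);

    private final boolean connectable;

    ClusterRoleSketch(boolean connectable) {
        this.connectable = connectable;
    }

    /** Mirrors the "Clients can connect?" column. */
    boolean canConnect() {
        return connectable;
    }

    /** Per the table, FAILOVER clients bind only to the ACTIVE cluster. */
    boolean failoverClientMayBind() {
        return this == ACTIVE;
    }

    /** Per the table, PARALLEL clients still bind to STANDBY and ACTIVE_TO_STANDBY. */
    boolean parallelClientMayBind() {
        return connectable;
    }

    /** Assumption: a missing or unrecognized role string is treated as UNKNOWN. */
    static ClusterRoleSketch parse(String raw) {
        if (raw == null) {
            return UNKNOWN;
        }
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}
```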

HA policies

The policy is part of the HA-group record. Clients do not pick it — operators do, when provisioning the group.

  • FAILOVER — exactly one cluster (the ACTIVE) serves the connection at any moment. The client transparently re-binds on role change.
  • PARALLEL — every statement is issued to both clusters in parallel, with the faster result returned. Useful when both clusters carry identical data and you want to mask single-cluster tail latency.
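The PARALLEL idea of issuing the same work to both clusters and returning whichever answers first can be illustrated with plain java.util.concurrent. This is a minimal sketch of the general technique, not Phoenix's implementation; ParallelRaceSketch and firstSuccessful are hypothetical names, and the two suppliers stand in for the same statement run against each cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical illustration of "race two clusters, take the first result".
final class ParallelRaceSketch {
    private ParallelRaceSketch() {}

    /** Runs both suppliers concurrently and returns the first result that completes without throwing. */
    static <T> T firstSuccessful(Supplier<T> cluster1, Supplier<T> cluster2) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            tasks.add(cluster1::get);
            tasks.add(cluster2::get);
            // invokeAny returns the result of one task that completed successfully.
            return pool.invokeAny(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a cluster", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("both clusters failed", e);
        } finally {
            // Stop the slower task; its result is no longer needed.
            pool.shutdownNow();
        }
    }
}
```

A slow or failing cluster is masked as long as the other one answers, which is the tail-latency benefit described above.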

Failover sub-policies (FAILOVER only)

Controls how a FAILOVER connection reacts when its bound cluster transitions away from ACTIVE:

phoenix.ha.failover.policy | Behavior
explicit (default) | Subsequent operations throw FailoverSQLException; the application calls failover() to rebind to the new ACTIVE.
active | Connection transparently rebinds to the new ACTIVE on the next statement, up to phoenix.ha.failover.count attempts (default 3).
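The difference between the two sub-policies comes down to who performs the rebind. The sketch below models that with a bounded retry loop. Everything in it is a stand-in: NotActiveException plays the part of FailoverSQLException, the rebind callback plays the part of failover(), and runWithAutoRebind is a hypothetical helper rather than a Phoenix API.

```java
import java.util.function.Supplier;

// Hypothetical model of the explicit vs. active sub-policies.
final class FailoverPolicySketch {
    private FailoverPolicySketch() {}

    /** Stand-in for the exception raised once the bound cluster is no longer ACTIVE. */
    static final class NotActiveException extends RuntimeException {
        NotActiveException() {
            super("bound cluster is no longer ACTIVE");
        }
    }

    /**
     * Models the active sub-policy: on failure, rebind and retry the statement,
     * at most maxFailovers times. With maxFailovers = 0 the exception reaches the
     * caller immediately, which is how the explicit sub-policy behaves.
     */
    static <T> T runWithAutoRebind(Supplier<T> statement, Runnable rebind, int maxFailovers) {
        int failovers = 0;
        while (true) {
            try {
                return statement.get();
            } catch (NotActiveException e) {
                if (failovers >= maxFailovers) {
                    throw e; // budget exhausted: surface the error to the application
                }
                failovers++;
                rebind.run();
            }
        }
    }
}
```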

JDBC URL

The HA URL is a bracketed pair of per-cluster endpoints separated by |, optionally followed by a principal:

jdbc:phoenix+zk:[zk1\:2181::/hbase|zk2\:2181::/hbase]:my_principal

The presence of | inside the URL is what triggers the HA code path. The two URLs inside the brackets are always ZooKeeper quorums — Phoenix uses them to read the HA-group record.

The per-cluster connection Phoenix opens underneath may use any of the supported HBase registries (ZK, MASTER, or RPC), based on what the operator configured in the HA-group record. RPC requires HBase 2.5+.
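Building the HA URL by hand is easy to get wrong because of the escaped colons. The helper below assembles one from two host:port strings; it is a sketch that only mirrors the shape of the example URL above (escaped host:port, an empty port field, then the znode), and HaUrlSketch and its methods are hypothetical names.

```java
// Hypothetical helper that follows the shape of the example URL on this page.
final class HaUrlSketch {
    private HaUrlSketch() {}

    /** Escapes ':' in host:port and appends "::" plus the znode, as in "zk1\:2181::/hbase". */
    static String endpoint(String hostPort, String znode) {
        return hostPort.replace(":", "\\:") + "::" + znode;
    }

    /** Joins two endpoints with '|' inside brackets; the principal is optional. */
    static String haUrl(String endpoint1, String endpoint2, String principal) {
        String url = "jdbc:phoenix+zk:[" + endpoint1 + "|" + endpoint2 + "]";
        if (principal == null || principal.isEmpty()) {
            return url;
        }
        return url + ":" + principal;
    }

    /** The page states that the presence of '|' is what triggers the HA code path. */
    static boolean looksLikeHaUrl(String url) {
        return url.contains("|");
    }
}
```

For example, haUrl(endpoint("zk1:2181", "/hbase"), endpoint("zk2:2181", "/hbase"), "my_principal") reproduces the URL shown above.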

Connecting

Set the HA group name as a JDBC property and open a connection like any other:

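import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties props = new Properties();
// Must match the HA group name the operator provisioned.
props.setProperty("phoenix.ha.group.name", "myGroup");
try (Connection conn = DriverManager.getConnection(
        "jdbc:phoenix+zk:[zk1\\:2181::/hbase|zk2\\:2181::/hbase]", props)) {
    // Normal JDBC usage.
}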

The returned Connection honors the policy declared in the HA-group record.

Graceful failover

Graceful failover is a two-step demotion of the ACTIVE cluster:

  1. ACTIVE → ACTIVE_TO_STANDBY. The operator flips the source cluster into ACTIVE_TO_STANDBY. FAILOVER clients' wrapped connections to the demoting cluster are closed (subsequent statements raise FailoverSQLException, or rebind on the next statement under the active sub-policy). PARALLEL clients continue to operate against both clusters. On the server side, with phoenix.cluster.role.based.mutation.block.enabled=true, new mutations on the demoting cluster are rejected with MutationBlockedIOException so replication to the peer can drain. No cluster is ACTIVE during this step, so new FAILOVER connections cannot be opened until step 2.
  2. ACTIVE_TO_STANDBY → STANDBY (peer promoted to ACTIVE). Once replication has caught up, the operator demotes the source the rest of the way and promotes the peer. New FAILOVER connections (and active sub-policy retries pending from step 1) now bind to the new ACTIVE; explicit clients call failover() themselves.

Rolling back is supported: an ACTIVE → ACTIVE_TO_STANDBY → ACTIVE sequence restores the source to ACTIVE without further role transitions. PARALLEL clients remain operational throughout; FAILOVER clients reopen connections to the restored ACTIVE once it returns.
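The graceful path, including rollback, forms a small state machine. The sketch below lists the transitions described in this section and checks whether any cluster is ACTIVE, which is what new FAILOVER connections need. It models only the graceful path, and GracefulFailoverSketch is a hypothetical name, not a Phoenix class.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the role transitions described in this section.
final class GracefulFailoverSketch {
    private GracefulFailoverSketch() {}

    enum Role { ACTIVE, ACTIVE_TO_STANDBY, STANDBY }

    private static final Map<Role, Set<Role>> ALLOWED = new EnumMap<>(Role.class);

    static {
        // Step 1: the source starts draining.
        ALLOWED.put(Role.ACTIVE, EnumSet.of(Role.ACTIVE_TO_STANDBY));
        // Step 2 (finish the demotion) or rollback (return to ACTIVE).
        ALLOWED.put(Role.ACTIVE_TO_STANDBY, EnumSet.of(Role.STANDBY, Role.ACTIVE));
        // Promotion of the peer.
        ALLOWED.put(Role.STANDBY, EnumSet.of(Role.ACTIVE));
    }

    /** True if the graceful path described above permits moving from one role to the other. */
    static boolean isAllowed(Role from, Role to) {
        return ALLOWED.get(from).contains(to);
    }

    /** New FAILOVER connections need an ACTIVE cluster; none exists during step 1. */
    static boolean acceptsNewFailoverConnections(Role source, Role peer) {
        return source == Role.ACTIVE || peer == Role.ACTIVE;
    }
}
```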

Configuration

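The HA-related keys are listed below. Client-side keys can be set in hbase-site.xml on the client, as JDBC connection properties, or both; the server-side write-blocking key is set in the cluster's hbase-site.xml.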

Required

Key | Notes
phoenix.ha.group.name | Name of the HA group; must match the operator-provisioned record.

ZooKeeper tuning (client side)

Key | Default
phoenix.ha.zk.connection.timeout.ms | 4000
phoenix.ha.zk.session.timeout.ms | 4000
phoenix.ha.zk.retry.base.sleep.ms | 1000
phoenix.ha.zk.retry.max | 5
phoenix.ha.zk.retry.max.sleep.ms | 10000
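To get a feel for how long the client might wait on ZooKeeper with these defaults, the sketch below computes a worst-case bound. It rests on an assumption this page does not confirm: that the three retry keys drive a capped exponential backoff, similar in shape to Apache Curator's ExponentialBackoffRetry, where the wait before retry n is at most base × (2^(n+1) − 1) and never more than the max sleep. ZkRetryBoundSketch is a hypothetical name.

```java
// Hypothetical worst-case calculator; the backoff formula is an assumption, see the text above.
final class ZkRetryBoundSketch {
    private ZkRetryBoundSketch() {}

    /** Largest possible sleep before retry n (0-based) under the assumed capped exponential backoff. */
    static long maxSleepBeforeRetryMs(long baseSleepMs, long maxSleepMs, int retry) {
        long worst = baseSleepMs * Math.max(1L, (1L << (retry + 1)) - 1);
        return Math.min(worst, maxSleepMs);
    }

    /** Sum of the per-retry upper bounds across all retries. */
    static long worstCaseTotalSleepMs(long baseSleepMs, int maxRetries, long maxSleepMs) {
        long total = 0;
        for (int n = 0; n < maxRetries; n++) {
            total += maxSleepBeforeRetryMs(baseSleepMs, maxSleepMs, n);
        }
        return total;
    }
}
```

Under that assumption, the defaults (base 1000 ms, 5 retries, cap 10000 ms) give per-retry bounds of 1000, 3000, 7000, 10000, and 10000 ms, for at most 31000 ms of sleep in total, not counting the connection attempts themselves.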

Fallback to a single cluster

Key | Default | Notes
phoenix.ha.fallback.enabled | true | If the HA record cannot be read from either ZK, fall back to a single-cluster connection.
phoenix.ha.fallback.cluster | (empty) | JDBC URL of the fallback cluster.

Failover behavior

Key | Default | Notes
phoenix.ha.transition.timeout.ms | 300000 (5 min) | Time the client gets to close connections during a role transition.
phoenix.ha.failover.policy | explicit | Set to active to auto-rebind.
phoenix.ha.failover.count | 3 | Max auto-rebind attempts for the active sub-policy.
phoenix.ha.failover.timeout.ms | 10000 | Wait timeout for a single failover operation.
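Because these keys can be passed as JDBC connection properties, a client that wants the active sub-policy can opt in per connection. The sketch below only assembles the Properties object, using the key names from the tables above; HaClientPropsSketch and activeFailoverProps are hypothetical names.

```java
import java.util.Properties;

// Hypothetical helper; the property keys come from the tables on this page.
final class HaClientPropsSketch {
    private HaClientPropsSketch() {}

    /** Builds connection properties that select the active sub-policy for one HA group. */
    static Properties activeFailoverProps(String groupName, int maxFailovers, long failoverTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("phoenix.ha.group.name", groupName);
        props.setProperty("phoenix.ha.failover.policy", "active");
        props.setProperty("phoenix.ha.failover.count", Integer.toString(maxFailovers));
        props.setProperty("phoenix.ha.failover.timeout.ms", Long.toString(failoverTimeoutMs));
        return props;
    }
}
```

The result is passed to DriverManager.getConnection together with the HA URL, in the same way as the props object in the Connecting section.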

Server-side write blocking

Key | Default | Notes
phoenix.cluster.role.based.mutation.block.enabled | false | Set true in the source cluster's hbase-site.xml to reject writes while the cluster is in ACTIVE_TO_STANDBY. Readers are unaffected.

Operator workflow

A canonical graceful failover from cluster A → cluster B:

  1. Confirm phoenix.cluster.role.based.mutation.block.enabled=true is set on cluster A and that replication A → B is healthy.
  2. Set A → ACTIVE_TO_STANDBY, B remains STANDBY. Writes to A start being rejected. FAILOVER connections to A are closed; PARALLEL clients continue to operate against both clusters.
  3. Wait for A → B replication lag to reach zero.
  4. Promote: A → STANDBY, B → ACTIVE. Clients using the active sub-policy transparently rebind; clients using the explicit sub-policy receive FailoverSQLException and call failover() themselves.

For an unplanned failover, skip steps 2 and 3 and go directly to step 4. The ACTIVE_TO_STANDBY role is a graceful-failover convenience, not a correctness requirement.
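Steps 2 to 4 of the planned workflow can be scripted as a demote, drain, promote sequence with a rollback if replication does not catch up. The sketch below shows the control flow only. The RoleSetter interface and the replication-lag supplier are hypothetical hooks, since this page does not name the tool operators use to change roles or measure lag.

```java
import java.util.function.LongSupplier;

// Hypothetical runbook sketch; the hooks stand in for whatever admin tooling is in use.
final class GracefulFailoverRunbookSketch {
    private GracefulFailoverRunbookSketch() {}

    /** Hypothetical hook that writes the roles of cluster A and cluster B. */
    interface RoleSetter {
        void setRoles(String roleA, String roleB);
    }

    /**
     * Demotes A to ACTIVE_TO_STANDBY, polls until replication lag is zero, then promotes B.
     * If the lag has not drained after maxPolls checks, A is rolled back to ACTIVE.
     * Returns true if B was promoted.
     */
    static boolean failoverAtoB(RoleSetter roles, LongSupplier replicationLag,
                                int maxPolls, long pollIntervalMs) {
        roles.setRoles("ACTIVE_TO_STANDBY", "STANDBY"); // step 2
        for (int i = 0; i < maxPolls; i++) {
            if (replicationLag.getAsLong() == 0) {       // step 3
                roles.setRoles("STANDBY", "ACTIVE");     // step 4
                return true;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop waiting and roll back
            }
        }
        roles.setRoles("ACTIVE", "STANDBY"); // rollback: A returns to ACTIVE
        return false;
    }
}
```

Rolling back on a timeout is a design choice in this sketch; an operator may prefer to keep waiting rather than reopen A for writes.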

See also

  • Metrics — client metrics are emitted by both wrapped and HA connections.