Phoenix favicon

Apache Phoenix

Features

Eventually Consistent Global Indexes

Global secondary indexes maintained asynchronously off the data-table write path — for write-heavy workloads that can tolerate a bounded staleness window on the index in exchange for higher write throughput.

An eventually consistent global index behaves like a regular Phoenix global index at the SQL level — same DDL, same INCLUDE, same query planning — but Phoenix maintains it asynchronously instead of on the synchronous write path. Writes commit faster on the data table, the index catches up shortly after, and reads from the index are read-repaired against the data table so they never return incorrect data. Introduced in Phoenix 5.3.1 (PHOENIX-7794).

When to use it

Pick eventually consistent over the default (strong) consistency when both are true:

  • Synchronous index maintenance is your write-path bottleneck — typically a data table fanning out to several large indexes per mutation.
  • A bounded staleness window on the index (seconds) is acceptable for the queries that hit it.

Stay with the default CONSISTENCY=STRONG for indexes that back read-your-write flows — e.g. "insert a row, then immediately query it via the new index" inside a single user request.

This is a property of global indexes only. CONSISTENCY=EVENTUAL on a LOCAL index parses but has no runtime effect.

Creating an EC index

CONSISTENCY=EVENTUAL is set in the trailing properties slot of CREATE INDEX:

CREATE INDEX my_idx
  ON my_table (v1)
  INCLUDE (v2)
  CONSISTENCY=EVENTUAL;

UNCOVERED and ASYNC compose normally:

CREATE UNCOVERED INDEX my_idx ON my_table (city, name) CONSISTENCY=EVENTUAL;
CREATE INDEX my_idx ON my_table (v1) INCLUDE (v2) ASYNC CONSISTENCY=EVENTUAL;

The default is CONSISTENCY=STRONG. Flip an existing index in either direction:

ALTER INDEX my_idx ON my_table CONSISTENCY=EVENTUAL;
ALTER INDEX my_idx ON my_table CONSISTENCY=STRONG;

The consistency mode is a per-index property — not a connection setting and not a query hint. Every reader of the index sees the same mode.

How it works

EC indexes are maintained by a per-region background consumer that reads a Change Data Capture stream on the data table and applies the resulting mutations to the index. The first EC index on a table provisions the stream automatically; subsequent EC indexes on the same table share it.

The consumer supports two strategies for turning a CDC event into an index mutation, with opposite write-vs-read IO tradeoffs:

  • Derive on consume (default). The CDC event carries a lightweight data-row-state marker; the consumer re-reads the data row at consume time to compute the index mutation. Cheap on the write path, one extra data-table read per event on the consume path. Relies on phoenix.max.lookback.age.seconds being long enough for the data table to retain the before image of every modified row until the consumer catches up.
  • Serialize on write. The index mutation is computed at write time and serialized into the CDC event itself; the consumer just replays it. More write IO (and optionally compressed), no extra read on consume. Useful when the consumer's data-table read is the bottleneck or max-lookback is tight.

Toggle with phoenix.index.cdc.mutation.serialize (see Tuning). For most workloads the default — derive on consume — is the right choice.

How reads behave

There is no query-side change — no hint, no new syntax. The planner picks an EC index exactly like a STRONG index. The visibility contract differs in two ways:

  • A row recently inserted on the data table may not yet appear in the index.
  • An existing index row's covered column values may be stale until the next update is applied.

Phoenix never returns incorrect rows: any index row not yet caught up is verified against the data table before being returned, exactly like a STRONG index. The practical visibility window is a few seconds on a healthy cluster, governed by the tunables below.

Tuning

Set on the RegionServer side in hbase-site.xml. The defaults are sensible for most clusters; the two knobs you will typically reach for are batch size (throughput) and timestamp buffer (visibility delay floor).

PropertyDefaultDescription
phoenix.index.cdc.consumer.enabledtrueMaster switch for the async index maintenance subsystem. Disable to halt all EC-index maintenance cluster-wide.
phoenix.index.cdc.consumer.batch.size500Events drained per iteration. Larger amortizes overhead; smaller bounds staleness on bursty workloads.
phoenix.index.cdc.consumer.poll.interval.ms1000Sleep when there is no work to do. Raise to reduce idle wake-ups.
phoenix.index.cdc.consumer.timestamp.buffer.ms5000Safety buffer subtracted from "now" before consuming. Floor for visibility delay.
phoenix.index.cdc.consumer.startup.delay.ms10000Delay before a freshly opened region starts consuming.
phoenix.index.cdc.consumer.max.data.visibility.retries10Retries when the data row is not yet visible to the consumer.
phoenix.index.cdc.consumer.retry.pause.ms2000Sleep between data-visibility retries.
phoenix.index.cdc.mutation.serializefalseSelects the consumer strategy (see How it works). false derives index mutations at consume time (lower write IO); true serializes them at write time (no consume-side read).
phoenix.index.cdc.mutations.compress.enabledfalseSnappy-compress the serialized index mutation. Only relevant when phoenix.index.cdc.mutation.serialize=true.

With defaults, expect end-to-end index visibility of ~5–10 seconds.

Limitations

  • Global indexes only. LOCAL INDEX ... CONSISTENCY=EVENTUAL is a no-op.
  • Designed and tested for non-transactional tables. Combining EC indexes with transactional tables is undefined in 5.3.1.
  • Salted data tables are not supported — the underlying change stream Phoenix needs to maintain an EC index is incompatible with salting.
  • Data-table max lookback age must outlive the async lag. Increase phoenix.max.lookback.age.seconds on the data table if you tune the consumer for low throughput or you expect long catch-up periods.

Operational notes

  • The first EC index on a table provisions the underlying change stream automatically. Subsequent EC indexes on the same table reuse it.
  • Region splits and merges are handled transparently — no operator action is required to keep the index up to date through topology changes.
  • ALTER INDEX ... CONSISTENCY=STRONG returns the index to synchronous maintenance for new writes. Any in-flight async work continues to flow through until the queue drains.

See also

Edit on GitHub

On this page