New Features

5.3.1

Eventually Consistent Global Indexes. Relaxes the strong-consistency contract on global indexes for write-heavy workloads that can tolerate brief index/data divergence in exchange for higher write throughput and reduced write amplification (PHOENIX-7794).
Multi-row UPSERT ... VALUES. Standard SQL multi-row value constructors are now supported in a single UPSERT statement, eliminating per-row round-trips for client-side bulk inserts (PHOENIX-7198).
ROW_SIZE() SQL function. Returns the serialized byte size of a row, which is useful for hot-row diagnosis and capacity planning directly from SQL (PHOENIX-7705).
Graceful Failover for Phoenix HA. Coordinated active-to-standby role transitions in the Failover HA policy so in-flight clients drain cleanly during planned failover instead of being hard-killed (PHOENIX-7493).
Improved Scan Metrics. Phoenix now surfaces HBase's per-scan latency metrics end-to-end and adds top-N slowest parallel scan reporting, giving operators per-query visibility into scan tail-latency without resorting to RegionServer-level metrics (PHOENIX-7704, PHOENIX-7729).
PhoenixSyncTable data-validation tool. New MapReduce-based tool that compares row data between a source and a target cluster for the same Phoenix table; useful for verifying replication, snapshot-based migrations, and DR drills (PHOENIX-7751).
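
To make the multi-row UPSERT ... VALUES item above more concrete, here is a minimal Python sketch that assembles one parameterized statement for several rows instead of issuing one statement per row. The helper name, the METRICS table, and its columns are hypothetical. The sketch assumes the standard SQL `VALUES (...), (...)` form the item describes and that bind parameters are accepted in every row; check both against the grammar reference for your release. Identifiers are not quoted or validated here.

```python
def build_multi_row_upsert(table, columns, rows):
    """Return (sql, params) for a single UPSERT that carries every row.

    Hypothetical helper: it only builds the statement text and the flat
    parameter list; executing it is left to whatever client you use.
    """
    if not rows:
        raise ValueError("at least one row is required")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row must supply one value per column")

    # One "(?, ?, ...)" group per row, joined by commas.
    row_group = "(" + ", ".join("?" * width) + ")"
    sql = "UPSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_group] * len(rows)),
    )
    # Flatten the rows in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    sql, params = build_multi_row_upsert(
        "METRICS",                      # hypothetical table
        ["HOST", "TS", "VAL"],          # hypothetical columns
        [("a", 1, 0.5), ("b", 2, 0.7)],
    )
    print(sql)
    print(params)
```

The point of the design is that the client pays for one statement and one round-trip regardless of how many rows are in the batch; batch size limits, if any, are not covered by this sketch.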

5.3.0

Change Data Capture. Stream row-level changes as ordered, partitioned events with configurable pre/post/change scopes. Includes TTL-delete events and partition tracking that survives region splits, merges, and table drops (PHOENIX-7001).
Document Data: BSON. First-class Binary JSON column type with BSON_VALUE, BSON_CONDITION_EXPRESSION, and BSON_UPDATE_EXPRESSION functions for projecting, filtering, and atomically mutating individual document fields server-side (PHOENIX-7330).
VARBINARY_ENCODED. New variable-length binary type that sorts correctly in any position of a composite primary key, in row value constructors, and in secondary index keys (PHOENIX-7357).
View TTL. Each view over a shared base table can age data out on its own retention schedule, with read-time filtering and physical removal during Phoenix compaction (PHOENIX-6978).
Conditional TTL. Express row expiration as a SQL boolean expression evaluated against the row's own column values (PHOENIX-7170).
Strict vs Relaxed TTL. New IS_STRICT_TTL table property, applicable to any kind of TTL. Relaxed TTL keeps expired rows visible until major compaction (DynamoDB-style) and gives a cheaper write path; strict TTL (the default) hides expired rows on every read (PHOENIX-7667).
Segment Scan. TOTAL_SEGMENTS() SQL function returns N contiguous key ranges for a table, enabling clients to drive parallel full-table scans (DynamoDB-style) (PHOENIX-7684).
UPSERT/DELETE RETURNING *. Single-row UPSERT and DELETE can append RETURNING * to return the post-mutation row as a ResultSet in the same round-trip (PHOENIX-7651).
ON DUPLICATE KEY UPDATE_ONLY. New atomic-upsert variant that runs the update clause if the row exists and is a no-op otherwise; the supplied VALUES are never inserted (PHOENIX-7648).
Phoenix-DynamoDB REST Service. DynamoDB-compatible REST front-end for Phoenix, distributed via the apache/phoenix-adapters repository.
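
The Segment Scan item above describes a client-side pattern: obtain N contiguous key ranges, scan each range on its own worker, and combine the results into a full-table scan. The Python sketch below shows that pattern against an in-memory sorted list, not against Phoenix. The splitting function is a hypothetical stand-in for whatever ranges TOTAL_SEGMENTS() returns, and the `[start, end)` representation with `None` for an open bound is an assumption made for illustration.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a table: rows kept sorted by primary key.
ROWS = [("k{:04d}".format(i), i) for i in range(1000)]
KEYS = [key for key, _ in ROWS]


def split_into_segments(keys, n):
    """Return n contiguous [start, end) key ranges; None means unbounded.

    Hypothetical stand-in for server-provided segments.
    """
    bounds = [keys[len(keys) * i // n] for i in range(1, n)]
    return list(zip([None] + bounds, bounds + [None]))


def scan_segment(segment):
    """Return the rows whose key falls inside one [start, end) range."""
    start, end = segment
    lo = 0 if start is None else bisect_left(KEYS, start)
    hi = len(KEYS) if end is None else bisect_left(KEYS, end)
    return ROWS[lo:hi]


def parallel_full_scan(n_segments):
    """Scan every segment on its own worker and stitch results in key order."""
    segments = split_into_segments(KEYS, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        # pool.map preserves segment order, so the result stays sorted.
        parts = list(pool.map(scan_segment, segments))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    rows = parallel_full_scan(8)
    print(len(rows), rows == ROWS)
```

Because the ranges are contiguous and non-overlapping, each row is read by exactly one worker, which is what lets the workers run without coordinating with each other.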

Earlier releases

Table Sampling. Support the TABLESAMPLE clause by implementing a filter that uses the guideposts established by stats gathering to only return a percentage of the rows. Available in our 4.12 release.
Reduce on-disk storage. Reduces on-disk storage to improve performance by a) packing all values into a single cell per column family and b) providing an indirection between the column name and the column qualifier. Available in our 4.10 release.
Atomic update. Atomic update is now possible in the UPSERT VALUES statement in support of counters and other use cases. Available in our 4.9 release.
DEFAULT declaration. When defining a column it is now possible to provide a DEFAULT declaration for the initial value. Available in our 4.9 release.
Namespace Mapping. Maps Phoenix schema to HBase namespace to improve isolation between different schemas. Available in our 4.8 release.
Hive Integration. Enables Hive to be used with Phoenix in support of joining huge tables to other huge tables. Available in our 4.8 release.
Local Index Improvements. Reworked local index implementation to guarantee colocation of table and index data and use supported HBase APIs for better maintainability. Available in our 4.8 release.
DISTINCT Query Optimization. Pushes seek logic to the server for SELECT DISTINCT and COUNT DISTINCT queries over the leading parts of the primary key, leading to dramatically better performance. Available in our 4.8 release.
Transaction Support. Supports transactions by integrating with Tephra. Available in our 4.7 release.
Time series Optimization. Optimizes queries against time series data as explained in more detail here. Available in our 4.6 release.
Asynchronous Index Population. Enables an index to be created asynchronously using a map reduce job. Available in our 4.5 release.
User Defined Functions. Allows users to create and deploy their own custom or domain-specific user-defined functions to the cluster. Available in our 4.4 release.
Functional Indexes. Enables an index to be defined as expressions as opposed to just column names and have the index be used when a query contains this expression. Available in our 4.3 release.
Map-reduce Integration. Support general map-reduce integration to Phoenix by implementing custom input and output formats. Available in our 3.3/4.3 release.
Statistics Collection. Collects the statistics for a table to improve query parallelization. Available in our 3.2/4.2 release.
Join Improvements. Improve existing hash join implementation.
- Many-to-many joins. Support joins where both sides are too large to fit into memory. Available in our 3.3/4.3 release.
- Optimize foreign key joins. Optimize foreign key joins by leveraging our skip scan filter. Available in our 3.2/4.2 release.
- Semi/anti joins. Support semi/anti subqueries through the standard [NOT] IN and [NOT] EXISTS keywords. Available in our 3.2/4.2 release.
Subqueries. Support independent subqueries and correlated subqueries in the WHERE clause as well as subqueries in the FROM clause. Available in our 3.2/4.2 release.
Tracing. Allows visibility into the various steps of an UPSERT or SELECT statement along with how long each step took across all the machines in your cluster. Available in our 4.1 release.
Local Indexing. A new, complementary indexing strategy for write-heavy, space-constrained use cases. With local indexes, index and table data co-reside on the same server, so no network overhead occurs during writes. Local indexes can be used even when the query isn’t fully covered (i.e. Phoenix automatically retrieves the columns not in the index through point gets against the data table). Available in our 4.1 release.
Derived Tables. Allows a SELECT clause to be used in the FROM clause to define a derived table (including join queries). Available in our 3.1/4.1 release.
Apache Pig Loader. Support for a Pig loader to leverage the performance of Phoenix when processing data through Pig. Available in our 3.1/4.1 release.
Views. Allows the creation of multiple tables using the same physical HBase table. Available in our 3.0/4.0 release.
Multi-tenancy. Allows independent views to be created by different tenants on a per-connection basis that all share the same physical HBase table. Available in our 3.0/4.0 release.
Sequences. Support for CREATE/DROP SEQUENCE, NEXT VALUE FOR, and CURRENT VALUE FOR has been implemented. Available in our 3.0/4.0 release.
ARRAY Type. Support for the standard JDBC ARRAY type. Available in our 3.0/4.0 release.
Secondary Indexes. Allows users to create indexes over mutable or immutable data.
Paged Queries. Paged queries through row value constructors, a standard SQL construct to efficiently locate the row at or after a composite key value. Enables a query-more capability to efficiently step through your data and optimizes IN list of composite key values to be point gets.
CSV Bulk Loader. Bulk load CSV files into HBase either through map-reduce or a client-side script.
Aggregation Enhancements. COUNT DISTINCT, PERCENTILE, and STDDEV are now supported.
Type Additions. The FLOAT, DOUBLE, TINYINT, and SMALLINT types are now supported.
IN/OR/LIKE Optimizations. When IN (or the equivalent OR) and LIKE appear in a query using the leading row key columns, they are compiled into a skip scan filter to retrieve the query results more efficiently.
Support ASC/DESC declaration of primary key columns. Allow a primary key column to be declared as ascending (the default) or descending such that the row key order can match the desired sort order (thus preventing an extra sort).
Salting Row Key. To prevent hot spotting on writes, the row key may be "salted" by inserting a leading byte into the row key which is a mod over N buckets of the hash of the entire row key. This ensures even distribution of writes when the row key is a monotonically increasing value (often a timestamp representing the current time).
TopN Queries. Support a query that returns the top N rows, through support for ORDER BY when used in conjunction with TopN.
Dynamic Columns. For some use cases, it's difficult to model a schema up front. You may have columns that you'd like to specify only at query time. This is possible in HBase, in that every row (and column family) contains a map of values with keys that can be specified at run time. Dynamic columns expose that capability through Phoenix.
Apache Bigtop Inclusion. See BIGTOP-993 for more information.
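
The Salting Row Key item above can be sketched in a few lines of Python: hash the whole row key, take the result modulo N buckets, and prepend that value as a single leading byte. This is only an illustration of the idea. The function names are hypothetical, and CRC32 is used here as a convenient deterministic hash; it is not claimed to be the hash Phoenix uses internally.

```python
import zlib
from collections import Counter


def salt_row_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend one salt byte: hash(entire row key) mod buckets.

    CRC32 is an assumption for illustration, not Phoenix's actual hash.
    """
    if not 1 <= buckets <= 256:
        # A single leading byte can only address 256 distinct buckets.
        raise ValueError("buckets must be between 1 and 256")
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


def unsalt_row_key(salted_key: bytes) -> bytes:
    """Drop the leading salt byte to recover the original row key."""
    return salted_key[1:]


def bucket_histogram(row_keys, buckets):
    """Count how many of the given keys land in each salt bucket."""
    return Counter(salt_row_key(key, buckets)[0] for key in row_keys)


if __name__ == "__main__":
    # Monotonically increasing keys, such as timestamps, are the
    # motivating case: unsalted, they would all hit the same key range.
    keys = [str(1700000000 + i).encode() for i in range(1000)]
    print(sorted(bucket_histogram(keys, 8).items()))
```

Because the salt is derived from the key itself, a reader that knows the full row key can recompute the salt and still do a point lookup; a range scan over unsalted key order has to consult every bucket.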

Getting Started

From download to production in a few simple steps.

1. Download

Grab the latest stable release and verify checksums.

2. Read the Guide

Walk through cluster setup, schema design, and operations.

3. Connect a Client

Configure the JDBC client classpath and connection URL.