New Features | Apache Phoenix

This page hasn't been updated recently, and may be missing relevant information for current releases

As items are implemented from our road map, they are moved here to track the progress we've made:

Table Sampling. Support the TABLESAMPLE clause by implementing a filter that uses the guideposts established by stats gathering to only return a percentage of the rows. Available in our 4.12 release
Reduce on-disk storage. Reduce on-disk storage to improve performance by a) packing all values into a single cell per column family and b) providing an indirection between the column name and the column qualifier. Available in our 4.10 release
Atomic update. Atomic update is now possible in the UPSERT VALUES statement in support of counters and other use cases. Available in our 4.9 release
DEFAULT declaration. When defining a column it is now possible to provide a DEFAULT declaration for the initial value. Available in our 4.9 release
Namespace Mapping. Maps Phoenix schema to HBase namespace to improve isolation between different schemas. Available in our 4.8 release
Hive Integration. Enables Hive to be used with Phoenix in support of joining huge tables to other huge tables. Available in our 4.8 release
Local Index Improvements. Reworked local index implementation to guarantee colocation of table and index data and use supported HBase APIs for better maintainability. Available in our 4.8 release
DISTINCT Query Optimization. Push seek logic to the server for SELECT DISTINCT and COUNT DISTINCT queries over the leading parts of the primary key, leading to dramatically better performance. Available in our 4.8 release
Transaction Support. Supports transactions by integrating with Tephra. Available in our 4.7 release
Time series Optimization. Optimizes queries against time series data as explained in more detail here. Available in our 4.6 release
Asynchronous Index Population. Enables an index to be created asynchronously using a map reduce job. Available in our 4.5 release
User Defined Functions. Allows users to create and deploy their own custom or domain-specific user-defined functions to the cluster. Available in our 4.4 release
Functional Indexes. Enables an index to be defined on expressions rather than just column names, and the index to be used when a query contains that expression. Available in our 4.3 release
Map-reduce Integration. Support general map-reduce integration to Phoenix by implementing custom input and output formats. Available in our 3.3/4.3 release
Statistics Collection. Collects the statistics for a table to improve query parallelization. Available in our 3.2/4.2 release
Join Improvements. Improve existing hash join implementation.
- Many-to-many joins. Support joins where both sides are too large to fit into memory. Available in our 3.3/4.3 release
- Optimize foreign key joins. Optimize foreign key joins by leveraging our skip scan filter. Available in our 3.2/4.2 release
- Semi/anti joins. Support semi/anti subqueries through the standard [NOT] IN and [NOT] EXISTS keywords. Available in our 3.2/4.2 release
Subqueries. Support independent subqueries and correlated subqueries in the WHERE clause as well as subqueries in the FROM clause. Available in our 3.2/4.2 release
Tracing. Allows visibility into the various steps of an UPSERT or SELECT statement along with how long each step took across all the machines in your cluster. Available in our 4.1 release
Local Indexing. A new, complementary indexing strategy for write-heavy, space-constrained use cases. With local indexes, index and table data co-reside on the same server, so no network overhead occurs during writes. Local indexes can be used even when the query isn’t fully covered (i.e. Phoenix automatically retrieves the columns not in the index through point gets against the data table). Available in our 4.1 release
Derived Tables. Allows a SELECT clause to be used in the FROM clause to define a derived table (including join queries). Available in our 3.1/4.1 release
Apache Pig Loader. Support for a Pig loader to leverage the performance of Phoenix when processing data through Pig. Available in our 3.1/4.1 release
Views. Allows the creation of multiple tables using the same physical HBase table. Available in our 3.0/4.0 release
Multi-tenancy. Allows independent views to be created by different tenants on a per-connection basis that all share the same physical HBase table. Available in our 3.0/4.0 release
Sequences. Support for CREATE/DROP SEQUENCE, NEXT VALUE FOR, and CURRENT VALUE FOR has been implemented. Available in our 3.0/4.0 release
ARRAY Type. Support for the standard JDBC ARRAY type. Available in our 3.0/4.0 release
Secondary Indexes. Allows users to create indexes over mutable or immutable data.
Paged Queries. Paged queries through row value constructors, a standard SQL construct to efficiently locate the row at or after a composite key value. Enables a query-more capability to efficiently step through your data and optimizes an IN list of composite key values into point gets.
CSV Bulk Loader. Bulk load CSV files into HBase either through map-reduce or a client-side script.
Aggregation Enhancements. COUNT DISTINCT, PERCENTILE, and STDDEV are now supported.
Type Additions. The FLOAT, DOUBLE, TINYINT, and SMALLINT types are now supported.
IN/OR/LIKE Optimizations. When an IN (or the equivalent OR) and a LIKE appear in a query using the leading row key columns, they are compiled into a skip scan filter to retrieve the query results more efficiently.
Support ASC/DESC declaration of primary key columns. Allow a primary key column to be declared as ascending (the default) or descending such that the row key order can match the desired sort order (thus preventing an extra sort).
Salting Row Key. To prevent hot spotting on writes, the row key may be “salted” by inserting a leading byte into the row key which is a mod over N buckets of the hash of the entire row key. This ensures even distribution of writes when the row key is a monotonically increasing value (often a timestamp representing the current time).
TopN Queries. Support a query that returns the top N rows, through support for ORDER BY when used in conjunction with TopN.
Dynamic Columns. For some use cases, it's difficult to model a schema up front. You may have columns that you'd like to specify only at query time. This is possible in HBase, in that every row (and column family) contains a map of values with keys that can be specified at run time. So, we'd like to support that.
Apache Bigtop Inclusion. See BIGTOP-993 for more information.
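
The Salting Row Key entry above can be illustrated with a minimal Python sketch. This is not Phoenix's implementation: the names `salt_row_key` and `bucket_histogram`, the bucket count, and the use of CRC32 as the hash are all assumptions chosen for illustration, and Phoenix applies its own hash internally. The sketch shows only the idea described in the text: prepend one byte equal to the hash of the entire row key modulo N buckets, so that monotonically increasing keys no longer land next to each other.

```python
import struct
import zlib
from collections import Counter


def salt_row_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend one salt byte computed as hash(row_key) mod buckets.

    CRC32 is a stand-in hash (an assumption), not the one Phoenix uses.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must fit in a single byte (1..256)")
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


def bucket_histogram(keys, buckets):
    """Count how many of the given row keys fall into each salt bucket."""
    return Counter(salt_row_key(key, buckets)[0] for key in keys)


if __name__ == "__main__":
    # Hypothetical workload: 1000 consecutive millisecond timestamps,
    # encoded as 8-byte big-endian integers so they sort by time.
    keys = [struct.pack(">Q", 1_700_000_000_000 + i) for i in range(1000)]
    for bucket, count in sorted(bucket_histogram(keys, 8).items()):
        print(f"bucket {bucket}: {count} keys")
```

Because the salt is derived from the key itself, a point lookup can recompute the same byte and go straight to the right bucket. A range scan over the original key order, by contrast, has to visit every bucket and merge the results, which is the usual trade-off of this technique.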