Metrics

Phoenix surfaces various metrics that provide an insight into what is going on within the Phoenix client as it is executing various SQL statements. These metrics are collected within the client JVM in two ways:

Request level metrics - collected at an individual SQL statement level
Global metrics - collected at the client JVM level

Request level metrics are helpful for figuring out at a more granular level about the amount of work done by every SQL statement executed by Phoenix. These metrics can be classified into three categories:

Request-level metrics

Mutation metrics

MUTATION_BATCH_SIZE - Batch sizes of mutations
MUTATION_BYTES - Size of mutations in bytes
MUTATION_COMMIT_TIME - Time it took to commit mutations

Scan task metrics

NUM_PARALLEL_SCANS - Number of scans executed in parallel
SCAN_BYTES - Number of bytes read by scans
MEMORY_CHUNK_BYTES - Number of bytes allocated by the memory manager
MEMORY_WAIT_TIME - Time in milliseconds threads needed to wait for memory to be allocated through memory manager
SPOOL_FILE_SIZE - Size of spool files created in bytes
SPOOL_FILE_COUNTER - Number of spool files created
CACHE_REFRESH_SPLITS_COUNTER - Number of times Phoenix's metadata cache was refreshed because of splits
TASK_QUEUE_WAIT_TIME - Time in milliseconds tasks had to wait in the queue of the thread pool executor
TASK_END_TO_END_TIME - Time in milliseconds spent by tasks from creation to completion
TASK_EXECUTION_TIME - Time in milliseconds tasks took to execute
TASK_EXECUTED_COUNTER - Counter for number of tasks submitted to the thread pool executor
TASK_REJECTED_COUNTER - Counter for number of tasks that were rejected by the thread pool executor

Overall query metrics

QUERY_TIMEOUT_COUNTER - Number of times query timed out
QUERY_FAILED_COUNTER - Number of times query failed
WALL_CLOCK_TIME_MS - Wall clock time elapsed for the overall query execution
RESULT_SET_TIME_MS - Wall clock time elapsed for reading all records using resultSet.next()

How to use SQL statement-level metrics

Log and report query execution details which could be later used for analysis.
Report top SQL queries by duration. Metric to use: WALL_CLOCK_TIME_MS.
Check if the query is failing because it is timing out. Metric to use: QUERY_TIMEOUT_COUNTER > 0.
Monitor the amount of bytes being written to or read from HBase for a SQL statement. Metrics to use: MUTATION_BYTES and SCAN_BYTES.
Check if the query is doing too much work or needs tuning. Possible metrics to use: TASK_EXECUTED_COUNTER, TASK_QUEUE_WAIT_TIME, WALL_CLOCK_TIME_MS.
Check if a successful query is facing thread starvation, i.e., number of threads in the thread pool likely needs to be increased. This is characterized by a relatively large difference between TASK_EXECUTION_TIME and TASK_END_TO_END_TIME.

Request level metrics can be turned on/off for every Phoenix JDBC connection. Below is an example of how you can do that:

Properties props = new Properties();
props.setProperty(QueryServices.COLLECT_REQUEST_LEVEL_METRICS, "true");
try (Connection conn = DriverManager.getConnection(getUrl(), props)) {
    // ...
}

A typical pattern for how one could get hold of read metrics for queries:

Map<String, Map<String, Long>> overAllQueryMetrics = null;
Map<String, Map<String, Long>> requestReadMetrics = null;
try (ResultSet rs = stmt.executeQuery()) {
    while (rs.next()) {
        // ...
    }
    overAllQueryMetrics = PhoenixRuntime.getOverAllReadRequestMetrics(rs);
    requestReadMetrics = PhoenixRuntime.getRequestReadMetrics(rs);
    // log or report metrics as needed
    PhoenixRuntime.resetMetrics(rs);
}

One could also get hold of write related metrics (collected per table) for DML statements by doing something like this:

Map<String, Map<String, Long>> mutationWriteMetrics = null;
Map<String, Map<String, Long>> mutationReadMetrics = null;
try (Connection conn = DriverManager.getConnection(url)) {
    conn.createStatement().executeUpdate(dml1);
    // ...
    conn.createStatement().executeUpdate(dml2);
    // ...
    conn.createStatement().executeUpdate(dml3);
    // ...
    conn.commit();
    mutationWriteMetrics = PhoenixRuntime.getWriteMetricsForMutationsSinceLastReset(conn);
    mutationReadMetrics = PhoenixRuntime.getReadMetricsForMutationsSinceLastReset(conn);
    PhoenixRuntime.resetMetrics(conn);
}

Global metrics on the other hand are collected at the Phoenix client’s JVM level. These metrics could be used for building out a trend and seeing what is going on within Phoenix from client’s perspective over time. Other than the metrics reported above for request level metrics, the global metrics also includes the following counters:

MUTATION_SQL_COUNTER - Counter for number of mutation SQL statements
SELECT_SQL_COUNTER - Counter for number of SQL queries
OPEN_PHOENIX_CONNECTIONS_COUNTER - Number of open Phoenix connections

Global metrics could be helpful in monitoring and tuning various aspects of the execution environment. For example: an increase in TASK_REJECTED_COUNTER is probably a symptom of too much work being submitted, or that the Phoenix thread pool queue depth or number of threads (or both) needs to be increased. Similarly, a spike in TASK_EXECUTION_TIME for a time frame could be symptomatic of several things including overloaded region servers, a network glitch, or client/region servers undergoing garbage collection.

Collection of global client metrics can be turned on/off (on by default) by setting the attribute phoenix.query.global.metrics.enabled to true/false in the client side hbase-site.xml. Below is a code snippet showing how to log/report global metrics by using a scheduled job that runs periodically:

ScheduledExecutorService service = Executors.newScheduledThreadPool(1);
service.submit(new Runnable() {
    @Override
    public void run() {
        Collection<GlobalMetric> metrics = PhoenixRuntime.getGlobalPhoenixClientMetrics();
        for (GlobalMetric m : metrics) {
            // log or report for trending purposes
        }
    }
});

Scan latency metrics

Phoenix 5.3.1 surfaces HBase's per-scan latency and IO metrics through the request-level metrics API, giving per-query visibility into where a scan actually spent its time (PHOENIX-7704).

Metric	Description
`FS_READ_TIME`	Time (ms) spent in the underlying filesystem (HDFS/S3) for HFile reads.
`BYTES_READ_FROM_FS`	Bytes read from the filesystem — cold reads that missed HBase block cache.
`BYTES_READ_FROM_MEMSTORE`	Bytes read from memstore — un-flushed cells.
`BYTES_READ_FROM_BLOCKCACHE`	Bytes served from the HBase block cache — warm reads.
`BLOCK_READ_OPS_COUNT`	Number of HFile blocks read from filesystem (HDFS/S3).
`RPC_SCAN_PROCESSING_TIME`	Time (ms) the RegionServer spent processing the scan RPC in handler thread.
`RPC_SCAN_QUEUE_WAIT_TIME`	Time (ms) the scan RPC spent in the RegionServer's call queue.

These are request-level metrics — they appear in the per-table map returned by PhoenixRuntime.getRequestReadMetricInfo(rs) alongside the existing entries:

try (ResultSet rs = stmt.executeQuery(sql)) {
    while (rs.next()) {
        // ...
    }
    Map<String, Map<MetricType, Long>> readMetrics =
            PhoenixRuntime.getRequestReadMetricInfo(rs);
    long fsReadMs = readMetrics.get(tableName).get(MetricType.FS_READ_TIME);
    long blocksRead = readMetrics.get(tableName).get(MetricType.BLOCK_READ_OPS_COUNT);
    // ...
}

They piggyback on phoenix.trace.read.metrics.enabled (the existing request-level metrics switch, on by default) and require no additional configuration. Values originate from HBase 2.6.4+; on older HBase versions they are silently zero.

Top-N slowest parallel scans

For diagnosing tail-latency on a single statement issuing multiple HBase scans, Phoenix can retain the top-N slowest scan paths it issued while executing a query — useful when a query fans out across many regions and you want to find the worst few (PHOENIX-7729).

Enabling

Off by default. Set the count to a positive number on the connection (Properties) or globally in hbase-site.xml:

Property	Default	Description
`phoenix.slowest.scan.metrics.count`	`0`	N — number of slowest scans to retain. `0` disables the feature.
`phoenix.scan.metrics.by.region.enabled`	`false`	When `true` (and `count > 0`), the per-scan record also carries the region name and serving RegionServer. Requires HBase 2.6.3+.

Request-level metrics (phoenix.trace.read.metrics.enabled) must also be on, which is the default.

Retrieving the result

PhoenixRuntime.getTopNSlowestScanMetrics(rs) returns the slowest scan paths seen while executing the query, as a list of JSON-object lists — one outer entry per slow scan path, one inner entry per HBase table touched on that path. Forward to your telemetry sink as needed.

try (ResultSet rs = stmt.executeQuery(sql)) {
    while (rs.next()) {
        // ...
    }
    List<List<JsonObject>> slowest = PhoenixRuntime.getTopNSlowestScanMetrics(rs);
    for (List<JsonObject> path : slowest) {
        for (JsonObject scan : path) {
            log.info("slow scan on {}: {}", scan.get("table"), scan);
        }
    }
}

Metrics

On this page