Phoenix favicon

Apache Phoenix

Features

ROW_SIZE() and RAW_ROW_SIZE()

Built-in SQL functions that return the on-the-wire HBase byte footprint of a row — useful for hot-row diagnosis, capacity planning, and finding outliers directly from SQL.

ROW_SIZE() and RAW_ROW_SIZE() return the HBase byte footprint of the row currently being scanned. Both are zero-argument scalar functions that return UNSIGNED_LONG. Available in Phoenix 5.3.1 (PHOENIX-7705).

ROW_SIZE()RAW_ROW_SIZE()
Argumentsnonenone
Return typeUNSIGNED_LONGUNSIGNED_LONG
Countslatest visible version of each cellall retained cell versions and delete markers

Each cell's contribution is its full HBase footprint — row key, column family, qualifier, timestamp, type byte, tags, and value — so the result is not equal to the sum of user-visible column value lengths.

Usage

ROW_SIZE() is only valid as an argument to an aggregate in the SELECT list, or inside a WHERE clause. A bare projection (SELECT ROW_SIZE() FROM t) is rejected at compile time.

Total table footprint

SELECT SUM(ROW_SIZE()) FROM my_table;

Per-row size

Group by the primary key so each group is a single row:

SELECT SUM(ROW_SIZE()) FROM my_table GROUP BY id;

Distribution

SELECT AVG(ROW_SIZE()), MIN(ROW_SIZE()), MAX(ROW_SIZE()) FROM my_table;

Find rows whose footprint exceeds a threshold

ROW_SIZE() is also valid in WHERE:

SELECT COUNT(1)
FROM my_table
WHERE ROW_SIZE() > 1024 AND status = 'ACTIVE';

Including delete markers and old versions

SELECT organization_id, SUM(RAW_ROW_SIZE())
FROM my_table
GROUP BY organization_id;

RAW_ROW_SIZE() counts every retained cell version and the bytes of any delete-family or delete-column tombstones — useful for measuring the post-compaction-debt footprint of a row.

Limitations and caveats

  • Wrap in an aggregate. SELECT ROW_SIZE() FROM t is rejected. Use SUM(ROW_SIZE()) (with GROUP BY <pk> for per-row values).
  • Forces a full row read. A query using either function reads more bytes per row than the same query without it, because the scan can no longer use empty-column / key-only / encoded-qualifier optimizations. Reach for these functions for diagnostics, not hot-path scans.
  • RAW_ROW_SIZE() reads all versions and tombstones. Both the byte count and the row count it produces will exceed ROW_SIZE() on the same data.
  • Cell footprint, not user-data size. Adding a column generally increases ROW_SIZE() by more than that column's value byte length.
  • Measures whatever physical row the planner scans. If the optimizer chooses a secondary index, ROW_SIZE() will measure the index row, not the data row. Pin the plan with a hint (e.g. /*+ NO_INDEX */) if you need a specific physical answer.

See also

  • Metrics — runtime measurements on the client side (scan bytes, mutation bytes, scan latency, etc.).
  • Statistics Collection — aggregate estimates without scanning every row.
Edit on GitHub

On this page