Skip to main content
Version: Next

Query Performance in Apache Cloudberry

Apache Cloudberry improves query performance through dynamic partition elimination and adaptive memory allocation. These mechanisms help reduce the amount of data scanned, speed up query execution, and enhance overall concurrency.

tip

Apache Cloudberry uses the GPORCA optimizer by default, which extends the native Postgres planner with more advanced optimization capabilities.

Dynamic partition eliminationโ€‹

Apache Cloudberry supports dynamic partition elimination (DPE), a feature that prunes partitions at query execution time based on runtime values. This reduces the data scanned and improves query efficiency.

DPE is supported for the following join types:

  • Hash Inner Join
  • Hash Left Join
  • Hash Right Join (since v2.0.0)

DPE is enabled when the following conditions are met:

  • The partitioned table is on the outer side of the join.

  • The join condition is an equality predicate on the partition key.

  • Statistics are collected on the partitioned tables. For example:

    ANALYZE <root partition>;

The gp_dynamic_partition_pruning parameter controls whether DPE is enabled. It is ON by default but only applies to the Postgres optimizer. You can verify if DPE is in effect by checking the EXPLAIN plan for the presence of a Partition Selector node.

Memory optimizationsโ€‹

Apache Cloudberry dynamically allocates memory based on the characteristics of query operators and proactively releases or reallocates memory during different query phases. This leads to more efficient memory usage and faster query execution.

info

Apache Cloudberry uses GPORCA by default. GPORCA extends the planning and optimization capabilities of the Postgres optimizer.

๐Ÿ“„๏ธ Create Unique Index on AO Table

You can create a unique index on an Append-Optimized (AO) or Append-Optimized Column Store (AOCS) table in Apache Cloudberry. With a unique index, Apache Cloudberry checks the unique constraint when data is inserted into the AO table to ensure the uniqueness of the data. At the same time, the database optimizes specific queries with the optimizer to improve the query performance. However, this also brings some overhead for maintaining a unique index, especially when inserting data.

๐Ÿ“„๏ธ Use RuntimeFilter to Optimize Join Queries

When performing join queries on large tables, the SQL optimizer of Apache Cloudberry often uses the HashJoin operator. HashJoin builds a hash table based on the join key to match the join key, which might cause a memory access bottleneck and disk bottleneck. RuntimeFilter is an optimization technique that generates filters in real-time during HashJoin operations, which can pre-filter data before executing HashJoin to speed up HashJoin operations. In some scenarios, the RuntimeFilter optimization can double execution efficiency.