Query Performance in Apache Cloudberry
Apache Cloudberry improves query performance through dynamic partition elimination and adaptive memory allocation. These mechanisms help reduce the amount of data scanned, speed up query execution, and enhance overall concurrency.
Apache Cloudberry uses the GPORCA optimizer by default, which extends the native Postgres planner with more advanced optimization capabilities.
Dynamic partition elimination
Apache Cloudberry supports dynamic partition elimination (DPE), which prunes partitions at query execution time based on values that are only known at runtime, such as the join keys produced by the other side of a join. This reduces the amount of data scanned and improves query efficiency.
DPE is supported for the following join types:
- Hash Inner Join
- Hash Left Join
- Hash Right Join (since v2.0.0)
DPE takes effect when all of the following conditions are met:
- The partitioned table is on the outer side of the join.
- The join condition is an equality predicate on the partition key.
- Statistics have been collected on the partitioned table, for example:
  ANALYZE <root partition>;
The gp_dynamic_partition_pruning parameter controls whether DPE is enabled. It is ON by default, but it applies only to the Postgres optimizer, not to GPORCA. To verify whether DPE is in effect, check the EXPLAIN plan for a Partition Selector node.
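The following SQL is a minimal sketch of one way to check DPE from start to finish. The table names (sales, promo_days), their columns, and the partition bounds are hypothetical. The sketch also assumes PostgreSQL-style declarative partitioning (PARTITION BY RANGE and PARTITION OF). It sets optimizer = off so that the Postgres optimizer, which is the one gp_dynamic_partition_pruning applies to, plans the query. The planner still chooses the final join order, so the plan you see may differ.

```sql
-- Hypothetical fact table, range-partitioned on sale_date (the partition key).
CREATE TABLE sales (
    id        int,
    sale_date date,
    amount    numeric(10,2)
) PARTITION BY RANGE (sale_date);

-- Hypothetical monthly partitions covering Q1 2024.
CREATE TABLE sales_2024_01 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sales_2024_02 PARTITION OF sales
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE sales_2024_03 PARTITION OF sales
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Hypothetical small table whose join values are only known at runtime.
CREATE TABLE promo_days (
    promo_date date,
    label      text
);

-- Sample rows: sale dates spread over 2024-01-01 .. 2024-03-30.
INSERT INTO sales
SELECT g, DATE '2024-01-01' + (g % 90), g % 100
FROM generate_series(1, 100000) AS g;
INSERT INTO promo_days VALUES ('2024-02-14', 'promo');

-- Prerequisite: collect statistics, starting from the root partition.
ANALYZE sales;
ANALYZE promo_days;

-- gp_dynamic_partition_pruning only applies to the Postgres optimizer,
-- so switch GPORCA off for this session.
SET optimizer = off;
SET gp_dynamic_partition_pruning = on;

-- Equality join on the partition key; sales is expected on the outer side.
EXPLAIN
SELECT s.sale_date, sum(s.amount)
FROM sales s
JOIN promo_days p ON s.sale_date = p.promo_date
GROUP BY s.sale_date;
-- Look for a "Partition Selector" node in the plan output.
```

If the plan contains a Partition Selector node, partitions of sales that cannot match any promo_date value can be skipped at runtime. Run RESET optimizer; afterward to return the session to the default optimizer.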
Memory optimizations
Apache Cloudberry dynamically allocates memory based on the characteristics of query operators and proactively releases or reallocates memory during different query phases. This leads to more efficient memory usage and faster query execution.
Optimize Query Performance
Update Statistics
The most important prerequisite for good query performance is to begin with accurate statistics for the tables. Updating statistics with the ANALYZE statement enables the query planner to generate optimal query plans. When a table is analyzed, information about the data is stored in the system catalog tables. If the stored information is out of date, the planner can generate inefficient plans.
Use Column-Level Compression
Apache Cloudberry supports column-level compression, which reduces storage space by compressing specific columns. In some cases, it can also improve query performance, especially when processing large-scale data.
Resource Groups
You can use resource groups to manage and protect the resource allocation of CPU, memory, concurrent transaction limits, and disk I/O in Apache Cloudberry. Once you define a resource group, you assign the group to one or more Apache Cloudberry roles, or to an external component such as PL/Container, in order to control the resources used by them.
Use Dynamic Tables
Dynamic tables are database objects similar to materialized views that refresh data automatically and speed up queries. Apache Cloudberry introduces dynamic tables to make query processing faster and data updates automatic.