Query Performance in Apache Cloudberry
Apache Cloudberry improves query performance through dynamic partition elimination and adaptive memory allocation. These mechanisms help reduce the amount of data scanned, speed up query execution, and enhance overall concurrency.
Apache Cloudberry uses the GPORCA optimizer by default, which extends the native Postgres planner with more advanced optimization capabilities.
Dynamic partition eliminationโ
Apache Cloudberry supports dynamic partition elimination (DPE), a feature that prunes partitions at query execution time based on runtime values. This reduces the data scanned and improves query efficiency.
DPE is supported for the following join types:
Hash Inner Join
Hash Left Join
Hash Right Join
(since v2.0.0)
DPE is enabled when the following conditions are met:
-
The partitioned table is on the outer side of the join.
-
The join condition is an equality predicate on the partition key.
-
Statistics are collected on the partitioned tables. For example:
ANALYZE <root partition>;
The gp_dynamic_partition_pruning
parameter controls whether DPE is enabled. It is ON
by default but only applies to the Postgres optimizer. You can verify if DPE is in effect by checking the EXPLAIN
plan for the presence of a Partition Selector
node.
Memory optimizationsโ
Apache Cloudberry dynamically allocates memory based on the characteristics of query operators and proactively releases or reallocates memory during different query phases. This leads to more efficient memory usage and faster query execution.
Apache Cloudberry uses GPORCA by default. GPORCA extends the planning and optimization capabilities of the Postgres optimizer.
๐๏ธ Query Processing Overview
This topic provides an overview of how Apache Cloudberry processes queries. Understanding this process can be useful when writing and tuning queries.
๐๏ธ Analyze Query Performance
Examine the query plans of poorly performing queries to identify possible performance tuning opportunities.
๐๏ธ GPORCA Query Optimizer
6 items
๐๏ธ Create Unique Index on AO Table
You can create a unique index on an Append-Optimized (AO) or Append-Optimized Column Store (AOCS) table in Apache Cloudberry. With a unique index, Apache Cloudberry checks the unique constraint when data is inserted into the AO table to ensure the uniqueness of the data. At the same time, the database optimizes specific queries with the optimizer to improve the query performance. However, this also brings some overhead for maintaining a unique index, especially when inserting data.
๐๏ธ Use Auto Materialized Views for Query Optimization
Apache Cloudberry supports automatically using materialized views to process some or all queries (called AQUMV) during the query planning phase. This feature is suitable for queries on large tables and can greatly reduce query processing time. AQUMV uses incremental materialized views (IMVs) because IMVs usually keep the latest data when related tables have write operations.
๐๏ธ Incremental Materialized View
This document introduces the usage scenarios of the incremental materialized view in Apache Cloudberry, how to use it, the restrictions, and the things to note.
๐๏ธ Create AO/AOCO Tables and Refresh Materialized Views in Parallel
Apache Cloudberry supports creating append-optimized (AO) tables and append-optimized column-oriented (AOCO) tables in parallel by using the CREATE TABLE AS statement, and supports refreshing materialized views in parallel based on the AO or AOCO tables. Parallel processing accelerates table creation and materialized view refresh.
๐๏ธ Parallel Queries Execution
This document introduces the usage scenarios, usage examples, restrictions, and common issues of parallel query execution in Apache Cloudberry. When Apache Cloudberry executes a query, multiple CPU cores are used to process a single query, thereby improving query performance. The database dynamically adjusts the number of computing nodes (including the SeqScan operator) according to the data volume change.
๐๏ธ Use Aggregation Pushdown to Speed Up Query Execution
Aggregation pushdown is an optimization technique that moves the aggregation operation closer to the data source. Apache Cloudberry supports pushing down aggregation operations, which means that the aggregation operator is processed before the join operator.
๐๏ธ Use IndexScan on AO Tables
Apache Cloudberry supports IndexScan when querying on App-Optimized (AO) tables to improve query efficiency in certain scenarios, such as the following query:
๐๏ธ Use RuntimeFilter to Optimize Join Queries
When performing join queries on large tables, the SQL optimizer of Apache Cloudberry often uses the HashJoin operator. HashJoin builds a hash table based on the join key to match the join key, which might cause a memory access bottleneck and disk bottleneck. RuntimeFilter is an optimization technique that generates filters in real-time during HashJoin operations, which can pre-filter data before executing HashJoin to speed up HashJoin operations. In some scenarios, the RuntimeFilter optimization can double execution efficiency.
๐๏ธ Query Plan Hints
Apache Cloudberry uses two types of query optimizers