Join Queries
In Apache Cloudberry, the JOIN clause combines rows from two or more tables based on related column values. The JOIN clause is part of the FROM clause in a SELECT statement.
The syntax for the JOIN clause is as follows:
table1_name join_type table2_name [join_condition]
Where:
-
table1_name,table2_name: Names of the tables to be joined. -
join_type: The type of join, which can be one of the following:[INNER] JOINLEFT [OUTER] JOINRIGHT [OUTER] JOINFULL [OUTER] JOINNATURAL JOIN
-
join_condition: An optional condition that specifies how to match rows from the two tables. It can take one of the following forms:ON <join_condition>USING ( <join_column> [, ...] )LATERAL
For FULL JOIN queries, the ORCA optimizer automatically chooses between Merge Join and Hash Join based on cost estimates, so users do not need to manually specify the join method.
Join types
Apache Cloudberry supports the following types of joins:
INNER JOIN returns the intersection of rows from both tables that satisfy the join condition. In other words, it returns only the rows with matching values in both tables. If INNER is omitted before JOIN, it defaults to INNER JOIN.
SELECT *
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
LEFT OUTER JOIN
LEFT OUTER JOIN (or simply LEFT JOIN) returns all rows from the left table and the matching rows from the right table. If there is no match in the right table, the result includes NULL values for columns from the right table.
SELECT *
FROM table1
LEFT OUTER JOIN table2
ON table1.column_name = table2.column_name;
RIGHT OUTER JOIN
RIGHT OUTER JOIN (or simply RIGHT JOIN) is the opposite of LEFT OUTER JOIN. It returns all rows from the right table and the matching rows from the left table. If there is no match in the left table, the result includes NULL values for columns from the left table.
SELECT *
FROM table1
RIGHT OUTER JOIN table2
ON table1.column_name = table2.column_name;
Starting from v2.0.0, Hash Right Join queries can also trigger dynamic partition elimination (DPE) when partition pruning conditions are met, reducing partition scans and improving performance.
FULL OUTER JOIN
FULL OUTER JOIN (or simply FULL JOIN) returns all rows from both the left and right tables. For unmatched rows in the left table, the right table columns are filled with NULL. For unmatched rows in the right table, the left table columns are filled with NULL.
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;
CROSS JOIN
CROSS JOIN returns the Cartesian product of two tables. It combines every row from the left table with every row from the right table. If there is no WHERE clause to filter the results, the output will contain the number of rows in the left table multiplied by the number of rows in the right table.
SELECT *
FROM table1
CROSS JOIN table2;
NATURAL JOIN
The NATURAL clause is a further shorthand for the USING clause. It is used when all columns with the same names in both tables should be used for the join. If there are no columns with the same names, NATURAL JOIN behaves like a CROSS JOIN. Use NATURAL JOIN with caution, as it relies on column names and may produce unexpected results.
SELECT *
FROM table1
NATURAL JOIN table2;
Join conditions
Join conditions define how to match rows between two tables. There are three types of join conditions: