Develop Database Extensions Using PGRX
This document explains how to develop database extensions using Rust and the PGRX framework. PGRX is a Rust framework for developing extensions for Apache Cloudberry, offering a safe and efficient development experience.
For the core features of PGRX, see PGRX Core Features. For notes of PGRX, see Considerations and Best Practices for PGRX.
Requirements for Development Environment
- Make sure that your OS is one of Debian/Ubuntu and RHEL/CentOS.
- Make sure that your Apache Cloudberry cluster is compiled from source code, not installed using RPM package.
Basic Software Environment
Required software:
- Rust toolchain (
rustc
,cargo
, andrustfmt
) - install via https://rustup.rs - Git
- libclang 11 or higher (for bindgen)
- GCC 7 or higher
PostgreSQL Dependencies
Install required PostgreSQL dependencies for your OS:
For Debian/Ubuntu:
sudo apt-get install build-essential libreadline-dev zlib1g-dev flex bison \
libxml2-dev libxslt-dev libssl-dev libxml2-utils xsltproc ccache pkg-config
For RHEL/CentOS:
sudo yum install -y bison-devel readline-devel zlib-devel openssl-devel wget ccache
sudo yum groupinstall -y 'Development Tools'
After installing the dependencies, you can start developing extensions.
Quick Start for PGRX
This section introduces the process of quickly developing extensions using PGRX, including:
- Setting up and installing PGRX
- Creating extension
- Installing and using extension
Set up and Install PGRX
PGRX is maintained by PgCentral Foundation, Inc., while we used here is one forked PGRX version with better compatibility within Cloudberry. It's contributed by the community members and customized for Cloudberry, but please note that it is not maintained as one official Cloudberry project.
-
Set the environment variable for Apache Cloudberry's
pg_config
path, where<pg_config_path>
is the path in your Apache Cloudberry cluster (for example,/usr/local/cloudberry-db/bin/pg_config
):export PGRX_PG_CONFIG_PATH=<pg_config_path>
-
Build the PGRX framework:
-
Clone the Apache Cloudberry-compatible
pgrx
repository:git clone https://github.com/cloudberry-contrib/pgrx
cd pgrx -
Build the code with
pg14
andcbdb
features enabled:cargo build --features "pg14, cbdb"
-
-
Install the Apache Cloudberry-compatible
cargo-pgrx
tool:cargo install --path cargo-pgrx/
-
Initialize the environment with your database kernel version:
cargo pgrx init --pg14=`which pg_config`
Create an Extension
-
Generate an extension template. This example creates an extension named
my_extension
:cargo pgrx new my_extension
cd my_extensionThe created directory structure is as follows:
.
├── Cargo.toml
├── my_extension.control
├── sql
└── src
├── bin
│ └── pgrx_embed.rs
└── lib.rs -
Modify dependencies in
Cargo.toml
to use local PGRX:-
Change
pgrx = "0.12.7"
under[dependencies]
to point to thepgrx
directory in your local PGRX repository. For example:[dependencies]
pgrx = { path = "/home/gpadmin/pgrx/pgrx/", features = ["pg14", "cbdb"] } -
Add
pgrx-pg-sys
under[dependencies]
to point to thepgrx-pg-sys
directory in your local PGRX repository. For example:[dependencies]
pgrx-pg-sys = { path = "/home/gpadmin/pgrx/pgrx-pg-sys/", features = ["pg14", "cbdb"] } -
Change
pgrx-tests = "0.12.7"
under[dev-dependencies]
to point to thepgrx-tests
directory in your local PGRX repository:[dev-dependencies]
pgrx-tests = { path = "/home/gpadmin/pgrx/pgrx-tests/" }
-
-
Append the extension name
my_extension
to theworkspace.members
array of theCargo.toml
file in the root directory of your local PGRX repository. For example:vi /home/gpadmin/pgrx/Cargo.toml
[workspace]
resolver = "2"
members = [
"cargo-pgrx",
"pgrx",
"pgrx-macros",
"pgrx-pg-config",
"pgrx-pg-sys",
"pgrx-sql-entity-graph",
"pgrx-tests",
"pgrx-bindgen",
"my_extension"
] -
Grant the current system user the permissions to the Apache Cloudberry directory. For example, if the current user is
gpadmin
and Apache Cloudberry directory is/usr/local/cloudberrydb
:sudo chown -R gpadmin:gpadmin /usr/local/cloudberrydb
Install and Use the Extension
-
Install the extension:
cargo pgrx install
-
To use the extension in the database, connect to the database and execute the following statements:
CREATE EXTENSION my_extension;
-- Tests example function
SELECT hello_my_extension();
PGRX Type Mapping
The table below lists the complete mapping of Apache Cloudberry (PostgreSQL) data types to Rust types:
Database data type | Rust type (Option<T> ) |
---|---|
bytea | Vec<u8> or &[u8] (zero-copy) |
text | String or &str (zero-copy) |
varchar | String or &str (zero-copy) or char |
"char" | i8 |
smallint | i16 |
integer | i32 |
bigint | i64 |
oid | u32 |
real | f32 |
double precision | f64 |
bool | bool |
json | pgrx::Json(serde_json::Value) |
jsonb | pgrx::JsonB(serde_json::Value) |
date | pgrx::Date |
time | pgrx::Time |
timestamp | pgrx::Timestamp |
time with time zone | pgrx::TimeWithTimeZone |
timestamp with time zone | pgrx::TimestampWithTimeZone |
anyarray | pgrx::AnyArray |
anyelement | pgrx::AnyElement |
box | pgrx::pg_sys::BOX |
point | pgrx::pg_sys::Point |
tid | pgrx::pg_sys::ItemPointerData |
cstring | &core::ffi::CStr |
numeric | pgrx::Numeric<P, S> or pgrx::AnyNumeric |
void | () |
ARRAY[]::<type> | Vec<Option<T>> or pgrx::Array<T> (zero-copy) |
int4range | pgrx::Range<i32> |
int8range | pgrx::Range<i64> |
numrange | pgrx::Range<Numeric<P, S>> or pgrx::Range<AnyRange> |
daterange | pgrx::Range<pgrx::Date> |
tsrange | pgrx::Range<pgrx::Timestamp> |
tstzrange | pgrx::Range<pgrx::TimestampWithTimeZone> |
NULL | Option::None |
internal | pgrx::PgBox<T> (where T can be any Rust/Postgres struct) |
uuid | pgrx::Uuid([u8; 16]) |
Custom Type Conversions
You can implement additional type conversions in the following ways:
- Implement
IntoDatum
andFromDatum
traits. - Use
#[derive(PostgresType)]
and#[derive(PostgresEnum)]
for automatic type conversions.
Type Mapping Details
PGRX converts text
and varchar
to &str
or String
, and verifies whether the encoding is UTF-8. If an encoding other than UTF-8 is detected, PGRX triggers a panic to alert the developer. Because UTF-8 validation might affect performance, it is not recommended to rely on UTF-8 validation.
The default encoding for PostgreSQL servers is SQL_ASCII
, which guarantees neither ASCII nor UTF-8 (Apache Cloudberry will accept but ignore non-ASCII bytes). For best results, always use UTF-8 encoding with PGRX and explicitly set the database encoding when creating the database.
PGRX Core Features
Complete Management for Development Environment
cargo-pgrx provides a complete set of command-line tools:
cargo pgrx new
: Quickly creates a new extension.cargo pgrx init
: Installs or registers an Apache Cloudberry (PostgreSQL) instance.cargo pgrx run
: Interactively tests the extension in psql (or pgcli).cargo pgrx test
: Performs unit tests across multiple Apache Cloudberry (PostgreSQL) versions.cargo pgrx package
: Creates an extension installation package.
Automatic Mode Generation
- Fully implements the extension using Rust.
- Automatically maps various Rust types to Apache Cloudberry (PostgreSQL) types.
- Automatically generates SQL schema (can also be manually generated using
cargo pgrx schema
). - Uses
extension_sql!
andextension_sql_file!
to include custom SQL.
Security First
- Converts Rust's
panic!
to Apache Cloudberry/PostgreSQL'sERROR
(abort the transaction, not the process). - Memory management follows Rust's
DROP
semantics, including handlingpanic!
andelog(ERROR)
cases. - Uses
#[pg_guard]
procedural macro to ensure safety. - Apache Cloudberry's
Datum
is represented asOption<T> where T: FromDatum
, with NULL values safely represented asOption::<T>::None
.
UDF Supports
- Uses
#[pg_extern]
annotation to expose functions to Apache Cloudberry. - Returns
pgrx::iter::SetOfIterator<'a, T>
to implementRETURNS SETOF
. - Returns
pgrx::iter::TableIterator<'a, T>
to implementRETURNS TABLE (...)
. - Uses
#[pg_trigger]
to create trigger functions.
Simple Custom Types
- Uses
#[derive(PostgresType)]
to treat Rust structs as Apache Cloudberry types.- By default, CBOR encoding is used for storage, and JSON is used as a human-readable format.
- Supports custom memory/disk/readable formats.
- Uses
#[derive(PostgresEnum)]
to treat Rust enums as Apache Cloudberry (PostgreSQL) enums. - Supports composite types via
pgrx::composite_type!("Sample")
macro.
Server Programming Interface (SPI)
- Secure access to SPI.
- Transparently returns ownership of Datum from SPI context.
Advanced Features
- Securely accesses Apache Cloudberry's memory context system via
pgrx::PgMemoryContexts
. - Supports executor/planner/transaction/subtransaction hooks.
- Securely handles Apache Cloudberry pointers using
pgrx::PgBox<T>
(similar toalloc::boxed::Box<T>
). - Protects Rust functions passed to Apache Cloudberry's
extern "C"
using#[pg_guard]
procedural macro. - Accesses Apache Cloudberry's logging system via the
eprintln!
macro. - Directly (unsafe) accesses Apache Cloudberry internals via the
pgrx::pg_sys
module.
Considerations and Best Practices for PGRX
Thread supports:
- Apache Cloudberry strictly follows a single-threaded model.
- Custom threads cannot call internal database functions.
- The interaction method for asynchronous contexts is still under exploration.
Encoding requirements:
- It is recommended to use UTF-8 encoding.
- The default server encoding is SQL_ASCII.
- It is recommended to explicitly set the encoding when creating the database.
Debugging and Development Tips
- Uses
cargo pgrx test
for unit testing. - Uses
#[pg_guard]
to ensure memory safety. - For custom types, uses appropriate serialization methods.
Learning Resources for PGRX
The following resources can help you gain a deeper understanding of PGRX:
- Learn about all available
cargo-pgrx
subcommands and options: cargo-pgrx command details - Learn how to define and use custom data types: custom type examples
- Explore how to implement custom operators: operator functions and operator classes/families
- Learn how to use shared memory: shared memory support
- Browse example code implementations: more example code