Load Data from Amazon S3 Using the s3 Protocol
The s3 protocol is used in a URL that specifies the location of an Amazon S3 bucket and a prefix to use for reading or writing files in the bucket.
Amazon Simple Storage Service (Amazon S3) provides secure, durable, highly scalable object storage. For more information, see the Amazon S3 documentation.
You can define read-only external tables that use existing data files in the S3 bucket for table data, or writable external tables that store the data from INSERT operations to files in the S3 bucket. Apache Cloudberry uses the S3 URL and prefix specified in the protocol URL either to select one or more files for a read-only table, or to define the location and filename format to use when uploading S3 files for INSERT operations to writable tables.
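As a rough illustration of the two table types described above, the following sketch defines one read-only and one writable external table. The endpoint, bucket name, prefix, configuration file path, table names, and column names are hypothetical placeholders, and the statements assume the s3 protocol has already been configured in the database.

```sql
-- Read-only external table: selects the existing files whose keys start
-- with the given prefix (hypothetical bucket and prefix).
CREATE READABLE EXTERNAL TABLE sales_s3_read (
    sale_date text,
    sale_time text,
    amount    int
)
LOCATION ('s3://s3-us-west-2.amazonaws.com/example-bucket/sales/ config=/home/gpadmin/s3/s3.conf')
FORMAT 'csv';

-- Writable external table: INSERT operations upload new files under the
-- given prefix (hypothetical bucket and prefix).
CREATE WRITABLE EXTERNAL TABLE sales_s3_write (LIKE sales_s3_read)
LOCATION ('s3://s3-us-west-2.amazonaws.com/example-bucket/sales_out/ config=/home/gpadmin/s3/s3.conf')
FORMAT 'csv';

-- Rows inserted here are written to files in the S3 bucket.
INSERT INTO sales_s3_write SELECT * FROM sales_s3_read;
```

Using separate prefixes for reading and writing keeps uploaded files from being picked up by the read-only table on the next query.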
The s3 protocol also supports Dell Elastic Cloud Storage (ECS), an Amazon S3 compatible service.
The pxf protocol can access data in S3 and other object store systems such as Azure, Google Cloud Storage, and MinIO. The pxf protocol can also access data in external Hadoop systems (HDFS, Hive, HBase) and SQL databases. See pxf:// protocol.
Configure the s3 protocol
You must configure the s3 protocol before you can use it. Perform these steps in each database in which you want to use the protocol:
- Create the read and write functions for the s3 protocol library:

    CREATE OR REPLACE FUNCTION write_to_s3() RETURNS integer AS
       '$libdir/gps3ext.so', 's3_export' LANGUAGE C STABLE;

    CREATE OR REPLACE FUNCTION read_from_s3() RETURNS integer AS
       '$libdir/gps3ext.so', 's3_import' LANGUAGE C STABLE;

- Declare the s3 protocol and specify the read and write functions you created in the previous step:

  To allow only Apache Cloudberry superusers to use the protocol, create it as follows:

    CREATE PROTOCOL s3 (writefunc = write_to_s3, readfunc = read_from_s3);

  If you want to permit non-superusers to use the s3 protocol, create it as a TRUSTED protocol and GRANT access to those users. For example:

    CREATE TRUSTED PROTOCOL s3 (writefunc = write_to_s3, readfunc = read_from_s3);
    GRANT ALL ON PROTOCOL s3 TO user1, user2;

  Note: The protocol name s3 must be the same as the protocol of the URL specified for the external table that you create to access an S3 resource. The corresponding function is called by every Apache Cloudberry segment instance.
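Once the protocol has been declared, you may want to confirm that it is registered and, for a trusted protocol, grant narrower privileges than ALL. The following is a minimal sketch, not a definitive procedure. It assumes that the pg_extprotocol catalog table and its ptcname and ptctrusted columns are present in Apache Cloudberry as they are in Greenplum-derived systems, and that SELECT and INSERT privileges on a protocol control the creation of readable and writable external tables respectively. The role names user1 and user2 are reused from the example above.

```sql
-- Check that the s3 protocol is registered in this database and whether
-- it was created as TRUSTED (assumed catalog: pg_extprotocol).
SELECT ptcname, ptctrusted
FROM pg_extprotocol
WHERE ptcname = 's3';

-- Assumed finer-grained alternative to GRANT ALL:
-- user1 may create readable external tables only.
GRANT SELECT ON PROTOCOL s3 TO user1;

-- user2 may create writable external tables only.
GRANT INSERT ON PROTOCOL s3 TO user2;
```

Because the protocol is configured per database, the check is worth repeating in each database in which the protocol is meant to be available.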