Migration and Upgrade
This guide provides detailed instructions for two scenarios:
- Migrating from Greenplum Database 6.x/7.x to Apache Cloudberry
- Upgrading from Apache Cloudberry 1.x (non-Apache release before joining the Apache Incubator) to Apache Cloudberry 2.x
Both scenarios utilize the cbcopy
tool, a powerful data migration utility designed for efficient database migration and upgrade operations.
Prerequisites
Before starting the migration or upgrade process, ensure you have:
- Go Programming language version 1.19 or higher installed
- Superuser privileges on both source and target databases
- Network connectivity between source and target database clusters
- Sufficient disk space on both source and target clusters
- The
cbcopy
andcbcopy_helper
binaries installed in the$GPHOME/bin
directory on all nodes of both source and target databases
Installing cbcopy
- Clone the repository:
git clone https://github.com/cloudberry-contrib/cbcopy.git
- Build the binaries:
cd cbcopy
make
- Install the binaries:
make install
Version Compatibility
Cloudberry Upgrade (1.x to 2.x)
- Source: Cloudberry 1.6
- Target: Cloudberry 2.0
Greenplum to Cloudberry Migration
- Source: Greenplum Database 6.x or 7.x
- Target: Cloudberry 1.x or 2.x
Migration Process
The migration process consists of two main phases:
- Metadata migration
- Data migration
We recommend performing these phases separately for better control and reliability:
- First migrate metadata using
--metadata-only
- Then migrate data using
--data-only
Basic Migration Command
cbcopy --source-host=<source_host> \
--source-port=<source_port> \
--source-user=<source_user> \
--dest-host=<dest_host> \
--dest-port=<dest_port> \
--dest-user=<dest_user> \
[additional_options]
Migration Modes
cbcopy supports several migration modes:
-
Full Migration (
--full
)- Migrates all metadata and data from source to target
-
Database Migration (
--dbname
)- Migrates specific database(s)
cbcopy --dbname="database1,database2" [other_options]
-
Schema Migration (
--schema
)- Migrates specific schema(s)
cbcopy --schema="database.schema1,database.schema2" [other_options]
-
Table Migration (
--include-table
)- Migrates specific table(s)
cbcopy --include-table="database.schema.table1,database.schema.table2" [other_options]
Data Loading Options
-
Append Mode (
--append
)- Inserts migrated records into existing tables
- Use when you want to preserve existing data
-
Truncate Mode (
--truncate
)- Clears existing records before inserting migrated data
- Use when you want to replace existing data
cbcopy --truncate [other_options]
Handling Dependencies
Global Objects
For tables depending on global objects (e.g., tablespaces):
- Use
--with-global-metadata
to automatically create these objects - Or manually create them before migration:
CREATE TABLESPACE custom_tablespace LOCATION '/path/to/tablespace';
Role Management
To change table ownership during migration:
- Create target roles in the target database
- Use
--owner-mapping-file
to specify role mappings:source_role1,target_role1
source_role2,target_role2
Tablespace Management
Three options for handling tablespaces:
-
Default Mode
- Objects created in same tablespace names as source
- Use
--with-global-metadata
or manually create tablespaces
-
Single Target Tablespace (
--dest-tablespace
)cbcopy --dest-tablespace=new_space [other_options]
-
Tablespace Mapping (
--tablespace-mapping-file
)source_tablespace1,target_tablespace1
source_tablespace2,target_tablespace2
Performance Optimization
-
Parallel Jobs
- Control concurrent table copies with
--copy-jobs
cbcopy --copy-jobs=8 [other_options]
- Control concurrent table copies with
-
Copy Strategies cbcopy automatically selects the optimal strategy based on table size:
- Copy On Coordinator: For small tables
- Copy On Segment: For large tables with matching cluster configurations
- Copy on External Table: For other cases
Migration Validation
cbcopy performs automatic validation by comparing row counts between source and target. Failed migrations are logged in:
$USER/gpAdminLogs/cbcopy_succeed_$timestamp
$USER/gpAdminLogs/cbcopy_failed_$timestamp
For retry attempts, use:
cbcopy --exclude-table-file=cbcopy_succeed_$timestamp [other_options]
Example Scenarios
Scenario 1: Cloudberry 1.6 to 2.0 Upgrade
# Step 1: Migrate metadata
cbcopy --source-host=127.0.0.1 \
--source-port=15432 \
--source-user=gpadmin \
--dest-host=127.0.0.1 \
--dest-port=25432 \
--dest-user=gpadmin \
--full \
--metadata-only
# Step 2: Migrate data
cbcopy --source-host=127.0.0.1 \
--source-port=15432 \
--source-user=gpadmin \
--dest-host=127.0.0.1 \
--dest-port=25432 \
--dest-user=gpadmin \
--full \
--data-only \
--truncate
Scenario 2: Greenplum 6.x/7.x to Cloudberry 2.0 Migration
# Step 1: Migrate metadata with global objects
cbcopy --source-host=127.0.0.1 \
--source-port=15432 \
--source-user=gpadmin \
--dest-host=127.0.0.1 \
--dest-port=25432 \
--dest-user=gpadmin \
--full \
--metadata-only \
--with-global-metadata
# Step 2: Migrate data
cbcopy --source-host=127.0.0.1 \
--source-port=15432 \
--source-user=gpadmin \
--dest-host=127.0.0.1 \
--dest-port=25432 \
--dest-user=gpadmin \
--full \
--data-only \
--truncate \
--copy-jobs=8
Troubleshooting
-
Failed Migrations
- Check
$USER/gpAdminLogs/cbcopy_$timestamp.log
for detailed error messages - Use the success/failure files to retry failed migrations
- Check
-
Common Issues
- Ensure all required tablespaces exist in target database
- Verify network connectivity between source and target
- Check user permissions on both databases
- Ensure sufficient disk space on both clusters
Additional Resources
cbcopy is contributed by community members; however, please note that it is not maintained as an official Cloudberry project yet.