Deploy Apache Cloudberry Manually Using RPM Package
This document describes how to manually deploy Apache Cloudberry on physical or virtual machines using an RPM package. Before reading it, first read the Software and Hardware Configuration Requirements and Prepare to Deploy Apache Cloudberry documents.
The deployment method in this document is for production environments.
The example in this document uses CentOS 7.6 and deploys Apache Cloudberry v1.0.0. The main steps are as follows:
- Prepare node servers.
- Install the RPM package.
- Configure mutual trust between nodes.
- Initialize the database.
- Log into the database.
Step 1. Prepare server nodes
Read the Prepare to Deploy Apache Cloudberry document to prepare the server nodes.
Step 2. Install the RPM package
After the preparation, it is time to install Apache Cloudberry. You need to download the corresponding RPM package from Apache Cloudberry Releases, and then install the database on each node using the installation package.
- Download the RPM package to the home directory of gpadmin.
wget -P /home/gpadmin <download address>
- Install the RPM package in the /home/gpadmin directory. Run the following commands as the root user, replacing <RPM package path> with the actual RPM package path. During the installation, the directory /usr/local/cloudberry-db/ is automatically created.
cd /home/gpadmin
yum install <RPM package path>
- Grant the gpadmin user the permission to access the /usr/local/cloudberry-db/ directory.
chown -R gpadmin:gpadmin /usr/local
chown -R gpadmin:gpadmin /usr/local/cloudberry*
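After the chown commands, it can be useful to confirm that nothing under the installation directory is still owned by another user. The following is a minimal sketch, not part of the Apache Cloudberry tooling: list_not_owned_by is a hypothetical helper name, and it assumes a POSIX find is available on the node.

```shell
# Hypothetical helper: print every path under a directory that is NOT owned
# by the given user (user name or numeric UID). No output means every path
# is owned by that user.
list_not_owned_by() {
  dir=$1
  user=$2
  # -H follows the starting path if it is a symbolic link.
  find -H "$dir" ! -user "$user" -print
}

# Example, to be run on a node after the chown commands:
# list_not_owned_by /usr/local/cloudberry-db gpadmin
```

If the example prints any paths, rerun the chown commands as root on that node.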
Step 3. Configure mutual trust between nodes
- Switch to the gpadmin user, and use the gpadmin user for subsequent operations.
- Create the node configuration files all_hosts and seg_hosts in the /home/gpadmin/ directory. They store the hostnames of all nodes and of the data nodes, respectively. The example node information is as follows:
[gpadmin@cbdb-coordinator gpadmin]$ cat all_hosts
cbdb-coordinator
cbdb-standbycoordinator
cbdb-datanode01
cbdb-datanode02
cbdb-datanode03
[gpadmin@cbdb-coordinator gpadmin]$ cat seg_hosts
cbdb-datanode01
cbdb-datanode02
cbdb-datanode03
- Configure SSH trust between hosts.
  - Run ssh-keygen on each host to generate an SSH key. For example:
[gpadmin@cbdb-coordinator cloudberry-db-1.0.0]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/usr/local/cloudberry-db/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /usr/local/cloudberry-db/.ssh/id_rsa.
Your public key has been saved in /usr/local/cloudberry-db/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:cvcYS87egYCyh/v6UtdqrejVU5qqF7OvpcHg/T9lRrg gpadmin@cbdb-coordinator
The key's randomart image is:
+---[RSA 2048]----+
| |
| |
| + |
|+ O |
|o ... S |
|. +o= B C |
| o B=00 D |
|.o=o0o.. = |
|O=++*+o+.. |
+----[SHA256]-----+
  - Run ssh-copy-id on each host to configure password-free login. For example:
ssh-copy-id cbdb-coordinator
ssh-copy-id cbdb-standbycoordinator
ssh-copy-id cbdb-datanode01
ssh-copy-id cbdb-datanode02
ssh-copy-id cbdb-datanode03
  - Verify that SSH works between all nodes, that is, that password-free login between servers succeeds. For example:
[gpadmin@cbdb-coordinator ~]$ gpssh -f all_hosts
=> pwd
[ cbdb-datanode03] b'/usr/local/cloudberry-db\r'
[ cbdb-coordinator] b'/usr/local/cloudberry-db\r'
[ cbdb-datanode02] b'/usr/local/cloudberry-db\r'
[cbdb-standbycoordinator] b'/usr/local/cloudberry-db\r'
[ cbdb-datanode01] b'/usr/local/cloudberry-db\r'
=>
If gpssh fails to run, first run source /usr/local/cloudberry-db/greenplum_path.sh on the coordinator node.
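Typing one ssh-copy-id command per host becomes tedious on larger clusters. The sketch below reads the hostnames from a host file instead. for_each_host is a hypothetical helper name, and skipping blank lines and lines that start with # is a choice made for this sketch, not a documented property of the host file format.

```shell
# Hypothetical helper: run a command once per host listed in a host file
# (one hostname per line), appending the hostname as the last argument.
# Blank lines and lines starting with '#' are skipped.
for_each_host() {
  hostfile=$1
  shift
  # The file is read on descriptor 3 so that the command keeps its own stdin.
  while IFS= read -r host <&3 || [ -n "$host" ]; do
    case $host in
      '' | '#'*) continue ;;
    esac
    "$@" "$host"
  done 3< "$hostfile"
}

# Example, assuming /home/gpadmin/all_hosts exists and ssh-copy-id is installed:
# for_each_host /home/gpadmin/all_hosts ssh-copy-id
```

The example still has to be run on each host, as with the individual ssh-copy-id commands.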
Step 4. Initialize Apache Cloudberry
Before performing the following operations, run su - gpadmin to switch to the gpadmin user.
- Add the following source command as a new line to the ~/.bashrc file on every node (coordinator, standby coordinator, and segments):
source /usr/local/cloudberry-db/greenplum_path.sh
- Run the source command to make the newly added content take effect:
source ~/.bashrc
- On the coordinator node, use the gpssh command to create the data and mirror directories on the segment nodes. In this document, the two directories are /data0/primary/ and /data0/mirror/, respectively. Start a gpssh session, then enter the mkdir commands at its prompt:
gpssh -f seg_hosts
mkdir -p /data0/primary/
mkdir -p /data0/mirror/
- Create the data directory on the coordinator node. In this document, the directory is /data0/coordinator/.
mkdir -p /data0/coordinator/
- On the coordinator node, use the gpssh command to create the data directory for the standby node. In this document, the directory is /data0/coordinator/.
gpssh -h cbdb-standbycoordinator -e 'mkdir -p /data0/coordinator/'
- On the coordinator and standby hosts, add a line to the ~/.bashrc file that declares COORDINATOR_DATA_DIRECTORY. Its value is the coordinator data directory created in the previous steps followed by gpseg-1. For example:
export COORDINATOR_DATA_DIRECTORY=/data0/coordinator/gpseg-1
- Run the following command on the coordinator and standby hosts to make the COORDINATOR_DATA_DIRECTORY declaration from the previous step take effect:
source ~/.bashrc
- Configure the gpinitsystem_config initialization script:
  - On the coordinator host, copy the template configuration file to the current directory:
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config .
  - Modify the gpinitsystem_config file as follows:
    - Check the port, coordinator, segment, and mirror settings.
    - Set DATA_DIRECTORY to the data directory of the segment node, for example, /data0/primary.
    - Set COORDINATOR_HOSTNAME to the hostname of the coordinator node, for example, cbdb-coordinator.
    - Set COORDINATOR_DIRECTORY to the data directory of the coordinator node, for example, /data0/coordinator.
    - Set MIRROR_DATA_DIRECTORY to the data directory of the mirror node, for example, /data0/mirror.
[gpadmin@cbdb-coordinator ~]$ cat gpinitsystem_config
# FILE NAME: gpinitsystem_config
# Configuration file needed by the gpinitsystem
########################################
#### REQUIRED PARAMETERS
########################################
#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg
#### Base number by which primary segment port numbers
#### are calculated.
PORT_BASE=6000
#### File system location(s) where primary segment data directories
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/data0/primary)
#### OS-configured hostname or IP address of the coordinator host.
COORDINATOR_HOSTNAME=cbdb-coordinator
#### File system location where the coordinator data directory
#### will be created.
COORDINATOR_DIRECTORY=/data0/coordinator
#### Port number for the coordinator instance.
COORDINATOR_PORT=5432
#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh
#### Default server-side character set encoding.
ENCODING=UNICODE
########################################
#### OPTIONAL MIRROR PARAMETERS
########################################
#### Base number by which mirror segment port numbers
#### are calculated.
MIRROR_PORT_BASE=7000
#### File system location(s) where mirror segment data directories
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the
#### DATA_DIRECTORY parameter.
declare -a MIRROR_DATA_DIRECTORY=(/data0/mirror)
    - To create a default database during initialization, fill in the database name. In this example, the warehouse database is created during initialization:
########################################
#### OTHER OPTIONAL PARAMETERS
########################################
#### Create a database of this name after initialization.
DATABASE_NAME=warehouse
- Use gpinitsystem to initialize Apache Cloudberry. For example:
gpinitsystem -c gpinitsystem_config -h /home/gpadmin/seg_hosts
In the command above, -c specifies the configuration file and -h specifies the file that lists the segment (computing) nodes.
If you need to initialize the standby coordinator node, refer to the following command:
gpinitstandby -s cbdb-standbycoordinator
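The COORDINATOR_DATA_DIRECTORY value exported in ~/.bashrc has to agree with the settings in gpinitsystem_config. The sketch below derives the expected value from the configuration file as COORDINATOR_DIRECTORY, then SEG_PREFIX, then -1. That rule is an assumption inferred from the example values in this document. config_value and coordinator_data_dir are hypothetical helper names, and the parsing only handles plain KEY=value lines without quotes or trailing comments.

```shell
# Hypothetical helper: print the value of a plain KEY=value line in a file.
# If the key appears more than once, the last assignment wins.
config_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Hypothetical helper: build the expected coordinator data directory,
# assuming it is <COORDINATOR_DIRECTORY>/<SEG_PREFIX>-1.
coordinator_data_dir() {
  dir=$(config_value COORDINATOR_DIRECTORY "$1")
  prefix=$(config_value SEG_PREFIX "$1")
  # Strip one trailing slash so the result has a single separator.
  printf '%s/%s-1\n' "${dir%/}" "$prefix"
}

# Example, run in the directory that holds gpinitsystem_config:
# coordinator_data_dir gpinitsystem_config
```

With the configuration shown in this document, the result can be compared against the COORDINATOR_DATA_DIRECTORY value before running gpinitsystem.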
Step 5. Log into Apache Cloudberry
Now you have successfully deployed Apache Cloudberry. To log into the database, refer to the following command:
psql -h <hostname> -p <port> -U <username> -d <database>
In the command above:
- <hostname> is the IP address of the coordinator node of the Apache Cloudberry server.
- <port> is the port number of Apache Cloudberry, which is 5432 by default.
- <username> is the user name of the database.
- <database> is the name of the database to connect to.
After you run the psql command, the system prompts you for the database password if the authentication configuration requires one. After you enter the correct password, you are logged into Apache Cloudberry and can run SQL queries and operations. Make sure that you have the correct permissions to access the target database.
[gpadmin@cbdb-coordinator ~]$ psql warehouse
psql (14.4, server 14.4)
Type "help" for help.
warehouse=# SELECT * FROM gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port |        hostname         |         address         |          datadir
------+---------+------+----------------+------+--------+------+-------------------------+-------------------------+----------------------------
    1 |      -1 | p    | p              | n    | u      | 5432 | cbdb-coordinator        | cbdb-coordinator        | /data0/coordinator/gpseg-1
    8 |      -1 | m    | m              | s    | u      | 5432 | cbdb-standbycoordinator | cbdb-standbycoordinator | /data0/coordinator/gpseg-1
    2 |       0 | p    | p              | s    | u      | 6000 | cbdb-datanode01         | cbdb-datanode01         | /data0/primary/gpseg0
    5 |       0 | m    | m              | s    | u      | 7000 | cbdb-datanode02         | cbdb-datanode02         | /data0/mirror/gpseg0
    3 |       1 | p    | p              | s    | u      | 6000 | cbdb-datanode02         | cbdb-datanode02         | /data0/primary/gpseg1
    6 |       1 | m    | m              | s    | u      | 7000 | cbdb-datanode03         | cbdb-datanode03         | /data0/mirror/gpseg1
    4 |       2 | p    | p              | s    | u      | 6000 | cbdb-datanode03         | cbdb-datanode03         | /data0/primary/gpseg2
    7 |       2 | m    | m              | s    | u      | 7000 | cbdb-datanode01         | cbdb-datanode01         | /data0/mirror/gpseg2
(8 rows)
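The query output above can also be checked from a script. The following sketch counts the instances whose status is not u. It assumes, as in the sample output, that u marks an instance that is up. count_not_up is a hypothetical helper name, and the input is expected in the unaligned, tuples-only format that psql produces with -A and -t, one hostname|status pair per line.

```shell
# Hypothetical helper: read "hostname|status" lines from stdin and print
# how many of them have a status other than 'u'.
count_not_up() {
  awk -F'|' '$2 != "u" { n++ } END { print n + 0 }'
}

# Example, assuming the warehouse database from this document exists:
# psql -d warehouse -At -c "SELECT hostname, status FROM gp_segment_configuration" | count_not_up
```

A result of 0 means every listed instance reports the u status.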