Skip to main content

Deploy Apache Cloudberry Manually Using RPM Package

This document introduces how to manually deploy Apache Cloudberry on physical/virtual machines using RPM package. Before reading this document, it is recommended to first read the Software and Hardware Configuration Requirements and Prepare to Deploy Apache Cloudberry.

The deployment method in this document is for production environments.

The example in this document uses CentOS 7.6 and deploys Apache Cloudberry v1.0.0. The main steps are as follows:

  1. Prepare node servers.
  2. Install the RPM package.
  3. Configure mutual trust between nodes.
  4. Initialize the database.
  5. Log into the database.

Step 1: Prepare server nodes

Read the Prepare to Deploy Apache Cloudberry document to prepare the server nodes.

Step 2. Install the RPM package

After the preparation, it is time to install Apache Cloudberry. You need to download the corresponding RPM package from Apache Cloudberry Releases, and then install the database on each node using the installation package.

  1. Download the RPM package to the home directory of gpadmin.

    wget -P /home/gpadmin <download address>
  2. Install the RPM package in the /home/gpadmin directory.

    When running the following command, you need to replace <RPM package path> with the actual RPM package path, as the root user. During the installation, the directory /usr/local/cloudberry-db/ is automatically created.

    cd /home/gpadmin
    yum install <RPM package path>
  3. Grant the gpadmin user the permission to access the /usr/local/cloudberry-db/ directory.

    chown -R gpadmin:gpadmin /usr/local
    chown -R gpadmin:gpadmin /usr/local/cloudberry*

Step 3. Configure mutual trust between nodes

  1. Switch to the gpadmin user, and use the gpadmin user for subsequent operations.

  2. Create a configuration file for node information.

    Create the node configuration file in the /home/gpadmin/ directory, including the all_hosts and seg_hosts files, which store the host information of all nodes and data nodes respectively. The example node information is as follows:

    [gpadmin@cbdb-coordinator gpadmin]$ cat all_hosts

    cbdb-coordinator
    cbdb-standbycoordinator
    cbdb-datanode01
    cbdb-datanode02
    cbdb-datanode03

    [gpadmin@cbdb-coordinator gpadmin]$ cat seg_hosts

    cbdb-datanode01
    cbdb-datanode02
    cbdb-datanode03
  3. Configure SSH trust between hosts.

    1. Run ssh-keygen on each host to generate SSH key. For example:

      [gpadmin@cbbd-coordinator cloudberry-db-1.0.0]$ ssh-keygen

      Generating public/private rsa key pair.
      Enter file in which to save the key (/usr/local/cloudberry-db/.ssh/id_rsa):
      Enter passphrase (empty for no passphrase):
      Enter same passphrase again:
      Your identification has been saved in /usr/local/cloudberry-db/.ssh/id_rsa.
      Your public key has been saved in /usr/local/cloudberry-db/.ssh/id_rsa.pub.
      The key fingerprint is:
      SHA256:cvcYS87egYCyh/v6UtdqrejVU5qqF7OvpcHg/T9lRrg gpadmin@cbbd-coordinator
      The key's randomart image is:
      +---[RSA 2048]----+
      | |
      | |
      | + |
      |+ O |
      |o ... S |
      |. +o= B C |
      | o B=00 D |
      |.o=o0o.. = |
      |O=++*+o+.. |
      +----[SHA256]-----+
    2. Run ssh-copy-id on each host to configure password-free login. The example is as follows:

      ssh-copy-id  cbdb-coordinator
      ssh-copy-id cbdb-standbycoordinator
      ssh-copy-id cbdb-datanode01
      ssh-copy-id cbdb-datanode02
      ssh-copy-id cbdb-datanode03
    3. Verify that SSH between nodes is all connected, that is, the password-free login between servers is successful. The example is as follows:

      [gpadmin@cbdb-coordinator ~]$ gpssh -f all_hosts
      => pwd
      [ cbdb-datanode03] b'/usr/local/cloudberry-db\r'
      [ cbdb-coordinator] b'/usr/local/cloudberry-db\r'
      [ cbdb-datanode02] b'/usr/local/cloudberry-db\r'
      [cbdb-standbycoordinator] b'/usr/local/cloudberry-db\r'
      [ cbdb-datanode01] b'/usr/local/cloudberry-db\r'
      =>

      If you fail to run gpssh, you can first run source /usr/local/cloudberry-db/greenplum_path.sh on the coordinator node.

Step 4. Initialize Apache Cloudberry

Before performing the following operations, run su - gpadmin to switch to the gpadmin user.

  1. Add a new line of source command to the ~/.bashrc files of all nodes (coordinator/standby coordinator/segment). The example is as follows:

    source /usr/local/cloudberry-db/greenplum_path.sh
  2. Run the source command to make the newly added content effective:

    source ~/.bashrc
  3. Use the gpssh command on the coordinator node to create data directories and mirror directories for segment nodes. In this document, the 2 directories are /data0/primary/ and /data0/mirror/, respectively. The example is as follows:

    gpssh -f seg_hosts
    mkdir -p /data0/primary/
    mkdir -p /data0/mirror/
  4. Create data directory on the coordinator node. In this document, the directory is /data0/coordinator/.

    mkdir -p /data0/coordinator/
  5. Use the gpssh command on the coordinator node to create data directory for the standby node. In this document, the directory is /data0/coordinator/.

    gpssh -h cbdb-standbycoordinator -e 'mkdir -p /data0/coordinator/'
  6. On the hosts of the coordinator and standby nodes, add a line to the ~/.bashrc file to declare the path of COORDINATOR_DATA_DIRECTORY, which is {the path step 5} + gpseg-1. For example:

    export COORDINATOR_DATA_DIRECTORY=/data0/coordinator/gpseg-1
  7. Run the following command on the hosts of the coordinator and standby nodes to make the declaration of COORDINATOR_DATA_DIRECTORY in the previous step effective.

    source ~/.bashrc
  8. Configure the gpinitsystem_config initialization script:

    1. On the host where the coordinator node is located, copy the template configuration file to the current directory:

      cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config .
    2. Modify the gpinitsystem_config file as follows:

      • Pay attention to the port, coordinator node, segment node, and mirror node.

      • Modify DATA_DIRECTORY to the data directory of the segment node, for example, /data0/primary.

      • Modify COORDINATOR_HOSTNAME to the hostname of the coordinator node, for example, cbdb-coordinator.

      • Modify COORDINATOR_DIRECTORY to the data directory of the coordinator node, for example, /data0/coordinator.

      • Modify MIRROR_DATA_DIRECTORY to the data directory of the mirror node, for example, /data0/mirror.

        [gpadmin@cbdb-coordinator ~]$ cat gpinitsystem_config
        # FILE NAME: gpinitsystem_config

        # Configuration file needed by the gpinitsystem

        ########################################
        #### REQUIRED PARAMETERS
        ########################################

        #### Naming convention for utility-generated data directories.
        SEG_PREFIX=gpseg

        #### Base number by which primary segment port numbers
        #### are calculated.
        PORT_BASE=6000

        #### File system location(s) where primary segment data directories
        #### will be created. The number of locations in the list dictate
        #### the number of primary segments that will get created per
        #### physical host (if multiple addresses for a host are listed in
        #### the hostfile, the number of segments will be spread evenly across
        #### the specified interface addresses).
        declare -a DATA_DIRECTORY=(/data0/primary)

        #### OS-configured hostname or IP address of the coordinator host.
        COORDINATOR_HOSTNAME=cbdb-coordinator

        #### File system location where the coordinator data directory
        #### will be created.
        COORDINATOR_DIRECTORY=/data0/coordinator

        #### Port number for the coordinator instance.
        COORDINATOR_PORT=5432

        #### Shell utility used to connect to remote hosts.
        TRUSTED_SHELL=ssh

        #### Default server-side character set encoding.
        ENCODING=UNICODE

        ########################################
        #### OPTIONAL MIRROR PARAMETERS
        ########################################

        #### Base number by which mirror segment port numbers
        #### are calculated.
        MIRROR_PORT_BASE=7000

        #### File system location(s) where mirror segment data directories
        #### will be created. The number of mirror locations must equal the
        #### number of primary locations as specified in the
        #### DATA_DIRECTORY parameter.
        declare -a MIRROR_DATA_DIRECTORY=(/data0/mirror)
      • To create a default database during initialization, you need to fill in the database name. In this example, the warehouse database is created during initialization

        ########################################
        #### OTHER OPTIONAL PARAMETERS
        ########################################

        #### Create a database of this name after initialization.
        DATABASE_NAME=warehouse
  9. Use gpinitsystem to initialize Apache Cloudberry. For example:

    gpinitsystem -c  gpinitsystem_config -h /home/gpadmin/seg_hosts

    In the command above, -c specifies the configuration file and -h specifies the computing node list.

    If you need to initialize the standby coordinator node, refer to the following command:

    gpinitstandby -s cbdb-standbycoordinator

Step 5. Log into Apache Cloudberry

Now you have successfully deployed Apache Cloudberry. To log into the database, refer to the following command:

psql -h <hostname> -p <port> -U <username> -d <database>

In the command above:

  • <hostname> is the IP address of the coordinator node of the Apache Cloudberry server.
  • <port> is the default port number of Apache Cloudberry, which is 5432 by default.
  • <username> is the user name of the database.
  • <database> is the name of the database to connect.

After you run the psql command, the system will prompt you to enter the database password. After you enter the correct password, you will successfully log into Apache Cloudberry and can perform SQL queries and operations. Make sure that you have the correct permissions to access the target database.

[gpadmin@cddb-coordinator ~]$ psql warehouse
psql (14.4, server 14.4)
Type "help" for help.

warehouse=# SELECT * FROM gp_segment_configuration;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------------------------------------------------------------------------------------------
1 | -1 | p | p | n | u | 5432 | cddb-coordinator | cddb-coordinator | /data0/coordinator/gpseg-1
8 | -1 | m | m | s | u | 5432 | cddb-standbycoordinator | cddb-standbycoordinator | /data0/coordinator/gpseg-1
2 | 0 | p | p | s | u | 6000 | cddb-datanode01 | cddb-datanode01 | /data0/primary/gpseg0
5 | 0 | m | m | s | u | 7000 | cddb-datanode02 | cddb-datanode02 | /data0/mirror/gpseg0
3 | 1 | p | p | s | u | 6000 | cddb-datanode02 | cddb-datanode02 | /data0/primary/gpseg1
6 | 1 | m | m | s | u | 7000 | cddb-datanode03 | cddb-datanode03 | /data0/mirror/gpseg1
4 | 2 | p | p | s | u | 6000 | cddb-datanode03 | cddb-datanode03 | /data0/primary/gpseg2
7 | 2 | m | m | s | u | 7000 | cddb-datanode01 | cddb-datanode01 | /data0/mirror/gpseg2
(8 rows)