Version: 2.x

Load Data into Apache Cloudberry Using `gpload`

The gpload utility of Apache Cloudberry loads data using readable external tables and the Apache Cloudberry parallel file server (gpfdist). It handles parallel file-based external table setup and allows users to configure their data format, external table definition, and gpfdist setup in a single configuration file.

tip

In gpload, MERGE and UPDATE operations are not supported if the target table column name is a reserved keyword, has capital letters, or includes any character that requires quotes " " to identify the column.

To use gpload

Ensure that your environment is set up to run gpload. Some dependent files from your Apache Cloudberry installation are required, such as gpfdist and Python 3, as well as network access to the Apache Cloudberry segment hosts. gpload also requires that you install the following packages:
```
pip install psycopg2 pyyaml
```

Create your load control file. This is a YAML-formatted file that specifies the Apache Cloudberry connection information, gpfdist configuration information, external table options, and data format.

For example:

---
VERSION: 1.0.0.1
DATABASE: ops
USER: gpadmin
HOST: cdw-1
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - etl1-1
           - etl1-2
           - etl1-3
           - etl1-4
         PORT: 8081
         FILE: 
           - /var/load/data/*
    - COLUMNS:
           - name: text
           - amount: float4
           - category: text
           - descr: text
           - date: date
    - FORMAT: text
    - DELIMITER: '|'
    - ERROR_LIMIT: 25
    - LOG_ERRORS: true
   OUTPUT:
    - TABLE: payables.expenses
    - MODE: INSERT
   PRELOAD:
    - REUSE_TABLES: true 
# SQL:
#   - BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)"
#   - AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"

Run gpload, passing in the load control file. For example:
```
gpload -f my_load.yml
```

Load Data into Apache Cloudberry Using gpload

To use gpload​

Load Data into Apache Cloudberry Using `gpload`

To use gpload