Text Search Configuration Example

This document shows how to create a customized text search configuration to process document and query text.

A text search configuration specifies all options necessary to transform a document into a tsvector: the parser to use to break text into tokens, and the dictionaries to use to transform each token into a lexeme. Every call to to_tsvector or to_tsquery needs a text search configuration to perform its processing. The configuration parameter default_text_search_config specifies the name of the default configuration, which is the one used by text search functions when no explicit configuration argument is given. It can be set in postgresql.conf using the gpconfig command-line utility, or set for an individual session using the SET command.
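
The difference between the default and an explicitly named configuration can be sketched as follows. This is a minimal illustration that assumes the built-in english and simple configurations are present; the sample sentence is arbitrary and not part of the original example.

```sql
-- Show which configuration is used when none is named explicitly.
SHOW default_text_search_config;

-- Change the default for the current session only.
SET default_text_search_config = 'pg_catalog.english';

-- One-argument form: processed with default_text_search_config.
SELECT to_tsvector('The quick brown foxes jumped');

-- Two-argument form: the named configuration is used and the default is ignored.
SELECT to_tsvector('pg_catalog.simple', 'The quick brown foxes jumped');
```

Comparing the two results shows how much the choice of configuration affects stemming and stop-word removal.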

Several predefined text search configurations are available, and you can create custom configurations easily. To facilitate management of text search objects, a set of SQL commands is available, and there are several psql commands that display information about text search objects (see psql Support).
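
Besides the psql commands, the existing text search objects can also be listed straight from the system catalogs. The queries below are a sketch; the column aliases are chosen here for readability only.

```sql
-- List every text search configuration together with its schema.
SELECT n.nspname AS schema_name, c.cfgname AS config_name
FROM pg_catalog.pg_ts_config c
JOIN pg_catalog.pg_namespace n ON n.oid = c.cfgnamespace
ORDER BY 1, 2;

-- List the dictionaries that a configuration can reference.
SELECT dictname
FROM pg_catalog.pg_ts_dict
ORDER BY dictname;
```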

As an example, you can create a configuration pg, starting by duplicating the built-in english configuration:

CREATE TEXT SEARCH CONFIGURATION public.pg ( COPY = pg_catalog.english );

You can use a PostgreSQL-specific synonym list and store it in $SHAREDIR/tsearch_data/pg_dict.syn. The file contents look like:

postgres    pg
pgsql       pg
postgresql  pg

Define the synonym dictionary like this:

CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = synonym,
    SYNONYMS = pg_dict
);
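
Before wiring the dictionary into a configuration, it can be checked on its own with ts_lexize. This sketch assumes the pg_dict.syn file shown above is in place on every host; the word database is just an arbitrary term that is not in the file.

```sql
-- Words listed in the synonym file should be replaced by their synonym (pg).
SELECT ts_lexize('pg_dict', 'postgresql');
SELECT ts_lexize('pg_dict', 'pgsql');

-- A word the dictionary does not recognize yields NULL,
-- so the token would be passed on to the next dictionary in a mapping.
SELECT ts_lexize('pg_dict', 'database');
```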

Next, register the Ispell dictionary english_ispell, which has its own configuration files:

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);
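
The Ispell dictionary can be tested in isolation in the same way. This sketch assumes that english.dict, english.affix, and english.stop exist under $SHAREDIR/tsearch_data; the exact lexemes returned depend on the contents of those files, so no output is shown.

```sql
-- A regular word: the result is its normalized form(s) according to the Ispell files.
SELECT ts_lexize('english_ispell', 'stories');

-- A stop word is recognized but discarded, which ts_lexize reports as an empty array.
SELECT ts_lexize('english_ispell', 'the');
```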

Set up the mappings for words in configuration pg:

ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_dict, english_ispell, english_stem;
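
The dictionaries in a mapping are consulted from left to right until one of them recognizes the token. A quick way to see which dictionary handled each token is to select a few ts_debug columns; the two-word input below is an arbitrary sample.

```sql
-- dictionaries: the full list mapped to the token type
-- dictionary:   the one that actually recognized the token
-- lexemes:      what that dictionary produced
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('public.pg', 'pgsql databases');
```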

Choose not to index or search some token types that the built-in configuration does handle:

ALTER TEXT SEARCH CONFIGURATION pg
    DROP MAPPING FOR email, url, url_path, sfloat, float;
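
To confirm the effect of dropping the mappings, a sketch such as the following can help. The sample text and the address admin@example.com are made up for illustration; tokens whose type no longer has a mapping should show an empty dictionaries list and should not appear in the resulting tsvector.

```sql
-- Inspect how each token is classified and which dictionaries apply to it.
SELECT alias, token, dictionaries
FROM ts_debug('public.pg', 'Write to admin@example.com, it costs 3.14');

-- Build the tsvector for the same text to see which tokens are indexed.
SELECT to_tsvector('public.pg', 'Write to admin@example.com, it costs 3.14');
```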

Now test the configuration:

SELECT * FROM ts_debug('public.pg', '
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software.
');
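
Beyond ts_debug, the configuration can be exercised end to end with a match query. The sketch below assumes the synonym file maps both postgresql and pgsql to pg as shown earlier, in which case a search for pgsql would be expected to match text that mentions PostgreSQL.

```sql
-- Document and query are both processed with public.pg,
-- so synonyms are normalized to the same lexeme on both sides.
SELECT to_tsvector('public.pg', 'PostgreSQL is now undergoing beta testing')
       @@ to_tsquery('public.pg', 'pgsql & beta') AS matches;
```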

The next step is to set the session to use the new configuration, which was created in the public schema:

=> \dF
   List of text search configurations
 Schema  | Name | Description
---------+------+-------------
 public  | pg   |

SET default_text_search_config = 'public.pg';
SET

SHOW default_text_search_config;
 default_text_search_config
----------------------------
 public.pg