NGram

NGram( terms, min, max )

Description

The NGram function tokenizes the input terms into n-grams of the given size(s). Any non-string terms are returned unmodified.
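The tokenization order can be pictured with a small local sketch. This is an illustration only, not the server-side implementation: the class `NGramSketch` and the method `ngrams` are hypothetical names, the code assumes valid arguments (1 <= min <= max), and it slices by Java `char` rather than by Unicode code point, which may differ from how the database treats non-ASCII input.

```java
import java.util.ArrayList;
import java.util.List;

// Local illustration only. NGramSketch and ngrams are hypothetical names
// and are not part of the database driver.
final class NGramSketch {

    // For each start position, emit one substring per size from min to max,
    // skipping sizes that would run past the end of the input.
    // Assumption: 1 <= min <= max (no argument validation here).
    static List<String> ngrams(String term, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            for (int size = min; size <= max; size++) {
                if (start + size <= term.length()) {
                    out.add(term.substring(start, start + size));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Football", 1, 2));
        System.out.println(ngrams("Football", 3, 4));
    }
}
```

Walking start positions first and sizes second yields the elements in the same order as the result arrays shown in the Examples section.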

Parameters

| Argument | Type | Definition and Requirements |
|----------|------|-----------------------------|
| terms | String | The string to tokenize into n-grams. |
| min | Integer | Optional - The minimum size of an n-gram. Default is 1. |
| max | Integer | Optional - The maximum size of an n-gram. Default is 2. |

Providing a negative value for min or max, or values of min and max that differ by more than 1, results in an "invalid argument" error.

Providing a value for max that is less than min also results in an "invalid argument" error.
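The argument rules above can be checked on the client before a query is sent. The following is a minimal sketch under stated assumptions: `NGramArgs`, `validate`, and `isValid` are hypothetical helpers, not driver APIs, and the exception type and messages are illustrative rather than what the server returns.

```java
// Hypothetical client-side pre-check that mirrors the documented rules.
// Not part of the driver; the server remains the source of truth.
final class NGramArgs {

    // Throws IllegalArgumentException when the pair (min, max) breaks
    // one of the documented constraints.
    static void validate(int min, int max) {
        if (min < 0 || max < 0) {
            throw new IllegalArgumentException("invalid argument: min and max must not be negative");
        }
        if (max < min) {
            throw new IllegalArgumentException("invalid argument: max must not be less than min");
        }
        if (max - min > 1) {
            throw new IllegalArgumentException("invalid argument: min and max must not differ by more than 1");
        }
    }

    // Convenience wrapper that reports validity as a boolean.
    static boolean isValid(int min, int max) {
        try {
            validate(min, max);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(1, 2));  // the default sizes
        System.out.println(isValid(1, 3));  // sizes differ by more than 1
        System.out.println(isValid(4, 3));  // max is less than min
    }
}
```

Whether a size of 0 is accepted is not stated in the rules above, so this sketch rejects only negative values.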

Returns

If the input produces exactly one n-gram, NGram returns a single string. If it produces more than one n-gram, NGram returns an array of strings.
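Because the result can be either a string or an array, calling code may need to handle both shapes. The sketch below imitates that rule locally. `NGramResultShape`, `ngrams`, and `shape` are hypothetical names, and the handling of an empty result (returned here as an empty list) is an assumption, since that case is not described above.

```java
import java.util.ArrayList;
import java.util.List;

// Local illustration of the single-string versus array return rule.
// All names here are hypothetical and not part of the driver.
final class NGramResultShape {

    // Same local tokenization as a plain sliding window; assumes 1 <= min <= max.
    static List<String> ngrams(String term, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            for (int size = min; size <= max && start + size <= term.length(); size++) {
                out.add(term.substring(start, start + size));
            }
        }
        return out;
    }

    // Collapses a one-element result to a bare String; otherwise keeps the list.
    // Assumption: an empty result stays an empty list.
    static Object shape(List<String> grams) {
        return grams.size() == 1 ? grams.get(0) : grams;
    }

    public static void main(String[] args) {
        Object single = shape(ngrams("a", 1, 2));
        Object many = shape(ngrams("ab", 1, 2));
        System.out.println(single instanceof String);
        System.out.println(many instanceof List);
    }
}
```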

Examples

The query below executes two NGram operations. Each position in the result array corresponds to the operation at the same position in the query array. Both operations take the term "Football". The first returns the n-grams produced with the default sizes, and the second returns the n-grams produced with a minimum size of 3 and a maximum size of 4.

System.out.println(
   client.query(
     Arr(
       NGram("Football"),
       NGram("Football",3,4)
     )
   ).get());
[
    ["F", "Fo", "o", "oo", "o", "ot", "t", "tb", "b", "ba", "a", "al", "l", "ll", "l"],
    ["Foo", "Foot", "oot", "ootb", "otb", "otba", "tba", "tbal", "bal", "ball", "all"]
]