xref: /sqlite-3.40.0/ext/fts3/README.syntax (revision 42128b9e)
1d34c03a9Sdanielk1977
2d34c03a9Sdanielk19771. OVERVIEW
3d34c03a9Sdanielk1977
4d34c03a9Sdanielk1977  This README file describes the syntax of the arguments that may be passed to
5d34c03a9Sdanielk1977  the FTS3 MATCH operator used for full-text queries. For example, if table
6d34c03a9Sdanielk1977  "t1" is an Fts3 virtual table, the following SQL query:
7d34c03a9Sdanielk1977
8d34c03a9Sdanielk1977    SELECT * FROM t1 WHERE <col> MATCH <full-text query>
9d34c03a9Sdanielk1977
10d34c03a9Sdanielk1977  may be used to retrieve all rows that match a specified for full-text query.
11d34c03a9Sdanielk1977  The text "<col>" should be replaced by either the name of the fts3 table
12d34c03a9Sdanielk1977  (in this case "t1"), or by the name of one of the columns of the fts3
13d34c03a9Sdanielk1977  table. <full-text-query> should be replaced by an SQL expression that
14d34c03a9Sdanielk1977  computes to a string containing an Fts3 query.
15d34c03a9Sdanielk1977
16d34c03a9Sdanielk1977  If the left-hand-side of the MATCH operator is set to the name of the
17d34c03a9Sdanielk1977  fts3 table, then by default the query may be matched against any column
18d34c03a9Sdanielk1977  of the table. If it is set to a column name, then by default the query
19d34c03a9Sdanielk1977  may only match the specified column. In both cases this may be overriden
20d34c03a9Sdanielk1977  as part of the query text (see sections 2 and 3 below).
21d34c03a9Sdanielk1977
22d34c03a9Sdanielk1977  As of SQLite version 3.6.8, Fts3 supports two slightly different query
23d34c03a9Sdanielk1977  formats; the standard syntax, which is used by default, and the enhanced
24d34c03a9Sdanielk1977  query syntax which can be selected by compiling with the pre-processor
25*42128b9eSdrh  symbol SQLITE_ENABLE_FTS3_PARENTHESIS defined.
26d34c03a9Sdanielk1977
27*42128b9eSdrh    -DSQLITE_ENABLE_FTS3_PARENTHESIS
28d34c03a9Sdanielk1977
29d34c03a9Sdanielk19772. STANDARD QUERY SYNTAX
30d34c03a9Sdanielk1977
31d34c03a9Sdanielk1977  When using the standard Fts3 query syntax, a query usually consists of a
32d34c03a9Sdanielk1977  list of terms (words) separated by white-space characters. To match a
33d34c03a9Sdanielk1977  query, a row (or column) of an Fts3 table must contain each of the specified
34d34c03a9Sdanielk1977  terms. For example, the following query:
35d34c03a9Sdanielk1977
36d34c03a9Sdanielk1977    <col> MATCH 'hello world'
37d34c03a9Sdanielk1977
38d34c03a9Sdanielk1977  matches rows (or columns, if <col> is the name of a column name) that
39d34c03a9Sdanielk1977  contain at least one instance of the token "hello", and at least one
40d34c03a9Sdanielk1977  instance of the token "world". Tokens may be grouped into phrases using
41d34c03a9Sdanielk1977  quotation marks. In this case, a matching row or column must contain each
42d34c03a9Sdanielk1977  of the tokens in the phrase in the order specified, with no intervening
43d34c03a9Sdanielk1977  tokens. For example, the query:
44d34c03a9Sdanielk1977
45d34c03a9Sdanielk1977    <col> MATCH '"hello world" joe"
46d34c03a9Sdanielk1977
47d34c03a9Sdanielk1977  matches the first of the following two documents, but not the second or
48d34c03a9Sdanielk1977  third:
49d34c03a9Sdanielk1977
50d34c03a9Sdanielk1977    "'Hello world', said Joe."
51d34c03a9Sdanielk1977    "One should always greet the world with a cheery hello, thought Joe."
52d34c03a9Sdanielk1977    "How many hello world programs could their be?"
53d34c03a9Sdanielk1977
54d34c03a9Sdanielk1977  As well as grouping tokens together by phrase, the binary NEAR operator
55d34c03a9Sdanielk1977  may be used to search for rows that contain two or more specified tokens
56d34c03a9Sdanielk1977  or phrases within a specified proximity of each other. The NEAR operator
57d34c03a9Sdanielk1977  must always be specified in upper case. The word "near" in lower or mixed
58d34c03a9Sdanielk1977  case is treated as an ordinary token. For example, the following query:
59d34c03a9Sdanielk1977
60d34c03a9Sdanielk1977    <col> MATCH 'engineering NEAR consultancy'
61d34c03a9Sdanielk1977
62d34c03a9Sdanielk1977  matches rows that contain both the "engineering" and "consultancy" tokens
63d34c03a9Sdanielk1977  in the same column with not more than 10 other words between them. It does
64d34c03a9Sdanielk1977  not matter which of the two terms occurs first in the document, only that
65d34c03a9Sdanielk1977  they be seperated by only 10 tokens or less. The user may also specify
66d34c03a9Sdanielk1977  a different required proximity by adding "/N" immediately after the NEAR
67d34c03a9Sdanielk1977  operator, where N is an integer. For example:
68d34c03a9Sdanielk1977
69d34c03a9Sdanielk1977    <col> MATCH 'engineering NEAR/5 consultancy'
70d34c03a9Sdanielk1977
71d34c03a9Sdanielk1977  searches for a row containing an instance of each specified token seperated
72d34c03a9Sdanielk1977  by not more than 5 other tokens. More than one NEAR operator can be used
73d34c03a9Sdanielk1977  in as sequence. For example this query:
74d34c03a9Sdanielk1977
75d34c03a9Sdanielk1977    <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy'
76d34c03a9Sdanielk1977
77d34c03a9Sdanielk1977  searches for a row that contains an instance of the token "reliable"
78d34c03a9Sdanielk1977  seperated by not more than two tokens from an instance of "engineering",
79d34c03a9Sdanielk1977  which is in turn separated by not more than 5 other tokens from an
80d34c03a9Sdanielk1977  instance of the term "consultancy". Phrases enclosed in quotes may
81d34c03a9Sdanielk1977  also be used as arguments to the NEAR operator.
82d34c03a9Sdanielk1977
83d34c03a9Sdanielk1977  Similar to the NEAR operator, one or more tokens or phrases may be
84d34c03a9Sdanielk1977  separated by OR operators. In this case, only one of the specified tokens
85d34c03a9Sdanielk1977  or phrases must appear in the document. For example, the query:
86d34c03a9Sdanielk1977
87d34c03a9Sdanielk1977    <col> MATCH 'hello OR world'
88d34c03a9Sdanielk1977
89d34c03a9Sdanielk1977  matches rows that contain either the term "hello", or the term "world",
90d34c03a9Sdanielk1977  or both. Note that unlike in many programming languages, the OR operator
91d34c03a9Sdanielk1977  has a higher precedence than the AND operators implied between white-space
92d34c03a9Sdanielk1977  separated tokens. The following query matches documents that contain the
93d34c03a9Sdanielk1977  term 'sqlite' and at least one of the terms 'fantastic' or 'impressive',
94d34c03a9Sdanielk1977  not those that contain both 'sqlite' and 'fantastic' or 'impressive':
95d34c03a9Sdanielk1977
96d34c03a9Sdanielk1977    <col> MATCH 'sqlite fantastic OR impressive'
97d34c03a9Sdanielk1977
98d34c03a9Sdanielk1977  Any token that is part of an Fts3 query expression, whether or not it is
99d34c03a9Sdanielk1977  part of a phrase enclosed in quotes, may have a '*' character appended to
100d34c03a9Sdanielk1977  it. In this case, the token matches all terms that begin with the characters
101d34c03a9Sdanielk1977  of the token, not just those that exactly match it. For example, the
102d34c03a9Sdanielk1977  following query:
103d34c03a9Sdanielk1977
104d34c03a9Sdanielk1977    <col> MATCH 'sql*'
105d34c03a9Sdanielk1977
106d34c03a9Sdanielk1977  matches all rows that contain the term "SQLite", as well as those that
107d34c03a9Sdanielk1977  contain "SQL".
108d34c03a9Sdanielk1977
109d34c03a9Sdanielk1977  A token that is not part of a quoted phrase may be preceded by a '-'
110d34c03a9Sdanielk1977  character, which indicates that matching rows must not contain the
111d34c03a9Sdanielk1977  specified term. For example, the following:
112d34c03a9Sdanielk1977
113d34c03a9Sdanielk1977    <col> MATCH '"database engine" -sqlite'
114d34c03a9Sdanielk1977
115d34c03a9Sdanielk1977  matches rows that contain the phrase "database engine" but do not contain
116d34c03a9Sdanielk1977  the term "sqlite". If the '-' character occurs inside a quoted phrase,
117d34c03a9Sdanielk1977  it is ignored. It is possible to use both the '-' prefix and the '*' postfix
118d34c03a9Sdanielk1977  on a single term. At this time, all Fts3 queries must contain at least
119d34c03a9Sdanielk1977  one term or phrase that is not preceded by the '-' prefix.
120d34c03a9Sdanielk1977
121d34c03a9Sdanielk1977  Regardless of whether or not a table name or column name is used on the
122d34c03a9Sdanielk1977  left hand side of the MATCH operator, a specific column of the fts3 table
123d34c03a9Sdanielk1977  may be associated with each token in a query by preceding a token with
124d34c03a9Sdanielk1977  a column name followed by a ':' character. For example, regardless of what
125d34c03a9Sdanielk1977  is specified for <col>, the following query requires that column "col1"
126d34c03a9Sdanielk1977  of the table contains the term "hello", and that column "col2" of the
127d34c03a9Sdanielk1977  table contains the term "world". If the table does not contain columns
128d34c03a9Sdanielk1977  named "col1" and "col2", then an error is returned and the query is
129d34c03a9Sdanielk1977  not run.
130d34c03a9Sdanielk1977
131d34c03a9Sdanielk1977    <col> MATCH 'col1:hello col2:world'
132d34c03a9Sdanielk1977
133d34c03a9Sdanielk1977  It is not possible to associate a specific table column with a quoted
134d34c03a9Sdanielk1977  phrase or a term preceded by a '-' operator. A '*' character may be
135d34c03a9Sdanielk1977  appended to a term associated with a specific column for prefix matching.
136d34c03a9Sdanielk1977
137d34c03a9Sdanielk19773. ENHANCED QUERY SYNTAX
138d34c03a9Sdanielk1977
139d34c03a9Sdanielk1977  The enhanced query syntax is quite similar to the standard query syntax,
140d34c03a9Sdanielk1977  with the following four differences:
141d34c03a9Sdanielk1977
142d34c03a9Sdanielk1977  1) Parenthesis are supported. When using the enhanced query syntax,
143d34c03a9Sdanielk1977     parenthesis may be used to overcome the built-in precedence of the
144d34c03a9Sdanielk1977     supplied binary operators. For example, the following query:
145d34c03a9Sdanielk1977
146d34c03a9Sdanielk1977       <col> MATCH '(hello world) OR (simple example)'
147d34c03a9Sdanielk1977
148d34c03a9Sdanielk1977     matches documents that contain both "hello" and "world", and documents
149d34c03a9Sdanielk1977     that contain both "simple" and "example". It is not possible to forumlate
150d34c03a9Sdanielk1977     such a query using the standard syntax.
151d34c03a9Sdanielk1977
152d34c03a9Sdanielk1977  2) Instead of separating tokens and phrases by whitespace, an AND operator
153d34c03a9Sdanielk1977     may be explicitly specified. This does not change query processing at
154d34c03a9Sdanielk1977     all, but may be used to improve readability. For example, the following
155d34c03a9Sdanielk1977     query is handled identically to the one above:
156d34c03a9Sdanielk1977
157d34c03a9Sdanielk1977       <col> MATCH '(hello AND world) OR (simple AND example)'
158d34c03a9Sdanielk1977
159d34c03a9Sdanielk1977     As with the OR and NEAR operators, the AND operator must be specified
160d34c03a9Sdanielk1977     in upper case. The word "and" specified in lower or mixed case is
161d34c03a9Sdanielk1977     handled as a regular token.
162d34c03a9Sdanielk1977
163d34c03a9Sdanielk1977  3) The '-' token prefix is not supported. Instead, a new binary operator,
164d34c03a9Sdanielk1977     NOT, is included. The NOT operator requires that the query specified
165d34c03a9Sdanielk1977     as its left-hand operator matches, but that the query specified as the
166d34c03a9Sdanielk1977     right-hand operator does not. For example, to query for all rows that
167d34c03a9Sdanielk1977     contain the term "example" but not the term "simple", the following
168d34c03a9Sdanielk1977     query could be used:
169d34c03a9Sdanielk1977
170d34c03a9Sdanielk1977       <col> MATCH 'example NOT simple'
171d34c03a9Sdanielk1977
172d34c03a9Sdanielk1977     As for all other operators, the NOT operator must be specified in
173d34c03a9Sdanielk1977     upper case. Otherwise it will be treated as a regular token.
174d34c03a9Sdanielk1977
175d34c03a9Sdanielk1977  4) Unlike in the standard syntax, where the OR operator has a higher
176d34c03a9Sdanielk1977     precedence than the implicit AND operator, when using the enhanced
177d34c03a9Sdanielk1977     syntax implicit and explict AND operators have a higher precedence
178d34c03a9Sdanielk1977     than OR operators. Using the enhanced syntax, the following two
179d34c03a9Sdanielk1977     queries are equivalent:
180d34c03a9Sdanielk1977
181d34c03a9Sdanielk1977       <col> MATCH 'sqlite fantastic OR impressive'
182d34c03a9Sdanielk1977       <col> MATCH '(sqlite AND fantastic) OR impressive'
183d34c03a9Sdanielk1977
184d34c03a9Sdanielk1977     however, when using the standard syntax, the query:
185d34c03a9Sdanielk1977
186d34c03a9Sdanielk1977       <col> MATCH 'sqlite fantastic OR impressive'
187d34c03a9Sdanielk1977
188d34c03a9Sdanielk1977     is equivalent to the enhanced syntax query:
189d34c03a9Sdanielk1977
190d34c03a9Sdanielk1977       <col> MATCH 'sqlite AND (fantastic OR impressive)'
191d34c03a9Sdanielk1977
192d34c03a9Sdanielk1977     The precedence of all enhanced syntax operators, in order from highest
193d34c03a9Sdanielk1977     to lowest, is:
194d34c03a9Sdanielk1977
195d34c03a9Sdanielk1977       NEAR       (highest precedence, tightest grouping)
196d34c03a9Sdanielk1977       NOT
197d34c03a9Sdanielk1977       AND
198d34c03a9Sdanielk1977       OR         (lowest precedence, loosest grouping)
199d34c03a9Sdanielk1977
200d34c03a9Sdanielk1977  Using the advanced syntax, it is possible to specify expressions enclosed
201d34c03a9Sdanielk1977  in parenthesis as operands to the NOT, AND and OR operators. However both
202d34c03a9Sdanielk1977  the left and right hand side operands of NEAR operators must be either
203d34c03a9Sdanielk1977  tokens or phrases. Attempting the following query will return an error:
204d34c03a9Sdanielk1977
205d34c03a9Sdanielk1977    <col> MATCH 'sqlite NEAR (fantastic OR impressive)'
206d34c03a9Sdanielk1977
207d34c03a9Sdanielk1977  Queries of this form must be re-written as:
208d34c03a9Sdanielk1977
209d34c03a9Sdanielk1977    <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive'
210