Name Date Size #Lines LOC

..04-Jul-2025-

tool/H04-Jul-2025-893734

unicode/H04-Jul-2025-26,84526,636

README.contentH A D04-Jul-20258 KiB177140

README.syntaxH A D04-Jul-20259.3 KiB210152

README.tokenizersH A D04-Jul-20255.2 KiB136107

README.txtH A D04-Jul-2025219 54

fts3.cH A D04-Jul-2025201.3 KiB6,1173,778

fts3.hH A D04-Jul-2025705 278

fts3Int.hH A D04-Jul-202525 KiB655348

fts3_aux.cH A D04-Jul-202516.9 KiB557404

fts3_expr.cH A D04-Jul-202540.5 KiB1,294846

fts3_hash.cH A D04-Jul-202511.3 KiB384258

fts3_hash.hH A D04-Jul-20254 KiB11339

fts3_icu.cH A D04-Jul-20256.6 KiB263189

fts3_porter.cH A D04-Jul-202517.1 KiB663433

fts3_snippet.cH A D04-Jul-202556.5 KiB1,7521,176

fts3_term.cH A D04-Jul-202511.2 KiB375258

fts3_test.cH A D04-Jul-202517 KiB624473

fts3_tokenize_vtab.cH A D04-Jul-202513.2 KiB459300

fts3_tokenizer.cH A D04-Jul-202514.3 KiB521349

fts3_tokenizer.hH A D04-Jul-20256.3 KiB16238

fts3_tokenizer1.cH A D04-Jul-20256.5 KiB235145

fts3_unicode.cH A D04-Jul-202512.4 KiB398277

fts3_unicode2.cH A D04-Jul-202517.5 KiB384292

fts3_write.cH A D04-Jul-2025192.6 KiB5,8153,875

fts3speed.tclH A D04-Jul-20252.7 KiB12380

README.content

1
2FTS4 CONTENT OPTION
3
4  Normally, in order to create a full-text index on a dataset, the FTS4
5  module stores a copy of all indexed documents in a specially created
6  database table.
7
8  As of SQLite version 3.7.9, FTS4 supports a new option - "content" -
9  designed to extend FTS4 to support the creation of full-text indexes where:
10
11    * The indexed documents are not stored within the SQLite database
12      at all (a "contentless" FTS4 table), or
13
14    * The indexed documents are stored in a database table created and
15      managed by the user (an "external content" FTS4 table).
16
17  Because the indexed documents themselves are usually much larger than
18  the full-text index, the content option can sometimes be used to achieve
19  significant space savings.
20
21CONTENTLESS FTS4 TABLES
22
23  In order to create an FTS4 table that does not store a copy of the indexed
24  documents at all, the content option should be set to an empty string.
25  For example, the following SQL creates such an FTS4 table with three
26  columns - "a", "b", and "c":
27
28    CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c);
29
30  Data can be inserted into such an FTS4 table using an INSERT statements.
31  However, unlike ordinary FTS4 tables, the user must supply an explicit
32  integer docid value. For example:
33
34    -- This statement is Ok:
35    INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i');
36
37    -- This statement causes an error, as no docid value has been provided:
38    INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r');
39
40  It is not possible to UPDATE or DELETE a row stored in a contentless FTS4
41  table. Attempting to do so is an error.
42
43  Contentless FTS4 tables also support SELECT statements. However, it is
44  an error to attempt to retrieve the value of any table column other than
45  the docid column. The auxiliary function matchinfo() may be used, but
46  snippet() and offsets() may not. For example:
47
48    -- The following statements are Ok:
49    SELECT docid FROM t1 WHERE t1 MATCH 'xxx';
50    SELECT docid FROM t1 WHERE a MATCH 'xxx';
51    SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx';
52
53    -- The following statements all cause errors, as the value of columns
54    -- other than docid are required to evaluate them.
55    SELECT * FROM t1;
56    SELECT a, b FROM t1 WHERE t1 MATCH 'xxx';
57    SELECT docid FROM t1 WHERE a LIKE 'xxx%';
58    SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx';
59
60  Errors related to attempting to retrieve column values other than docid
61  are runtime errors that occur within sqlite3_step(). In some cases, for
62  example if the MATCH expression in a SELECT query matches zero rows, there
63  may be no error at all even if a statement does refer to column values
64  other than docid.
65
66EXTERNAL CONTENT FTS4 TABLES
67
68  An "external content" FTS4 table is similar to a contentless table, except
69  that if evaluation of a query requires the value of a column other than
70  docid, FTS4 attempts to retrieve that value from a table (or view, or
71  virtual table) nominated by the user (hereafter referred to as the "content
72  table"). The FTS4 module never writes to the content table, and writing
73  to the content table does not affect the full-text index. It is the
74  responsibility of the user to ensure that the content table and the
75  full-text index are consistent.
76
77  An external content FTS4 table is created by setting the content option
78  to the name of a table (or view, or virtual table) that may be queried by
79  FTS4 to retrieve column values when required. If the nominated table does
80  not exist, then an external content table behaves in the same way as
81  a contentless table. For example:
82
83    CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c);
84    CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c);
85
86  Assuming the nominated table does exist, then its columns must be the same
87  as or a superset of those defined for the FTS table.
88
89  When a users query on the FTS table requires a column value other than
90  docid, FTS attempts to read this value from the corresponding column of
91  the row in the content table with a rowid value equal to the current FTS
92  docid. Or, if such a row cannot be found in the content table, a NULL
93  value is used instead. For example:
94
95    CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d);
96    CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c);
97
98    INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f');
99    INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l');
100    INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;
101
102    -- The following query returns a single row with two columns containing
103    -- the text values "i j" and "k l".
104    --
105    -- The query uses the full-text index to discover that the MATCH
106    -- term matches the row with docid=3. It then retrieves the values
107    -- of columns b and c from the row with rowid=3 in the content table
108    -- to return.
109    --
110    SELECT * FROM t3 WHERE t3 MATCH 'k';
111
112    -- Following the UPDATE, the query still returns a single row, this
113    -- time containing the text values "xxx" and "yyy". This is because the
114    -- full-text index still indicates that the row with docid=3 matches
115    -- the FTS4 query 'k', even though the documents stored in the content
116    -- table have been modified.
117    --
118    UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3;
119    SELECT * FROM t3 WHERE t3 MATCH 'k';
120
121    -- Following the DELETE below, the query returns one row containing two
122    -- NULL values. NULL values are returned because FTS is unable to find
123    -- a row with rowid=3 within the content table.
124    --
125    DELETE FROM t2;
126    SELECT * FROM t3 WHERE t3 MATCH 'k';
127
128  When a row is deleted from an external content FTS4 table, FTS4 needs to
129  retrieve the column values of the row being deleted from the content table.
130  This is so that FTS4 can update the full-text index entries for each token
131  that occurs within the deleted row to indicate that that row has been
132  deleted. If the content table row cannot be found, or if it contains values
133  inconsistent with the contents of the FTS index, the results can be difficult
134  to predict. The FTS index may be left containing entries corresponding to the
135  deleted row, which can lead to seemingly nonsensical results being returned
136  by subsequent SELECT queries. The same applies when a row is updated, as
137  internally an UPDATE is the same as a DELETE followed by an INSERT.
138
139  Instead of writing separately to the full-text index and the content table,
140  some users may wish to use database triggers to keep the full-text index
141  up to date with respect to the set of documents stored in the content table.
142  For example, using the tables from earlier examples:
143
144    CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN
145      DELETE FROM t3 WHERE docid=old.rowid;
146    END;
147    CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN
148      DELETE FROM t3 WHERE docid=old.rowid;
149    END;
150
151    CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN
152      INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
153    END;
154    CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN
155      INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
156    END;
157
158  The DELETE trigger must be fired before the actual delete takes place
159  on the content table. This is so that FTS4 can still retrieve the original
160  values in order to update the full-text index. And the INSERT trigger must
161  be fired after the new row is inserted, so as to handle the case where the
162  rowid is assigned automatically within the system. The UPDATE trigger must
163  be split into two parts, one fired before and one after the update of the
164  content table, for the same reasons.
165
166  FTS4 features a special command similar to the 'optimize' command that
167  deletes the entire full-text index and rebuilds it based on the current
168  set of documents in the content table. Assuming again that "t3" is the
169  name of the external content FTS4 table, the command is:
170
171    INSERT INTO t3(t3) VALUES('rebuild');
172
173  This command may also be used with ordinary FTS4 tables, although it may
174  only be useful if the full-text index has somehow become corrupt. It is an
175  error to attempt to rebuild the full-text index maintained by a contentless
176  FTS4 table.
177

README.syntax

1
21. OVERVIEW
3
4  This README file describes the syntax of the arguments that may be passed to
5  the FTS3 MATCH operator used for full-text queries. For example, if table
6  "t1" is an Fts3 virtual table, the following SQL query:
7
8    SELECT * FROM t1 WHERE <col> MATCH <full-text query>
9
10  may be used to retrieve all rows that match a specified for full-text query.
11  The text "<col>" should be replaced by either the name of the fts3 table
12  (in this case "t1"), or by the name of one of the columns of the fts3
13  table. <full-text-query> should be replaced by an SQL expression that
14  computes to a string containing an Fts3 query.
15
16  If the left-hand-side of the MATCH operator is set to the name of the
17  fts3 table, then by default the query may be matched against any column
18  of the table. If it is set to a column name, then by default the query
19  may only match the specified column. In both cases this may be overriden
20  as part of the query text (see sections 2 and 3 below).
21
22  As of SQLite version 3.6.8, Fts3 supports two slightly different query
23  formats; the standard syntax, which is used by default, and the enhanced
24  query syntax which can be selected by compiling with the pre-processor
25  symbol SQLITE_ENABLE_FTS3_PARENTHESIS defined.
26
27    -DSQLITE_ENABLE_FTS3_PARENTHESIS
28
292. STANDARD QUERY SYNTAX
30
31  When using the standard Fts3 query syntax, a query usually consists of a
32  list of terms (words) separated by white-space characters. To match a
33  query, a row (or column) of an Fts3 table must contain each of the specified
34  terms. For example, the following query:
35
36    <col> MATCH 'hello world'
37
38  matches rows (or columns, if <col> is the name of a column name) that
39  contain at least one instance of the token "hello", and at least one
40  instance of the token "world". Tokens may be grouped into phrases using
41  quotation marks. In this case, a matching row or column must contain each
42  of the tokens in the phrase in the order specified, with no intervening
43  tokens. For example, the query:
44
45    <col> MATCH '"hello world" joe"
46
47  matches the first of the following two documents, but not the second or
48  third:
49
50    "'Hello world', said Joe."
51    "One should always greet the world with a cheery hello, thought Joe."
52    "How many hello world programs could their be?"
53
54  As well as grouping tokens together by phrase, the binary NEAR operator
55  may be used to search for rows that contain two or more specified tokens
56  or phrases within a specified proximity of each other. The NEAR operator
57  must always be specified in upper case. The word "near" in lower or mixed
58  case is treated as an ordinary token. For example, the following query:
59
60    <col> MATCH 'engineering NEAR consultancy'
61
62  matches rows that contain both the "engineering" and "consultancy" tokens
63  in the same column with not more than 10 other words between them. It does
64  not matter which of the two terms occurs first in the document, only that
65  they be seperated by only 10 tokens or less. The user may also specify
66  a different required proximity by adding "/N" immediately after the NEAR
67  operator, where N is an integer. For example:
68
69    <col> MATCH 'engineering NEAR/5 consultancy'
70
71  searches for a row containing an instance of each specified token seperated
72  by not more than 5 other tokens. More than one NEAR operator can be used
73  in as sequence. For example this query:
74
75    <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy'
76
77  searches for a row that contains an instance of the token "reliable"
78  seperated by not more than two tokens from an instance of "engineering",
79  which is in turn separated by not more than 5 other tokens from an
80  instance of the term "consultancy". Phrases enclosed in quotes may
81  also be used as arguments to the NEAR operator.
82
83  Similar to the NEAR operator, one or more tokens or phrases may be
84  separated by OR operators. In this case, only one of the specified tokens
85  or phrases must appear in the document. For example, the query:
86
87    <col> MATCH 'hello OR world'
88
89  matches rows that contain either the term "hello", or the term "world",
90  or both. Note that unlike in many programming languages, the OR operator
91  has a higher precedence than the AND operators implied between white-space
92  separated tokens. The following query matches documents that contain the
93  term 'sqlite' and at least one of the terms 'fantastic' or 'impressive',
94  not those that contain both 'sqlite' and 'fantastic' or 'impressive':
95
96    <col> MATCH 'sqlite fantastic OR impressive'
97
98  Any token that is part of an Fts3 query expression, whether or not it is
99  part of a phrase enclosed in quotes, may have a '*' character appended to
100  it. In this case, the token matches all terms that begin with the characters
101  of the token, not just those that exactly match it. For example, the
102  following query:
103
104    <col> MATCH 'sql*'
105
106  matches all rows that contain the term "SQLite", as well as those that
107  contain "SQL".
108
109  A token that is not part of a quoted phrase may be preceded by a '-'
110  character, which indicates that matching rows must not contain the
111  specified term. For example, the following:
112
113    <col> MATCH '"database engine" -sqlite'
114
115  matches rows that contain the phrase "database engine" but do not contain
116  the term "sqlite". If the '-' character occurs inside a quoted phrase,
117  it is ignored. It is possible to use both the '-' prefix and the '*' postfix
118  on a single term. At this time, all Fts3 queries must contain at least
119  one term or phrase that is not preceded by the '-' prefix.
120
121  Regardless of whether or not a table name or column name is used on the
122  left hand side of the MATCH operator, a specific column of the fts3 table
123  may be associated with each token in a query by preceding a token with
124  a column name followed by a ':' character. For example, regardless of what
125  is specified for <col>, the following query requires that column "col1"
126  of the table contains the term "hello", and that column "col2" of the
127  table contains the term "world". If the table does not contain columns
128  named "col1" and "col2", then an error is returned and the query is
129  not run.
130
131    <col> MATCH 'col1:hello col2:world'
132
133  It is not possible to associate a specific table column with a quoted
134  phrase or a term preceded by a '-' operator. A '*' character may be
135  appended to a term associated with a specific column for prefix matching.
136
1373. ENHANCED QUERY SYNTAX
138
139  The enhanced query syntax is quite similar to the standard query syntax,
140  with the following four differences:
141
142  1) Parenthesis are supported. When using the enhanced query syntax,
143     parenthesis may be used to overcome the built-in precedence of the
144     supplied binary operators. For example, the following query:
145
146       <col> MATCH '(hello world) OR (simple example)'
147
148     matches documents that contain both "hello" and "world", and documents
149     that contain both "simple" and "example". It is not possible to forumlate
150     such a query using the standard syntax.
151
152  2) Instead of separating tokens and phrases by whitespace, an AND operator
153     may be explicitly specified. This does not change query processing at
154     all, but may be used to improve readability. For example, the following
155     query is handled identically to the one above:
156
157       <col> MATCH '(hello AND world) OR (simple AND example)'
158
159     As with the OR and NEAR operators, the AND operator must be specified
160     in upper case. The word "and" specified in lower or mixed case is
161     handled as a regular token.
162
163  3) The '-' token prefix is not supported. Instead, a new binary operator,
164     NOT, is included. The NOT operator requires that the query specified
165     as its left-hand operator matches, but that the query specified as the
166     right-hand operator does not. For example, to query for all rows that
167     contain the term "example" but not the term "simple", the following
168     query could be used:
169
170       <col> MATCH 'example NOT simple'
171
172     As for all other operators, the NOT operator must be specified in
173     upper case. Otherwise it will be treated as a regular token.
174
175  4) Unlike in the standard syntax, where the OR operator has a higher
176     precedence than the implicit AND operator, when using the enhanced
177     syntax implicit and explict AND operators have a higher precedence
178     than OR operators. Using the enhanced syntax, the following two
179     queries are equivalent:
180
181       <col> MATCH 'sqlite fantastic OR impressive'
182       <col> MATCH '(sqlite AND fantastic) OR impressive'
183
184     however, when using the standard syntax, the query:
185
186       <col> MATCH 'sqlite fantastic OR impressive'
187
188     is equivalent to the enhanced syntax query:
189
190       <col> MATCH 'sqlite AND (fantastic OR impressive)'
191
192     The precedence of all enhanced syntax operators, in order from highest
193     to lowest, is:
194
195       NEAR       (highest precedence, tightest grouping)
196       NOT
197       AND
198       OR         (lowest precedence, loosest grouping)
199
200  Using the advanced syntax, it is possible to specify expressions enclosed
201  in parenthesis as operands to the NOT, AND and OR operators. However both
202  the left and right hand side operands of NEAR operators must be either
203  tokens or phrases. Attempting the following query will return an error:
204
205    <col> MATCH 'sqlite NEAR (fantastic OR impressive)'
206
207  Queries of this form must be re-written as:
208
209    <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive'
210

README.tokenizers

1
21. FTS3 Tokenizers
3
4  When creating a new full-text table, FTS3 allows the user to select
5  the text tokenizer implementation to be used when indexing text
6  by specifying a "tokenize" clause as part of the CREATE VIRTUAL TABLE
7  statement:
8
9    CREATE VIRTUAL TABLE <table-name> USING fts3(
10      <columns ...> [, tokenize <tokenizer-name> [<tokenizer-args>]]
11    );
12
13  The built-in tokenizers (valid values to pass as <tokenizer name>) are
14  "simple", "porter" and "unicode".
15
16  <tokenizer-args> should consist of zero or more white-space separated
17  arguments to pass to the selected tokenizer implementation. The
18  interpretation of the arguments, if any, depends on the individual
19  tokenizer.
20
212. Custom Tokenizers
22
23  FTS3 allows users to provide custom tokenizer implementations. The
24  interface used to create a new tokenizer is defined and described in
25  the fts3_tokenizer.h source file.
26
27  Registering a new FTS3 tokenizer is similar to registering a new
28  virtual table module with SQLite. The user passes a pointer to a
29  structure containing pointers to various callback functions that
30  make up the implementation of the new tokenizer type. For tokenizers,
31  the structure (defined in fts3_tokenizer.h) is called
32  "sqlite3_tokenizer_module".
33
34  FTS3 does not expose a C-function that users call to register new
35  tokenizer types with a database handle. Instead, the pointer must
36  be encoded as an SQL blob value and passed to FTS3 through the SQL
37  engine by evaluating a special scalar function, "fts3_tokenizer()".
38  The fts3_tokenizer() function may be called with one or two arguments,
39  as follows:
40
41    SELECT fts3_tokenizer(<tokenizer-name>);
42    SELECT fts3_tokenizer(<tokenizer-name>, <sqlite3_tokenizer_module ptr>);
43
44  Where <tokenizer-name> is a string identifying the tokenizer and
45  <sqlite3_tokenizer_module ptr> is a pointer to an sqlite3_tokenizer_module
46  structure encoded as an SQL blob. If the second argument is present,
47  it is registered as tokenizer <tokenizer-name> and a copy of it
48  returned. If only one argument is passed, a pointer to the tokenizer
49  implementation currently registered as <tokenizer-name> is returned,
50  encoded as a blob. Or, if no such tokenizer exists, an SQL exception
51  (error) is raised.
52
53  SECURITY: If the fts3 extension is used in an environment where potentially
54    malicious users may execute arbitrary SQL (i.e. gears), they should be
55    prevented from invoking the fts3_tokenizer() function.  The
56    fts3_tokenizer() function is disabled by default. It is only enabled
57    by SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER. Do not enable it in
58    security sensitive environments.
59
60  See "Sample code" below for an example of calling the fts3_tokenizer()
61  function from C code.
62
633. ICU Library Tokenizers
64
65  If this extension is compiled with the SQLITE_ENABLE_ICU pre-processor
66  symbol defined, then there exists a built-in tokenizer named "icu"
67  implemented using the ICU library. The first argument passed to the
68  xCreate() method (see fts3_tokenizer.h) of this tokenizer may be
69  an ICU locale identifier. For example "tr_TR" for Turkish as used
70  in Turkey, or "en_AU" for English as used in Australia. For example:
71
72    "CREATE VIRTUAL TABLE thai_text USING fts3(text, tokenizer icu th_TH)"
73
74  The ICU tokenizer implementation is very simple. It splits the input
75  text according to the ICU rules for finding word boundaries and discards
76  any tokens that consist entirely of white-space. This may be suitable
77  for some applications in some locales, but not all. If more complex
78  processing is required, for example to implement stemming or
79  discard punctuation, this can be done by creating a tokenizer
80  implementation that uses the ICU tokenizer as part of its implementation.
81
82  When using the ICU tokenizer this way, it is safe to overwrite the
83  contents of the strings returned by the xNext() method (see
84  fts3_tokenizer.h).
85
864. Sample code.
87
88  The following two code samples illustrate the way C code should invoke
89  the fts3_tokenizer() scalar function:
90
91      int registerTokenizer(
92        sqlite3 *db,
93        char *zName,
94        const sqlite3_tokenizer_module *p
95      ){
96        int rc;
97        sqlite3_stmt *pStmt;
98        const char zSql[] = "SELECT fts3_tokenizer(?, ?)";
99
100        rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);
101        if( rc!=SQLITE_OK ){
102          return rc;
103        }
104
105        sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);
106        sqlite3_bind_blob(pStmt, 2, &p, sizeof(p), SQLITE_STATIC);
107        sqlite3_step(pStmt);
108
109        return sqlite3_finalize(pStmt);
110      }
111
112      int queryTokenizer(
113        sqlite3 *db,
114        char *zName,
115        const sqlite3_tokenizer_module **pp
116      ){
117        int rc;
118        sqlite3_stmt *pStmt;
119        const char zSql[] = "SELECT fts3_tokenizer(?)";
120
121        *pp = 0;
122        rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);
123        if( rc!=SQLITE_OK ){
124          return rc;
125        }
126
127        sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);
128        if( SQLITE_ROW==sqlite3_step(pStmt) ){
129          if( sqlite3_column_type(pStmt, 0)==SQLITE_BLOB ){
130            memcpy(pp, sqlite3_column_blob(pStmt, 0), sizeof(*pp));
131          }
132        }
133
134        return sqlite3_finalize(pStmt);
135      }
136

README.txt

1This folder contains source code to the second full-text search
2extension for SQLite.  While the API is the same, this version uses a
3substantially different storage schema from fts1, so tables will need
4to be rebuilt.
5