xref: /sqlite-3.40.0/ext/fts3/README.content (revision 067b92ba)
1*042f88e5Sdan
2*042f88e5SdanFTS4 CONTENT OPTION
3*042f88e5Sdan
4*042f88e5Sdan  Normally, in order to create a full-text index on a dataset, the FTS4
5*042f88e5Sdan  module stores a copy of all indexed documents in a specially created
6*042f88e5Sdan  database table.
7*042f88e5Sdan
8*042f88e5Sdan  As of SQLite version 3.7.9, FTS4 supports a new option - "content" -
9*042f88e5Sdan  designed to extend FTS4 to support the creation of full-text indexes where:
10*042f88e5Sdan
11*042f88e5Sdan    * The indexed documents are not stored within the SQLite database
12*042f88e5Sdan      at all (a "contentless" FTS4 table), or
13*042f88e5Sdan
14*042f88e5Sdan    * The indexed documents are stored in a database table created and
15*042f88e5Sdan      managed by the user (an "external content" FTS4 table).
16*042f88e5Sdan
17*042f88e5Sdan  Because the indexed documents themselves are usually much larger than
18*042f88e5Sdan  the full-text index, the content option can sometimes be used to achieve
19*042f88e5Sdan  significant space savings.
20*042f88e5Sdan
21*042f88e5SdanCONTENTLESS FTS4 TABLES
22*042f88e5Sdan
23*042f88e5Sdan  In order to create an FTS4 table that does not store a copy of the indexed
24*042f88e5Sdan  documents at all, the content option should be set to an empty string.
25*042f88e5Sdan  For example, the following SQL creates such an FTS4 table with three
26*042f88e5Sdan  columns - "a", "b", and "c":
27*042f88e5Sdan
28*042f88e5Sdan    CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c);
29*042f88e5Sdan
30*042f88e5Sdan  Data can be inserted into such an FTS4 table using an INSERT statements.
31*042f88e5Sdan  However, unlike ordinary FTS4 tables, the user must supply an explicit
32*042f88e5Sdan  integer docid value. For example:
33*042f88e5Sdan
34*042f88e5Sdan    -- This statement is Ok:
35*042f88e5Sdan    INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i');
36*042f88e5Sdan
37*042f88e5Sdan    -- This statement causes an error, as no docid value has been provided:
38*042f88e5Sdan    INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r');
39*042f88e5Sdan
40*042f88e5Sdan  It is not possible to UPDATE or DELETE a row stored in a contentless FTS4
41*042f88e5Sdan  table. Attempting to do so is an error.
42*042f88e5Sdan
43*042f88e5Sdan  Contentless FTS4 tables also support SELECT statements. However, it is
44*042f88e5Sdan  an error to attempt to retrieve the value of any table column other than
45*042f88e5Sdan  the docid column. The auxiliary function matchinfo() may be used, but
46*042f88e5Sdan  snippet() and offsets() may not. For example:
47*042f88e5Sdan
48*042f88e5Sdan    -- The following statements are Ok:
49*042f88e5Sdan    SELECT docid FROM t1 WHERE t1 MATCH 'xxx';
50*042f88e5Sdan    SELECT docid FROM t1 WHERE a MATCH 'xxx';
51*042f88e5Sdan    SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx';
52*042f88e5Sdan
53*042f88e5Sdan    -- The following statements all cause errors, as the value of columns
54*042f88e5Sdan    -- other than docid are required to evaluate them.
55*042f88e5Sdan    SELECT * FROM t1;
56*042f88e5Sdan    SELECT a, b FROM t1 WHERE t1 MATCH 'xxx';
57*042f88e5Sdan    SELECT docid FROM t1 WHERE a LIKE 'xxx%';
58*042f88e5Sdan    SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx';
59*042f88e5Sdan
60*042f88e5Sdan  Errors related to attempting to retrieve column values other than docid
61*042f88e5Sdan  are runtime errors that occur within sqlite3_step(). In some cases, for
62*042f88e5Sdan  example if the MATCH expression in a SELECT query matches zero rows, there
63*042f88e5Sdan  may be no error at all even if a statement does refer to column values
64*042f88e5Sdan  other than docid.
65*042f88e5Sdan
66*042f88e5SdanEXTERNAL CONTENT FTS4 TABLES
67*042f88e5Sdan
68*042f88e5Sdan  An "external content" FTS4 table is similar to a contentless table, except
69*042f88e5Sdan  that if evaluation of a query requires the value of a column other than
70*042f88e5Sdan  docid, FTS4 attempts to retrieve that value from a table (or view, or
71*042f88e5Sdan  virtual table) nominated by the user (hereafter referred to as the "content
72*042f88e5Sdan  table"). The FTS4 module never writes to the content table, and writing
73*042f88e5Sdan  to the content table does not affect the full-text index. It is the
74*042f88e5Sdan  responsibility of the user to ensure that the content table and the
75*042f88e5Sdan  full-text index are consistent.
76*042f88e5Sdan
77*042f88e5Sdan  An external content FTS4 table is created by setting the content option
78*042f88e5Sdan  to the name of a table (or view, or virtual table) that may be queried by
79*042f88e5Sdan  FTS4 to retrieve column values when required. If the nominated table does
80*042f88e5Sdan  not exist, then an external content table behaves in the same way as
81*042f88e5Sdan  a contentless table. For example:
82*042f88e5Sdan
83*042f88e5Sdan    CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c);
84*042f88e5Sdan    CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c);
85*042f88e5Sdan
86*042f88e5Sdan  Assuming the nominated table does exist, then its columns must be the same
87*042f88e5Sdan  as or a superset of those defined for the FTS table.
88*042f88e5Sdan
89*042f88e5Sdan  When a users query on the FTS table requires a column value other than
90*042f88e5Sdan  docid, FTS attempts to read this value from the corresponding column of
91*042f88e5Sdan  the row in the content table with a rowid value equal to the current FTS
92*042f88e5Sdan  docid. Or, if such a row cannot be found in the content table, a NULL
93*042f88e5Sdan  value is used instead. For example:
94*042f88e5Sdan
95*042f88e5Sdan    CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d);
96*042f88e5Sdan    CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c);
97*042f88e5Sdan
98*042f88e5Sdan    INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f');
99*042f88e5Sdan    INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l');
100*042f88e5Sdan    INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;
101*042f88e5Sdan
102*042f88e5Sdan    -- The following query returns a single row with two columns containing
103*042f88e5Sdan    -- the text values "i j" and "k l".
104*042f88e5Sdan    --
105*042f88e5Sdan    -- The query uses the full-text index to discover that the MATCH
106*042f88e5Sdan    -- term matches the row with docid=3. It then retrieves the values
107*042f88e5Sdan    -- of columns b and c from the row with rowid=3 in the content table
108*042f88e5Sdan    -- to return.
109*042f88e5Sdan    --
110*042f88e5Sdan    SELECT * FROM t3 WHERE t3 MATCH 'k';
111*042f88e5Sdan
112*042f88e5Sdan    -- Following the UPDATE, the query still returns a single row, this
113*042f88e5Sdan    -- time containing the text values "xxx" and "yyy". This is because the
114*042f88e5Sdan    -- full-text index still indicates that the row with docid=3 matches
115*042f88e5Sdan    -- the FTS4 query 'k', even though the documents stored in the content
116*042f88e5Sdan    -- table have been modified.
117*042f88e5Sdan    --
118*042f88e5Sdan    UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3;
119*042f88e5Sdan    SELECT * FROM t3 WHERE t3 MATCH 'k';
120*042f88e5Sdan
121*042f88e5Sdan    -- Following the DELETE below, the query returns one row containing two
122*042f88e5Sdan    -- NULL values. NULL values are returned because FTS is unable to find
123*042f88e5Sdan    -- a row with rowid=3 within the content table.
124*042f88e5Sdan    --
125*042f88e5Sdan    DELETE FROM t2;
126*042f88e5Sdan    SELECT * FROM t3 WHERE t3 MATCH 'k';
127*042f88e5Sdan
128*042f88e5Sdan  When a row is deleted from an external content FTS4 table, FTS4 needs to
129*042f88e5Sdan  retrieve the column values of the row being deleted from the content table.
130*042f88e5Sdan  This is so that FTS4 can update the full-text index entries for each token
131*042f88e5Sdan  that occurs within the deleted row to indicate that that row has been
132*042f88e5Sdan  deleted. If the content table row cannot be found, or if it contains values
133*042f88e5Sdan  inconsistent with the contents of the FTS index, the results can be difficult
134*042f88e5Sdan  to predict. The FTS index may be left containing entries corresponding to the
135*042f88e5Sdan  deleted row, which can lead to seemingly nonsensical results being returned
136*042f88e5Sdan  by subsequent SELECT queries. The same applies when a row is updated, as
137*042f88e5Sdan  internally an UPDATE is the same as a DELETE followed by an INSERT.
138*042f88e5Sdan
139*042f88e5Sdan  Instead of writing separately to the full-text index and the content table,
140*042f88e5Sdan  some users may wish to use database triggers to keep the full-text index
141*042f88e5Sdan  up to date with respect to the set of documents stored in the content table.
142*042f88e5Sdan  For example, using the tables from earlier examples:
143*042f88e5Sdan
144*042f88e5Sdan    CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN
145*042f88e5Sdan      DELETE FROM t3 WHERE docid=old.rowid;
146*042f88e5Sdan    END;
147*042f88e5Sdan    CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN
148*042f88e5Sdan      DELETE FROM t3 WHERE docid=old.rowid;
149*042f88e5Sdan    END;
150*042f88e5Sdan
151*042f88e5Sdan    CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN
152*042f88e5Sdan      INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
153*042f88e5Sdan    END;
154*042f88e5Sdan    CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN
155*042f88e5Sdan      INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
156*042f88e5Sdan    END;
157*042f88e5Sdan
158*042f88e5Sdan  The DELETE trigger must be fired before the actual delete takes place
159*042f88e5Sdan  on the content table. This is so that FTS4 can still retrieve the original
160*042f88e5Sdan  values in order to update the full-text index. And the INSERT trigger must
161*042f88e5Sdan  be fired after the new row is inserted, so as to handle the case where the
162*042f88e5Sdan  rowid is assigned automatically within the system. The UPDATE trigger must
163*042f88e5Sdan  be split into two parts, one fired before and one after the update of the
164*042f88e5Sdan  content table, for the same reasons.
165*042f88e5Sdan
166*042f88e5Sdan  FTS4 features a special command similar to the 'optimize' command that
167*042f88e5Sdan  deletes the entire full-text index and rebuilds it based on the current
168*042f88e5Sdan  set of documents in the content table. Assuming again that "t3" is the
169*042f88e5Sdan  name of the external content FTS4 table, the command is:
170*042f88e5Sdan
171*042f88e5Sdan    INSERT INTO t3(t3) VALUES('rebuild');
172*042f88e5Sdan
173*042f88e5Sdan  This command may also be used with ordinary FTS4 tables, although it may
174*042f88e5Sdan  only be useful if the full-text index has somehow become corrupt. It is an
175*042f88e5Sdan  error to attempt to rebuild the full-text index maintained by a contentless
176*042f88e5Sdan  FTS4 table.
177