1*042f88e5Sdan 2*042f88e5SdanFTS4 CONTENT OPTION 3*042f88e5Sdan 4*042f88e5Sdan Normally, in order to create a full-text index on a dataset, the FTS4 5*042f88e5Sdan module stores a copy of all indexed documents in a specially created 6*042f88e5Sdan database table. 7*042f88e5Sdan 8*042f88e5Sdan As of SQLite version 3.7.9, FTS4 supports a new option - "content" - 9*042f88e5Sdan designed to extend FTS4 to support the creation of full-text indexes where: 10*042f88e5Sdan 11*042f88e5Sdan * The indexed documents are not stored within the SQLite database 12*042f88e5Sdan at all (a "contentless" FTS4 table), or 13*042f88e5Sdan 14*042f88e5Sdan * The indexed documents are stored in a database table created and 15*042f88e5Sdan managed by the user (an "external content" FTS4 table). 16*042f88e5Sdan 17*042f88e5Sdan Because the indexed documents themselves are usually much larger than 18*042f88e5Sdan the full-text index, the content option can sometimes be used to achieve 19*042f88e5Sdan significant space savings. 20*042f88e5Sdan 21*042f88e5SdanCONTENTLESS FTS4 TABLES 22*042f88e5Sdan 23*042f88e5Sdan In order to create an FTS4 table that does not store a copy of the indexed 24*042f88e5Sdan documents at all, the content option should be set to an empty string. 25*042f88e5Sdan For example, the following SQL creates such an FTS4 table with three 26*042f88e5Sdan columns - "a", "b", and "c": 27*042f88e5Sdan 28*042f88e5Sdan CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c); 29*042f88e5Sdan 30*042f88e5Sdan Data can be inserted into such an FTS4 table using an INSERT statements. 31*042f88e5Sdan However, unlike ordinary FTS4 tables, the user must supply an explicit 32*042f88e5Sdan integer docid value. For example: 33*042f88e5Sdan 34*042f88e5Sdan -- This statement is Ok: 35*042f88e5Sdan INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i'); 36*042f88e5Sdan 37*042f88e5Sdan -- This statement causes an error, as no docid value has been provided: 38*042f88e5Sdan INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r'); 39*042f88e5Sdan 40*042f88e5Sdan It is not possible to UPDATE or DELETE a row stored in a contentless FTS4 41*042f88e5Sdan table. Attempting to do so is an error. 42*042f88e5Sdan 43*042f88e5Sdan Contentless FTS4 tables also support SELECT statements. However, it is 44*042f88e5Sdan an error to attempt to retrieve the value of any table column other than 45*042f88e5Sdan the docid column. The auxiliary function matchinfo() may be used, but 46*042f88e5Sdan snippet() and offsets() may not. For example: 47*042f88e5Sdan 48*042f88e5Sdan -- The following statements are Ok: 49*042f88e5Sdan SELECT docid FROM t1 WHERE t1 MATCH 'xxx'; 50*042f88e5Sdan SELECT docid FROM t1 WHERE a MATCH 'xxx'; 51*042f88e5Sdan SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx'; 52*042f88e5Sdan 53*042f88e5Sdan -- The following statements all cause errors, as the value of columns 54*042f88e5Sdan -- other than docid are required to evaluate them. 55*042f88e5Sdan SELECT * FROM t1; 56*042f88e5Sdan SELECT a, b FROM t1 WHERE t1 MATCH 'xxx'; 57*042f88e5Sdan SELECT docid FROM t1 WHERE a LIKE 'xxx%'; 58*042f88e5Sdan SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx'; 59*042f88e5Sdan 60*042f88e5Sdan Errors related to attempting to retrieve column values other than docid 61*042f88e5Sdan are runtime errors that occur within sqlite3_step(). In some cases, for 62*042f88e5Sdan example if the MATCH expression in a SELECT query matches zero rows, there 63*042f88e5Sdan may be no error at all even if a statement does refer to column values 64*042f88e5Sdan other than docid. 65*042f88e5Sdan 66*042f88e5SdanEXTERNAL CONTENT FTS4 TABLES 67*042f88e5Sdan 68*042f88e5Sdan An "external content" FTS4 table is similar to a contentless table, except 69*042f88e5Sdan that if evaluation of a query requires the value of a column other than 70*042f88e5Sdan docid, FTS4 attempts to retrieve that value from a table (or view, or 71*042f88e5Sdan virtual table) nominated by the user (hereafter referred to as the "content 72*042f88e5Sdan table"). The FTS4 module never writes to the content table, and writing 73*042f88e5Sdan to the content table does not affect the full-text index. It is the 74*042f88e5Sdan responsibility of the user to ensure that the content table and the 75*042f88e5Sdan full-text index are consistent. 76*042f88e5Sdan 77*042f88e5Sdan An external content FTS4 table is created by setting the content option 78*042f88e5Sdan to the name of a table (or view, or virtual table) that may be queried by 79*042f88e5Sdan FTS4 to retrieve column values when required. If the nominated table does 80*042f88e5Sdan not exist, then an external content table behaves in the same way as 81*042f88e5Sdan a contentless table. For example: 82*042f88e5Sdan 83*042f88e5Sdan CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c); 84*042f88e5Sdan CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c); 85*042f88e5Sdan 86*042f88e5Sdan Assuming the nominated table does exist, then its columns must be the same 87*042f88e5Sdan as or a superset of those defined for the FTS table. 88*042f88e5Sdan 89*042f88e5Sdan When a users query on the FTS table requires a column value other than 90*042f88e5Sdan docid, FTS attempts to read this value from the corresponding column of 91*042f88e5Sdan the row in the content table with a rowid value equal to the current FTS 92*042f88e5Sdan docid. Or, if such a row cannot be found in the content table, a NULL 93*042f88e5Sdan value is used instead. For example: 94*042f88e5Sdan 95*042f88e5Sdan CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d); 96*042f88e5Sdan CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c); 97*042f88e5Sdan 98*042f88e5Sdan INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f'); 99*042f88e5Sdan INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l'); 100*042f88e5Sdan INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2; 101*042f88e5Sdan 102*042f88e5Sdan -- The following query returns a single row with two columns containing 103*042f88e5Sdan -- the text values "i j" and "k l". 104*042f88e5Sdan -- 105*042f88e5Sdan -- The query uses the full-text index to discover that the MATCH 106*042f88e5Sdan -- term matches the row with docid=3. It then retrieves the values 107*042f88e5Sdan -- of columns b and c from the row with rowid=3 in the content table 108*042f88e5Sdan -- to return. 109*042f88e5Sdan -- 110*042f88e5Sdan SELECT * FROM t3 WHERE t3 MATCH 'k'; 111*042f88e5Sdan 112*042f88e5Sdan -- Following the UPDATE, the query still returns a single row, this 113*042f88e5Sdan -- time containing the text values "xxx" and "yyy". This is because the 114*042f88e5Sdan -- full-text index still indicates that the row with docid=3 matches 115*042f88e5Sdan -- the FTS4 query 'k', even though the documents stored in the content 116*042f88e5Sdan -- table have been modified. 117*042f88e5Sdan -- 118*042f88e5Sdan UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3; 119*042f88e5Sdan SELECT * FROM t3 WHERE t3 MATCH 'k'; 120*042f88e5Sdan 121*042f88e5Sdan -- Following the DELETE below, the query returns one row containing two 122*042f88e5Sdan -- NULL values. NULL values are returned because FTS is unable to find 123*042f88e5Sdan -- a row with rowid=3 within the content table. 124*042f88e5Sdan -- 125*042f88e5Sdan DELETE FROM t2; 126*042f88e5Sdan SELECT * FROM t3 WHERE t3 MATCH 'k'; 127*042f88e5Sdan 128*042f88e5Sdan When a row is deleted from an external content FTS4 table, FTS4 needs to 129*042f88e5Sdan retrieve the column values of the row being deleted from the content table. 130*042f88e5Sdan This is so that FTS4 can update the full-text index entries for each token 131*042f88e5Sdan that occurs within the deleted row to indicate that that row has been 132*042f88e5Sdan deleted. If the content table row cannot be found, or if it contains values 133*042f88e5Sdan inconsistent with the contents of the FTS index, the results can be difficult 134*042f88e5Sdan to predict. The FTS index may be left containing entries corresponding to the 135*042f88e5Sdan deleted row, which can lead to seemingly nonsensical results being returned 136*042f88e5Sdan by subsequent SELECT queries. The same applies when a row is updated, as 137*042f88e5Sdan internally an UPDATE is the same as a DELETE followed by an INSERT. 138*042f88e5Sdan 139*042f88e5Sdan Instead of writing separately to the full-text index and the content table, 140*042f88e5Sdan some users may wish to use database triggers to keep the full-text index 141*042f88e5Sdan up to date with respect to the set of documents stored in the content table. 142*042f88e5Sdan For example, using the tables from earlier examples: 143*042f88e5Sdan 144*042f88e5Sdan CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN 145*042f88e5Sdan DELETE FROM t3 WHERE docid=old.rowid; 146*042f88e5Sdan END; 147*042f88e5Sdan CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN 148*042f88e5Sdan DELETE FROM t3 WHERE docid=old.rowid; 149*042f88e5Sdan END; 150*042f88e5Sdan 151*042f88e5Sdan CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN 152*042f88e5Sdan INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c); 153*042f88e5Sdan END; 154*042f88e5Sdan CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN 155*042f88e5Sdan INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c); 156*042f88e5Sdan END; 157*042f88e5Sdan 158*042f88e5Sdan The DELETE trigger must be fired before the actual delete takes place 159*042f88e5Sdan on the content table. This is so that FTS4 can still retrieve the original 160*042f88e5Sdan values in order to update the full-text index. And the INSERT trigger must 161*042f88e5Sdan be fired after the new row is inserted, so as to handle the case where the 162*042f88e5Sdan rowid is assigned automatically within the system. The UPDATE trigger must 163*042f88e5Sdan be split into two parts, one fired before and one after the update of the 164*042f88e5Sdan content table, for the same reasons. 165*042f88e5Sdan 166*042f88e5Sdan FTS4 features a special command similar to the 'optimize' command that 167*042f88e5Sdan deletes the entire full-text index and rebuilds it based on the current 168*042f88e5Sdan set of documents in the content table. Assuming again that "t3" is the 169*042f88e5Sdan name of the external content FTS4 table, the command is: 170*042f88e5Sdan 171*042f88e5Sdan INSERT INTO t3(t3) VALUES('rebuild'); 172*042f88e5Sdan 173*042f88e5Sdan This command may also be used with ordinary FTS4 tables, although it may 174*042f88e5Sdan only be useful if the full-text index has somehow become corrupt. It is an 175*042f88e5Sdan error to attempt to rebuild the full-text index maintained by a contentless 176*042f88e5Sdan FTS4 table. 177