1IN-Operator Implementation Notes 2================================ 3 4## Definitions: 5 6An IN operator has one of the following formats: 7 8> 9 x IN (y1,y2,y3,...,yN) 10 x IN (subquery) 11 12The "x" is referred to as the LHS (left-hand side). The list or subquery 13on the right is called the RHS (right-hand side). If the RHS is a list 14it must be a non-empty list. But if the RHS is a subquery, it can be an 15empty set. 16 17The LHS can be a scalar (a single quantity) or a vector (a list of 18two or or more values) or a subquery that returns one or more columns. 19We use the term "vector" to mean an actually list of values or a 20subquery that returns two or more columns. An isolated value or 21a subquery that returns a single columns is called a scalar. 22 23The RHS can be a subquery that returns a single column, a subquery 24that returns two or more columns, or a list of scalars. It is not 25currently support for the RHS to be a list of vectors. 26 27The number of columns for LHS must match the number of columns for 28the RHS. If the RHS is a list of values, then the LHS must be a 29scalar. If the RHS is a subquery returning N columns, then the LHS 30must be a vector of size N. 31 32NULL values can occur in either or both of the LHS and RHS. 33If the LHS contains only 34NULL values then we say that it is a "total-NULL". If the LHS contains 35some NULL values and some non-NULL values, then it is a "partial-NULL". 36For a scalar, there is no difference between a partial-NULL and a total-NULL. 37The RHS is a partial-NULL if any row contains a NULL value. The RHS is 38a total-NULL if it contains one or more rows that contain only NULL values. 39The LHS is called "non-NULL" if it contains no NULL values. The RHS is 40called "non-NULL" if it contains no NULL values in any row. 41 42The result of an IN operator is one of TRUE, FALSE, or NULL. A NULL result 43means that it cannot be determined if the LHS is contained in the RHS due 44to the presence of NULL values. In some contexts (for example, when the IN 45operator occurs in a WHERE clause) 46the system only needs a binary result: TRUE or NOT-TRUE. One can also 47to define a binary result of FALSE and NOT-FALSE, but 48it turns out that no extra optimizations are possible in that case, so if 49the FALSE/NOT-FALSE binary is needed, we have to compute the three-state 50TRUE/FALSE/NULL result and then combine the TRUE and NULL values into 51NOT-FALSE. 52 53A "NOT IN" operator is computed by first computing the equivalent IN 54operator, then interchanging the TRUE and FALSE results. 55 56## Simple Full-Scan Algorithm 57 58The following algorithm always compute the correct answer. However, this 59algorithm is suboptimal, especially if there are many rows on the RHS. 60 61 1. Set the null-flag to false 62 2. For each row in the RHS: 63 <ol type='a'> 64 <li> Compare the LHS against the RHS 65 <li> If the LHS exactly matches the RHS, immediately return TRUE 66 <li> If the comparison result is NULL, set the null-flag to true 67 </ol> 68 3. If the null-flag is true, return NULL. 69 4. Return FALSE 70 71## Optimized Algorithm 72 73The following procedure computes the same answer as the simple full-scan 74algorithm, though it does so with less work in the common case. This 75is the algorithm that is implemented in SQLite. 76 77 1. If the RHS is a constant list of length 1 or 2, then rewrite the 78 IN operator as a simple expression. Implement 79 80 x IN (y1,y2) 81 82 as if it were 83 84 x=y1 OR x=y2 85 86 This is the INDEX_NOOP optimization and is only undertaken if the 87 IN operator is used for membership testing. If the IN operator is 88 driving a loop, then skip this step entirely. 89 90 2. Check the LHS to see if it is a partial-NULL and if it is, jump 91 ahead to step 5. 92 93 3. Do a binary search of the RHS using the LHS as a probe. If 94 an exact match is found, return TRUE. 95 96 4. If the RHS is non-NULL then return FALSE. 97 98 5. If we do not need to distinguish between FALSE and NULL, 99 then return FALSE. 100 101 6. For each row in the RHS, compare that row against the LHS and 102 if the result is NULL, immediately return NULL. In the case 103 of a scalar IN operator, we only need to look at the very first 104 row the RHS because for a scalar RHS, all NULLs will always come 105 first. If the RHS is empty, this step is a no-op. 106 107 7. Return FALSE. 108