xref: /sqlite-3.40.0/src/in-operator.md (revision 539f2fef)
1IN-Operator Implementation Notes
2================================
3
4## Definitions:
5
6An IN operator has one of the following formats:
7
8>
9     x IN (list)
10     x IN (subquery)
11
12The "x" is referred to as the LHS (left-hand side).  The list or subquery
13on the right is called the RHS (right-hand side).  If the RHS is a list
14it must be a non-empty list.  But if the RHS is a subquery, it can be an
15empty set.
16
17Both the LHS and RHS can be scalars or vectors.  The two must match.
18In other words, they must both be scalar or else they must both be
19vectors of the same length.
20
21NULL values can occur in either or both of the LHS and RHS.
22If the LHS contains only
23NULL values then we say that it is a "total-NULL".  If the LHS contains
24some NULL values and some non-NULL values, then it is a "partial-NULL".
25For a scalar, there is no difference between a partial-NULL and a total-NULL.
26The RHS is a partial-NULL if any row contains a NULL value.  The RHS is
27a total-NULL if it contains one or more rows that contain only NULL values.
28The LHS is called "non-NULL" if it contains no NULL values.  The RHS is
29called "non-NULL" if it contains no NULL values in any row.
30
31The result of an IN operator is one of TRUE, FALSE, or NULL.  A NULL result
32means that it cannot be determined if the LHS is contained in the RHS due
33to the presence of NULL values.  In some contexts (for example, when the IN
34operator occurs in a WHERE clause)
35the system only needs a binary result: TRUE or NOT-TRUE.  One can also
36to define a binary result of FALSE and NOT-FALSE, but
37it turns out that no extra optimizations are possible in that case, so if
38the FALSE/NOT-FALSE binary is needed, we have to compute the three-state
39TRUE/FALSE/NULL result and then combine the TRUE and NULL values into
40NOT-FALSE.
41
42A "NOT IN" operator is computed by first computing the equivalent IN
43operator, then interchanging the TRUE and FALSE results.
44
45## Simple Full-Scan Algorithm
46
47The following algorithm always compute the correct answer.  However, this
48algorithm is suboptimal, especially if there are many rows on the RHS.
49
50  1.  Set the null-flag to false
51  2.  For each row in the RHS:
52      <ol type='a'>
53      <li>  Compare the LHS against the RHS
54      <li>  If the LHS exactly matches the RHS, immediately return TRUE
55      <li>  If the comparison result is NULL, set the null-flag to true
56      </ol>
57  3.  If the null-flag is true, return NULL.
58  4.  Return FALSE
59
60## Optimized Algorithm
61
62The following procedure computes the same answer as the simple full-scan
63algorithm, though it does so with less work in the common case.  This
64is the algorithm that is implemented in SQLite.  The steps must occur
65in the order specified.  Except for the INDEX_NOOP optimization of step 1,
66none of the steps can be skipped.
67
68  1.  If the RHS is a constant list of length 1 or 2, then rewrite the
69      IN operator as a simple expression.  Implement
70
71            x IN (y1,y2)
72
73      as if it were
74
75            x=y1 OR x=y2
76
77      This is the INDEX_NOOP optimization and is only undertaken if the
78      IN operator is used for membership testing.  If the IN operator is
79      driving a loop, then skip this step entirely.
80
81  2.  If the RHS is empty, return FALSE.
82
83  3.  If the LHS is a total-NULL or if the RHS contains a total-NULL,
84      then return NULL.
85
86  4.  If the LHS is non-NULL, then use the LHS as a probe in a binary
87      search of the RHS
88
89      <ol type='a'>
90      <li> If the binary search finds an exact match, return TRUE
91
92      <li> If the RHS is known to be not-null, return FALSE
93      </ol>
94
95  5.  At this point, it is known that the result cannot be TRUE.  All
96      that remains is to distinguish between NULL and FALSE.
97      If a NOT-TRUE result is acceptable, then return NOT-TRUE now.
98
99  6.  For each row in the RHS, compare that row against the LHS and
100      if the result is NULL, immediately return NULL.  This step is
101      essentially the "Simple Full-scan Algorithm" above with the
102      tests for TRUE removed, since we know that the result cannot be
103      TRUE at this point.
104
105  7.  Return FALSE.
106