=====================
The errseq_t datatype
=====================

An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether it has changed since a previous
point where it was sampled.

The initial use case for this is tracking errors for file
synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
but it may be usable in other situations.

It's implemented as an unsigned 32-bit value. The low order bits are
designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
are used as a counter. This is done with atomics instead of locking so that
these functions can be called from any context.

Note that there is a risk of collisions if new errors are being recorded
frequently, since we have so few bits to use as a counter.

To mitigate this, the bit between the error value and counter is used as
a flag to tell whether the value has been sampled since a new value was
recorded. That allows us to avoid bumping the counter if no one has
sampled it since the last time an error was recorded.
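
The packing described above can be sketched in plain userspace C. This is
only an illustration: the ``ex_`` names are hypothetical, the value 4095 for
MAX_ERRNO is an assumption of the sketch, and the recording step is modelled
without atomics, so it is not how the kernel code is actually written.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not the kernel's own macros. */
#define EX_MAX_ERRNO	4095u		/* assumed: error code fits in bits 11..0 */
#define EX_SHIFT	12		/* position of the "sampled" flag */
#define EX_SEEN		(1u << EX_SHIFT)
#define EX_CTR_INC	(1u << (EX_SHIFT + 1))	/* lowest bit of the counter */

typedef uint32_t ex_errseq_t;

/* Split a value into its three fields. */
static inline unsigned int ex_errno(ex_errseq_t v)
{
	return v & EX_MAX_ERRNO;
}

static inline int ex_seen(ex_errseq_t v)
{
	return (v & EX_SEEN) != 0;
}

static inline unsigned int ex_counter(ex_errseq_t v)
{
	return v >> (EX_SHIFT + 1);
}

/*
 * Non-atomic model of recording an error. "code" is assumed to be a
 * positive value in the range 1..EX_MAX_ERRNO. The error bits are
 * replaced, the flag is cleared, and the counter is bumped only if the
 * previous value had been sampled.
 */
static inline ex_errseq_t ex_record(ex_errseq_t old, unsigned int code)
{
	ex_errseq_t new = (old & ~(EX_MAX_ERRNO | EX_SEEN)) | code;

	if (old & EX_SEEN)
		new += EX_CTR_INC;
	return new;
}
```

In this model, recording the same code twice in a row leaves the value
unchanged as long as nobody has set the flag in between, which is the
counter-saving behaviour the paragraph above describes.
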

Thus we end up with a value that looks something like this:

+--------------------------------------+----+------------------------+
| 31..13                               | 12 | 11..0                  |
+--------------------------------------+----+------------------------+
| counter                              | SF | errno                  |
+--------------------------------------+----+------------------------+

The general idea is for "watchers" to sample an errseq_t value and keep
it as a running cursor. That value can later be used to tell whether
any new errors have occurred since that sampling was done, and atomically
record the state at the time that it was checked. This allows us to
record errors in one place, and then have a number of "watchers" that
can tell whether the value has changed since they last checked it.

A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
is the special (but common) case where there has never been an error. An all
zero value thus serves as the "epoch" if one wishes to know whether there
has ever been an error set since it was first initialized.

API usage
=========

Let me tell you a story about a worker drone. Now, he's a good worker
overall, but the company is a little...management heavy.
He has to report to 77 supervisors today, and tomorrow the "big boss" is
coming in from out of town and he's sure to test the poor fellow too.

They're all handing him work to do -- so much he can't keep track of who
handed him what, but that's not really a big problem. The supervisors
just want to know when he's finished all of the work they've handed him so
far and whether he made any mistakes since they last asked.

He might have made the mistake on work they didn't actually hand him,
but he can't keep track of things at that level of detail; all he can
remember is the most recent mistake that he made.

Here's our worker_drone representation::

	struct worker_drone {
		errseq_t	wd_err;	/* for recording errors */
	};

Every day, the worker_drone starts out with a blank slate::

	struct worker_drone	wd;

	wd.wd_err = (errseq_t)0;

The supervisors come in and get an initial read for the day.
They don't care about anything that happened before their watch begins::

	struct supervisor {
		errseq_t	s_wd_err;	/* private "cursor" for wd_err */
		spinlock_t	s_wd_err_lock;	/* protects s_wd_err */
	};

	struct supervisor	su;

	su.s_wd_err = errseq_sample(&wd.wd_err);
	spin_lock_init(&su.s_wd_err_lock);

Now they start handing him tasks to do. Every few minutes they ask him to
finish up all of the work they've handed him so far. Then they ask him
whether he made any mistakes on any of it::

	spin_lock(&su.s_wd_err_lock);
	err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
	spin_unlock(&su.s_wd_err_lock);

Up to this point, that just keeps returning 0.

Now, the owners of this company are quite miserly and have given him
substandard equipment with which to do his job. Occasionally it
glitches and he makes a mistake. He sighs a heavy sigh, and marks it
down::

	errseq_set(&wd.wd_err, -EIO);

...and then gets back to work. The supervisors eventually poll again
and they each get the error when they next check.
Subsequent calls will return 0, until another error is recorded, at which
point it's reported to each of them once.

Note that the supervisors can't tell how many mistakes he made, only
whether one was made since they last checked, and the latest value
recorded.

Occasionally the big boss comes in for a spot check and asks the worker
to do a one-off job for him. He's not really watching the worker
full-time like the supervisors, but he does need to know whether a
mistake occurred while his job was processing.

He can just sample the current errseq_t in the worker, and then use that
to tell whether an error has occurred later::

	errseq_t since = errseq_sample(&wd.wd_err);
	/* submit some work and wait for it to complete */
	err = errseq_check(&wd.wd_err, since);

Since he's just going to discard "since" after that point, he doesn't
need to advance it here. He also doesn't need any locking since it's
not usable by anyone else.

Serializing errseq_t cursor updates
===================================

Note that the errseq_t API does not protect the errseq_t cursor during a
check_and_advance operation. Only the canonical error code is handled
atomically.
In a situation where more than one task might be using the same errseq_t
cursor at the same time, it's important to serialize updates to that
cursor.

If that's not done, then it's possible for the cursor to go backward
in which case the same error could be reported more than once.

Because of this, it's often advantageous to first do an errseq_check to
see if anything has changed, and only later do an
errseq_check_and_advance after taking the lock. e.g.::

	if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
		/* su.s_wd_err is protected by s_wd_err_lock */
		spin_lock(&su.s_wd_err_lock);
		err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
		spin_unlock(&su.s_wd_err_lock);
	}

That avoids the spinlock in the common case where nothing has changed
since the last time it was checked.

Functions
=========

.. kernel-doc:: lib/errseq.c
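
As a rough aid to reading the function documentation, the behaviour walked
through in the story can be modelled in single-threaded userspace C. Every
``model_`` name below is hypothetical, MAX_ERRNO is assumed to be 4095, and
nothing here is atomic. The sketch also assumes that sampling a value whose
flag is clear yields 0, so that an error nobody has observed yet is still
reported; treat that as an assumption of the model rather than a statement
about the kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace model; names and constants are this sketch's own. */
typedef uint32_t model_errseq_t;

#define MODEL_MAX_ERRNO	4095u
#define MODEL_SEEN	(MODEL_MAX_ERRNO + 1u)	/* flag bit above the errno */
#define MODEL_CTR_INC	(MODEL_SEEN << 1)	/* lowest counter bit */

/* Record a negative errno; bump the counter only if the old value was seen. */
static inline void model_set(model_errseq_t *eseq, int err)
{
	model_errseq_t old = *eseq;
	model_errseq_t new = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) |
			     (model_errseq_t)(-err);

	if (old & MODEL_SEEN)
		new += MODEL_CTR_INC;
	*eseq = new;
}

/* Take a starting cursor; a value nobody has seen yet samples as 0. */
static inline model_errseq_t model_sample(const model_errseq_t *eseq)
{
	return (*eseq & MODEL_SEEN) ? *eseq : 0;
}

/* Report the latest error if the value differs from the cursor. */
static inline int model_check(const model_errseq_t *eseq, model_errseq_t since)
{
	return (*eseq == since) ? 0 : -(int)(*eseq & MODEL_MAX_ERRNO);
}

/* As model_check(), but also mark the value seen and advance the cursor. */
static inline int model_check_and_advance(model_errseq_t *eseq,
					  model_errseq_t *since)
{
	if (*eseq == *since)
		return 0;
	*eseq |= MODEL_SEEN;
	*since = *eseq;
	return -(int)(*since & MODEL_MAX_ERRNO);
}
```

With two cursors sampled from a zeroed value, a single ``model_set()`` with
``-EIO`` makes ``model_check_and_advance()`` return ``-EIO`` once for each
cursor and 0 afterwards, mirroring what the supervisors see in the story.
Because the cursor update is a plain store, concurrent callers sharing a
cursor would still need the locking described in the previous section.
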