History log of /linux-6.15/drivers/md/bcache/request.c (Results 1 – 25 of 147)
Revision Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 05356938 28-May-2024 Coly Li <[email protected]>

bcache: call force_wake_up_gc() if necessary in check_should_bypass()

If extremely heavy write I/O continuously hits a relatively small
cache device (512GB in my testing), it is possible for the counter
c->gc_stats.in_use to keep increasing and exceed CUTOFF_CACHE_ADD.

If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write
requests will bypass the cache device because check_should_bypass()
returns 'true'. Because all writes bypass the cache device, the counter
c->sectors_to_gc has no chance to become negative, and the garbage
collection thread won't be woken up even after writeback completes and
the whole cache becomes clean. The aftermath is that all write I/Os go
directly to the backing device even though the cache device is clean.

To avoid the above situation, this patch uses a quite conservative fix:
if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, the garbage
collection thread is woken up only when the whole cache device is
clean.

Before the fix, the writes-always-bypass situation appears after 10+
hours of write I/O pressure on a 512GB Intel Optane memory acting as
the cache device. After this fix, the situation doesn't happen after
36+ hours of testing.

Signed-off-by: Coly Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
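
A minimal sketch of the conservative wake-up described above, with a
hypothetical guard modeled on the commit text (the exact code in
check_should_bypass() may differ; dc->has_dirty as the dirty-state
flag is an assumption):

if (c->gc_stats.in_use > CUTOFF_CACHE_ADD) {
        /* all writes bypass now, so the sectors_to_gc trigger never
         * fires; wake gc explicitly, but only once nothing is dirty
         * (assumed dc->has_dirty flag)
         */
        if (!atomic_read(&dc->has_dirty))
                force_wake_up_gc(c);
        goto skip;
}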


Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2
# d4e3b928 18-Nov-2023 Kent Overstreet <[email protected]>

closures: CLOSURE_CALLBACK() to fix type punning

Control flow integrity is now checking that type signatures match on
indirect function calls. That breaks closures, which embed a work_struct
in a closure in such a way that a closure_fn may also be used as a
workqueue fn by the underlying closure code.

So we have to change closure fns to take a work_struct as their
argument - but that results in a loss of clarity, as closure fns have
different semantics from normal workqueue functions (they run owning a
ref on the closure, which must be released with continue_at() or
closure_return()).

Thus, this patch introduces CLOSURE_CALLBACK() and closure_type()
macros, as suggested by Kees, to smooth things over a bit.

Suggested-by: Kees Cook <[email protected]>
Cc: Coly Li <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
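
A sketch of the macro shape and a converted callback, modeled on the
closures header (the exact expansion may differ; the function name is
illustrative):

#define CLOSURE_CALLBACK(name)	void name(struct work_struct *ws)
#define closure_type(name, type, member)				\
        struct closure *cl = container_of(ws, struct closure, work);	\
        type *name = container_of(cl, type, member)

/* every closure fn now also has a workqueue-compatible signature */
static CLOSURE_CALLBACK(example_read_done)
{
        closure_type(s, struct search, cl);	/* typed owner from ws */

        /* the fn runs owning a ref on the closure; release it */
        closure_return(&s->cl);
}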


Revision tags: v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6
# 05bdb996 08-Jun-2023 Christoph Hellwig <[email protected]>

block: replace fmode_t with a block-specific type for block open flags

The only overlap between the block open flags mapped into the fmode_t
and other uses of fmode_t is FMODE_READ and FMODE_WRITE. Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Jack Wang <[email protected]> [rnbd]
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
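
A sketch of the new type and its basic flags, as an assumption about
the block-layer header of that era:

typedef unsigned int __bitwise blk_mode_t;

#define BLK_OPEN_READ	((__force blk_mode_t)(1 << 0))
#define BLK_OPEN_WRITE	((__force blk_mode_t)(1 << 1))
#define BLK_OPEN_EXCL	((__force blk_mode_t)(1 << 2))	/* exclusive open */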


Revision tags: v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1
# c34b7ac6 06-Dec-2022 Christoph Hellwig <[email protected]>

block: remove bio_set_op_attrs

This macro is obsolete, so replace the last few uses with open coded
bi_opf assignments.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Coly Li <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
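
The transformation is mechanical; a sketch of one call site (opcode
illustrative):

/* before */
bio_set_op_attrs(bio, REQ_OP_READ, 0);

/* after: open-coded bi_opf assignment */
bio->bi_opf = REQ_OP_READ;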


Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1
# 8032bf12 10-Oct-2022 Jason A. Donenfeld <[email protected]>

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Darrick J. Wong <[email protected]> # for xfs
Reviewed-by: SeongJae Park <[email protected]> # for damon
Reviewed-by: Jason Gunthorpe <[email protected]> # for infiniband
Reviewed-by: Russell King (Oracle) <[email protected]> # for arm
Acked-by: Ulf Hansson <[email protected]> # for mmc
Signed-off-by: Jason A. Donenfeld <[email protected]>
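
Applied to a call site, the rename looks like this sketch (variable
names illustrative):

/* before: deprecated helper */
unsigned int i = prandom_u32_max(nr_buckets);

/* after: same semantics, clearer name */
unsigned int i = get_random_u32_below(nr_buckets);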


# 81895a65 05-Oct-2022 Jason A. Donenfeld <[email protected]>

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value & (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}

@drop_var@
type T;
identifier VAR;
@@

{
- T VAR;
... when != VAR
}

Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
Reviewed-by: KP Singh <[email protected]>
Reviewed-by: Jan Kara <[email protected]> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <[email protected]> # for drbd
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Heiko Carstens <[email protected]> # for s390
Acked-by: Ulf Hansson <[email protected]> # for mmc
Acked-by: Darrick J. Wong <[email protected]> # for xfs
Signed-off-by: Jason A. Donenfeld <[email protected]>
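
The most common rewrite the script performs, sketched in C (names
illustrative):

/* before: the modulo costs a division and biases the result */
unsigned int i = get_random_u32() % nr;

/* after: takes only the bytes actually needed, no division */
unsigned int i = prandom_u32_max(nr);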


Revision tags: v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1
# 40f567bb 27-May-2022 Jia-Ju Bai <[email protected]>

md: bcache: check the return value of kzalloc() in detached_dev_do_request()

The function kzalloc() in detached_dev_do_request() can fail, so its
return value should be checked.

Fixes: bc082a55d25c ("bcache: fix inaccurate io state for detached bcache devices")
Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
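
A hypothetical sketch of the added check, with error handling modeled
on typical bio-based drivers (the exact cleanup in
detached_dev_do_request() may differ):

ddip = kzalloc(sizeof(*ddip), GFP_NOIO);
if (!ddip) {
        /* fail the I/O instead of dereferencing a NULL pointer */
        bio->bi_status = BLK_STS_RESOURCE;
        bio_endio(bio);
        return;
}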


Revision tags: v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4
# 9dca4168 19-Apr-2022 Coly Li <[email protected]>

bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()

Commit abfc426d1b2f ("block: pass a block_device to bio_clone_fast")
calls the modified bio_init_clone() in bcache code as:
bio_init_clone(bio->bi_bdev, bio, orig_bio, GFP_NOIO);

But the first parameter is wrong, where bio->bi_bdev should be
orig_bio->bi_bdev. The wrong bi_bdev panics the kernel when submitting
cache bio.

This patch fixes the wrong bdev parameter usage and avoids the panic.

Fixes: abfc426d1b2f ("block: pass a block_device to bio_clone_fast")
Signed-off-by: Coly Li <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Mike Snitzer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
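
The one-line fix, taken from the description above:

/* before: wrong source bdev panics when the cache bio is submitted */
bio_init_clone(bio->bi_bdev, bio, orig_bio, GFP_NOIO);

/* after */
bio_init_clone(orig_bio->bi_bdev, bio, orig_bio, GFP_NOIO);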


Revision tags: v5.18-rc3
# 70200574 15-Apr-2022 Christoph Hellwig <[email protected]>

block: remove QUEUE_FLAG_DISCARD

Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only place that needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Acked-by: Christoph Böhmwalder <[email protected]> [drbd]
Acked-by: Jan Höppner <[email protected]> [s390]
Acked-by: Coly Li <[email protected]> [bcache]
Acked-by: David Sterba <[email protected]> [btrfs]
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
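
A sketch of the new-style check (the helper name comes from the same
series; the discard call is illustrative):

/* before: explicit queue flag */
if (blk_queue_discard(q))
        issue_discard(bdev, sector, nr_sects);

/* after: a non-zero limit implies discard support */
if (bdev_max_discard_sectors(bdev))
        issue_discard(bdev, sector, nr_sects);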


Revision tags: v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7
# 07fee7ab 03-Mar-2022 Christoph Hellwig <[email protected]>

bcache: use bvec_kmap_local in bio_csum

Using local kmaps slightly reduces the chances of stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
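
A sketch of the resulting pattern in bio_csum(), assuming the usual
crc64 helper:

struct bio_vec bv;
struct bvec_iter iter;
uint64_t csum = 0;

bio_for_each_segment(bv, bio, iter) {
        void *p = bvec_kmap_local(&bv);	/* short-lived local mapping */

        csum = crc64_be(csum, p, bv.bv_len);
        kunmap_local(p);
}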


Revision tags: v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3
# abfc426d 02-Feb-2022 Christoph Hellwig <[email protected]>

block: pass a block_device to bio_clone_fast

Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
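
The renamed entry points, sketched as before/after pairs (signatures
are an assumption based on this series):

/* before */
clone = bio_clone_fast(bio, GFP_NOIO, bs);

/* after: renamed, and the target bdev is passed up front */
clone = bio_alloc_clone(bdev, bio, GFP_NOIO, bs);

/* likewise, __bio_clone_fast(clone, bio) becomes: */
bio_init_clone(bdev, clone, bio, GFP_NOIO);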


# a0e8de79 02-Feb-2022 Christoph Hellwig <[email protected]>

block: initialize the target bio in __bio_clone_fast

All callers of __bio_clone_fast initialize the bio first. Move that
initialization into __bio_clone_fast instead.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
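
A sketch of the caller-side simplification (the old bio_init()
signature is assumed):

/* before: every caller initialized the clone first */
bio_init(clone, NULL, 0);
__bio_clone_fast(clone, bio);

/* after: __bio_clone_fast() initializes the target itself */
__bio_clone_fast(clone, bio);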


# 56b4b5ab 02-Feb-2022 Christoph Hellwig <[email protected]>

block: clone crypto and integrity data in __bio_clone_fast

__bio_clone_fast should also clone integrity and crypto data, as a clone
without those is incomplete. Right now the only caller that can actually
support crypto and integrity data (dm) does it manually for the one
callchain that supports these, but we better do it properly in the core.

Note that all callers except for the above mentioned one also don't need
to handle failure at all, given that the integrity and crypto clones are
based on mempool allocations that won't fail for sleeping allocations.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
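
Roughly what the core clone path gains, sketched with the usual
helpers (exact placement and error handling are assumptions):

/* inside the core clone, sketched */
if (bio_crypt_clone(bio, bio_src, gfp) < 0)
        return -ENOMEM;

if (bio_integrity(bio_src) &&
    bio_integrity_clone(bio, bio_src, gfp) < 0)
        return -ENOMEM;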


Revision tags: v5.17-rc2
# a7c50c94 24-Jan-2022 Christoph Hellwig <[email protected]>

block: pass a block_device and opf to bio_reset

Pass the block_device that we plan to use this bio for and the
operation to bio_reset to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
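
A sketch of a converted call site (opcode illustrative):

/* before */
bio_reset(bio);
bio_set_dev(bio, bdev);
bio->bi_opf = REQ_OP_WRITE;

/* after: device and operation set in one call */
bio_reset(bio, bdev, REQ_OP_WRITE);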


# 49add496 24-Jan-2022 Christoph Hellwig <[email protected]>

block: pass a block_device and opf to bio_init

Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
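
A sketch of a converted call site (opcode illustrative):

/* before */
bio_init(bio, bio->bi_inline_vecs, nr_vecs);
bio_set_dev(bio, bdev);
bio->bi_opf = REQ_OP_READ;

/* after */
bio_init(bio, bdev, bio->bi_inline_vecs, nr_vecs, REQ_OP_READ);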


# 609be106 24-Jan-2022 Christoph Hellwig <[email protected]>

block: pass a block_device and opf to bio_alloc_bioset

Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assignment. NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
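
A sketch of a converted call site (names illustrative):

/* before */
bio = bio_alloc_bioset(GFP_NOIO, nr_vecs, bs);
bio_set_dev(bio, bdev);
bio->bi_opf = opf;

/* after: note gfp_mask now follows nr_vecs */
bio = bio_alloc_bioset(bdev, nr_vecs, opf, GFP_NOIO, bs);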


Revision tags: v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7
# 39fa7a95 20-Oct-2021 Christoph Hellwig <[email protected]>

bcache: remove bch_crc64_update

bch_crc64_update is an entirely pointless wrapper around crc64_be.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
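
The resulting call sites, sketched:

/* before: pointless wrapper */
csum = bch_crc64_update(csum, data, len);

/* after: the lib/crc64.c helper, called directly */
csum = crc64_be(csum, data, len);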


# 0f5cd781 20-Oct-2021 Christoph Hellwig <[email protected]>

bcache: remove the backing_dev_name field from struct cached_dev

Just use the %pg format specifier to print the name directly.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
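
A sketch of the pattern (the message text is illustrative):

/* before: cached name string */
pr_err("I/O error on %s\n", dc->backing_dev_name);

/* after: %pg derives the name from the block_device */
pr_err("I/O error on %pg\n", dc->bdev);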


Revision tags: v5.15-rc6
# 3e08773c 12-Oct-2021 Christoph Hellwig <[email protected]>

block: switch polling to be bio based

Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

- the cookie construction can be made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie
separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio makes it trivial to
support polling of BIOs remapped by stacking drivers
- a lot of code to propagate the cookie back up the submission path can
be removed entirely.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Mark Wunderlich <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
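
The caller-side shape after the switch, sketched (the completion flag
is illustrative; bio_poll() per the new interface):

bool done = false;	/* set true from the bio's bi_end_io (illustrative) */

submit_bio(bio);

/* the cookie now travels inside the bio itself */
while (!READ_ONCE(done))
        bio_poll(bio, NULL, 0);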


Revision tags: v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6
# 41fe8d08 07-Jun-2021 Coly Li <[email protected]>

bcache: avoid oversized read request in cache missing code path

In the cache miss code path of a cached device, if a proper location
in the internal B+ tree is matched for a cache miss range, the function
cached_dev_cache_miss() will be called from cache_lookup_fn() in the
following code block,
[code block 1]
526 unsigned int sectors = KEY_INODE(k) == s->iop.inode
527 ? min_t(uint64_t, INT_MAX,
528 KEY_START(k) - bio->bi_iter.bi_sector)
529 : INT_MAX;
530 int ret = s->d->cache_miss(b, s, bio, sectors);

Here s->d->cache_miss() is the callback function pointer initialized to
cached_dev_cache_miss(); the last parameter 'sectors' is an important
hint for calculating the size of the read request sent to the backing
device for the missing cache data.

The current calculation in the above code block may generate an
oversized value of 'sectors', which consequently may trigger two
different potential kernel panics by BUG() or BUG_ON(), as listed below:

1) BUG_ON() inside bch_btree_insert_key(),
[code block 2]
886 BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
2) BUG() inside biovec_slab(),
[code block 3]
51 default:
52 BUG();
53 return NULL;

All the above panics originate from cached_dev_cache_miss() via the
oversized parameter 'sectors'.

Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
the size of the data read from the backing device for the cache miss.
This size is stored in s->insert_bio_sectors by the following lines of
code,
[code block 4]
909 s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);

Then the actual key to be inserted into the internal B+ tree is
generated and stored in s->iop.replace_key by the following lines of
code,
[code block 5]
911 s->iop.replace_key = KEY(s->iop.inode,
912 bio->bi_iter.bi_sector + s->insert_bio_sectors,
913 s->insert_bio_sectors);
The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
the above code block.

And the bio sent to the backing device for the missing data is
allocated with a hint from s->insert_bio_sectors by the following lines
of code,
[code block 6]
926 cache_bio = bio_alloc_bioset(GFP_NOWAIT,
927 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
928 &dc->disk.bio_split);
The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
above code block.

Now let me explain how the panics happen with the oversized 'sectors'.
In code block 5, replace_key is generated by macro KEY(). From the
definition of macro KEY(),
[code block 7]
71 #define KEY(inode, offset, size) \
72 ((struct bkey) { \
73 .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode), \
74 .low = (offset) \
75 })

Here 'size' is a 16-bit wide field embedded in the 64-bit member 'high'
of struct bkey. But in code block 1, "KEY_START(k) -
bio->bi_iter.bi_sector" can very easily be larger than (1<<16) - 1,
which makes the bkey size calculation in code block 5 overflow. In one
bug report the value of parameter 'sectors' is 131072 (= 1 << 17); the
overflowed 'sectors' results in an overflowed s->insert_bio_sectors in
code block 4, which then makes the size field of s->iop.replace_key 0
in code block 5. The 0-sized s->iop.replace_key is then inserted into
the internal B+ tree as the cache missing check key (a special key to
detect and avoid a race between a normal write request and a cache
missing read request) as,
[code block 8]
915 ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);

Then the 0-sized s->iop.replace_key, passed as the 3rd parameter,
triggers the bkey size check BUG_ON() in code block 2 and causes kernel
panic 1).

The other kernel panic, from code block 6, is caused by an oversized
bvec count derived from the value s->insert_bio_sectors computed in
code block 4,
min(sectors, bio_sectors(bio) + reada)
There are two possibilities for an oversized result:
- bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
- sectors < bio_sectors(bio) + reada, but sectors is oversized.

From a bug report, the result of "DIV_ROUND_UP(s->insert_bio_sectors,
PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
other values larger than BIO_MAX_VECS (a.k.a. 256). When
bio_alloc_bioset() is called with such a larger-than-256 value as the
2nd parameter, the value is eventually passed to biovec_slab() as
parameter 'nr_vecs' in the following code path,
bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
Because parameter 'nr_vecs' is larger than 256, the BUG() in code
block 3 is triggered inside biovec_slab().

From the above analysis, we know that the 4th parameter 'sectors' sent
into cached_dev_cache_miss() may cause overflows in code blocks 5 and
6, and finally cause kernel panics in code blocks 2 and 3. And if the
result of bio_sectors(bio) + reada exceeds the valid bvec count, it may
also trigger a kernel panic in code block 3 via code block 6.

Now that the almost-useless readahead size for cache missing requests
to the backing device has been removed, this patch can fix the
oversized issue with a simpler method (see the sketch after the tags
below):
- add a local variable size_limit, set to the minimum of the max bkey
  size and the max bio bvec count.
- set s->insert_bio_sectors to the minimum of size_limit, sectors, and
  the sector count of the bio.
- replace sectors with s->insert_bio_sectors when calling
  bio_next_split.

With size_limit applied this way, s->insert_bio_sectors can never
produce an oversized replace_key size or bio bvec count. And the split
bio 'miss' from bio_next_split() will always match the size of
'cache_bio', which is the current maximum bio size we can send to the
backing device for fetching the cache missing data.

The problematic code can be partially traced back to Linux v3.13-rc1,
therefore all maintained stable kernels should try to apply this fix.

Reported-by: Alexander Ullrich <[email protected]>
Reported-by: Diego Ercolani <[email protected]>
Reported-by: Jan Szubiak <[email protected]>
Reported-by: Marco Rebhan <[email protected]>
Reported-by: Matthias Ferdinand <[email protected]>
Reported-by: Victor Westerhuis <[email protected]>
Reported-by: Vojtech Pavlik <[email protected]>
Reported-and-tested-by: Rolf Fokkens <[email protected]>
Reported-and-tested-by: Thorsten Knabe <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Cc: [email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Nix <[email protected]>
Cc: Takashi Iwai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
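
A hypothetical sketch of the clamping described above (identifiers
modeled on the commit text; the exact constants and placement in
cached_dev_cache_miss() may differ):

/* never exceed what a bkey size field or a single bio can carry */
unsigned int size_limit = min_t(unsigned int,
                                BIO_MAX_VECS * PAGE_SECTORS,
                                (1 << KEY_SIZE_BITS) - 1);

s->insert_bio_sectors = min3(size_limit, (unsigned int)sectors,
                             bio_sectors(bio));
miss = bio_next_split(bio, s->insert_bio_sectors, GFP_NOIO,
                      &s->d->bio_split);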


# 1616a4c2 07-Jun-2021 Coly Li <[email protected]>

bcache: remove bcache device self-defined readahead

For read cache misses, bcache defines a readahead size for the read I/O
request to the backing device for the missing data. This readahead size
is initialized to 0, and almost no one uses it, in order to avoid
unnecessary read amplification on the backing device and write
amplification on the cache device. Considering that upper-layer file
system code already has readahead logic and works fine with the
readahead_cache_policy sysfs interface, we don't have to keep bcache's
self-defined readahead anymore.

This patch removes the bcache self-defined readahead for cache missing
requests to the backing device, and the readahead sysfs file interfaces
are removed as well.

This is preparation for the next patch, which fixes the potential
kernel panic due to oversized requests in a simpler way.

Reported-by: Alexander Ullrich <[email protected]>
Reported-by: Diego Ercolani <[email protected]>
Reported-by: Jan Szubiak <[email protected]>
Reported-by: Marco Rebhan <[email protected]>
Reported-by: Matthias Ferdinand <[email protected]>
Reported-by: Victor Westerhuis <[email protected]>
Reported-by: Vojtech Pavlik <[email protected]>
Reported-and-tested-by: Rolf Fokkens <[email protected]>
Reported-and-tested-by: Thorsten Knabe <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: Kent Overstreet <[email protected]>
Cc: Nix <[email protected]>
Cc: Takashi Iwai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>


Revision tags: v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5
# 99dfc43e 24-Jan-2021 Christoph Hellwig <[email protected]>

block: use ->bi_bdev for bio based I/O accounting

Rework the I/O accounting for bio based drivers to use ->bi_bdev. This
means all drivers can now simply use bio_start_io_acct to start
accounting, and it will take partitions into account automatically. To
end I/O accounting, either bio_end_io_acct can be used if the driver never
remaps I/O to a different device, or bio_end_io_acct_remapped if the
driver did remap the I/O.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
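
A sketch of the accounting pattern described above (the remap target
is illustrative):

/* start: partition-aware via bio->bi_bdev */
unsigned long start = bio_start_io_acct(bio);

/* ... process the bio, possibly remapping it to another device ... */

/* end: pick the variant matching whether the bio was remapped */
bio_end_io_acct_remapped(bio, start, orig_bdev);
/* or, for drivers that never remap: bio_end_io_acct(bio, start); */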


# 309dca30 24-Jan-2021 Christoph Hellwig <[email protected]>

block: store a block_device pointer in struct bio

Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device. From that the gendisk can be trivially
accessed with an extra indirection, but it also allows all information
related to partition remapping to be looked up directly.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
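
The resulting access patterns, sketched (field names as of that
series; treat bd_start_sect as an assumption):

/* gendisk reached through the new pointer, one extra indirection */
struct gendisk *disk = bio->bi_bdev->bd_disk;

/* partition remapping data lives on the same block_device */
sector_t part_start = bio->bi_bdev->bd_start_sect;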


Revision tags: v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6
# 8446fe92 24-Nov-2020 Christoph Hellwig <[email protected]>

block: switch partition lookup to use struct block_device

Use struct block_device to look up partitions on a disk. This removes
all usage of struct hd_struct from the I/O path.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Acked-by: Coly Li <[email protected]> [bcache]
Acked-by: Chao Yu <[email protected]> [f2fs]
Signed-off-by: Jens Axboe <[email protected]>


Revision tags: v5.10-rc5, v5.10-rc4, v5.10-rc3
# a7cb3d2f 03-Nov-2020 Christoph Hellwig <[email protected]>

block: remove __blkdev_driver_ioctl

Just open code it in the few callers.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
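
A sketch of the open-coded form, modeled on the old helper (which also
guarded against a missing ->ioctl method):

/* before */
return __blkdev_driver_ioctl(bdev, mode, cmd, arg);

/* after: open-coded in the caller */
if (bdev->bd_disk->fops->ioctl)
        return bdev->bd_disk->fops->ioctl(bdev, mode, cmd, arg);
return -ENOTTY;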

