|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5 |
|
| #
e5a3b8cf |
| 29-Apr-2025 |
Kent Overstreet <[email protected]> |
bcachefs: More informative error message when shutting down due to error
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Revision tags: v6.15-rc4, v6.15-rc3 |
|
| #
417f01e7 |
| 18-Apr-2025 |
Kent Overstreet <[email protected]> |
bcachefs: Error ratelimiting is no longer only during fsck
We now more often do repair automatically, without the user invoking fsck - and sometimes that can involve fixing lots of errors, so let's
bcachefs: Error ratelimiting is no longer only during fsck
We now more often do repair automatically, without the user invoking fsck - and sometimes that can involve fixing lots of errors, so let's avoid flooding the dmesg log.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.15-rc2, v6.15-rc1 |
|
| #
570f5050 |
| 02-Apr-2025 |
Bharadwaj Raju <[email protected]> |
bcachefs: use nonblocking variant of print_string_as_lines in error path
The inconsistency error path calls print_string_as_lines, which calls console_lock, which is a potentially-sleeping function
bcachefs: use nonblocking variant of print_string_as_lines in error path
The inconsistency error path calls print_string_as_lines, which calls console_lock, which is a potentially-sleeping function and so can't be called in an atomic context.
Replace calls to it with the nonblocking variant which is safe to call.
Signed-off-by: Bharadwaj Raju <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
b2ffadcc |
| 01-Apr-2025 |
Kent Overstreet <[email protected]> |
bcachefs: Fix scheduling while atomic from logging changes
Two fixes from the recent logging changes:
bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt context, or with rcu_read_
bcachefs: Fix scheduling while atomic from logging changes
Two fixes from the recent logging changes:
bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt context, or with rcu_read_lock() held.
The one syzbot found is in bch2_bkey_pick_read_device bch2_dev_rcu bch2_fs_inconsistent
We're starting to switch to lift the printbufs up to higher levels so we can emit better log messages and print them all in one go (avoid garbling), so that conversion will help with spotting these in the future; when we declare a printbuf it must be flagged if we're in an atomic context.
Secondly, in btree_node_write_endio:
00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321 00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6 00085 preempt_count: 10001, expected: 0 00085 RCU nest depth: 0, expected: 0 00085 4 locks held by bch-reclaim/fa6/618: 00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198 00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440 00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440 00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130 00085 irq event stamp: 328 00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0 00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60 00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118 00085 softirqs last disabled at (0): [<0000000000000000>] 0x0 00085 Preemption disabled at: 00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90 00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798 00085 Hardware name: linux,dummy-virt (DT) 00085 Call trace: 00085 show_stack+0x1c/0x30 (C) 00085 dump_stack_lvl+0x84/0xc0 00085 dump_stack+0x14/0x20 00085 __might_resched+0x180/0x288 00085 __might_sleep+0x4c/0x88 00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0 00085 krealloc_noprof+0x1a0/0x2d8 00085 bch2_printbuf_make_room+0x9c/0x120 00085 bch2_prt_printf+0x60/0x1b8 00085 btree_node_write_endio+0x1b0/0x2d8 00085 bio_endio+0x138/0x1f0 00085 btree_node_write_endio+0xe8/0x2d8 00085 bio_endio+0x138/0x1f0 00085 blk_update_request+0x220/0x4c0 00085 blk_mq_end_request+0x28/0x148 00085 virtblk_request_done+0x64/0xe8 00085 blk_mq_complete_request+0x34/0x40 00085 virtblk_done+0x78/0x130 00085 vring_interrupt+0x6c/0xb0 00085 __handle_irq_event_percpu+0x8c/0x2e0 00085 handle_irq_event+0x50/0xb0 00085 handle_fasteoi_irq+0xc4/0x250 00085 handle_irq_desc+0x44/0x60 00085 generic_handle_domain_irq+0x20/0x30 00085 gic_handle_irq+0x54/0xc8 00085 call_on_irq_stack+0x24/0x40
Reported-by: [email protected] Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
6d77ce4a |
| 26-Mar-2025 |
Kent Overstreet <[email protected]> |
bcachefs: Better printing of inconsistency errors
Build up and emit the error message for an inconsistency error all at once, instead of spread over multiple printk calls, so they're not jumbled in
bcachefs: Better printing of inconsistency errors
Build up and emit the error message for an inconsistency error all at once, instead of spread over multiple printk calls, so they're not jumbled in the dmesg log.
Also, add better indenting.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
7337f9f1 |
| 28-Mar-2025 |
Kent Overstreet <[email protected]> |
bcachefs: bch2_count_fsck_err()
Factor out a helper from __bch2_fsck_err(), for counting the error in the superblock and deciding whether to print or ratelimit - will be used to replace some log_fsc
bcachefs: bch2_count_fsck_err()
Factor out a helper from __bch2_fsck_err(), for counting the error in the superblock and deciding whether to print or ratelimit - will be used to replace some log_fsck_err() calls, where we want to lift out printing the error message.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
b00750c2 |
| 28-Mar-2025 |
Kent Overstreet <[email protected]> |
bcachefs: Better helpers for inconsistency errors
An inconsistency error often happens as part of an event with multiple error messages, and we want to build up one single error message with proper
bcachefs: Better helpers for inconsistency errors
An inconsistency error often happens as part of an event with multiple error messages, and we want to build up one single error message with proper indenting to produce more readable log messages that don't get garbled.
Add new helpers that emit messages to a printbuf instead of printing them directly, next patch will convert to use them.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
1ece5323 |
| 26-Mar-2025 |
Kent Overstreet <[email protected]> |
bcachefs: Consistent indentation of multiline fsck errors
Add the new helper printbuf_indent_add_nextline(), and use it in __bch2_fsck_err() to centralize setting the indentation of multiline fsck e
bcachefs: Consistent indentation of multiline fsck errors
Add the new helper printbuf_indent_add_nextline(), and use it in __bch2_fsck_err() to centralize setting the indentation of multiline fsck errors.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.14 |
|
| #
4fcd4de0 |
| 20-Mar-2025 |
Kent Overstreet <[email protected]> |
bcachefs: fs-common.c -> namei.c
name <-> inode, code for managing the relationships between inodes and dirents.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
981e3801 |
| 26-Feb-2025 |
Kent Overstreet <[email protected]> |
bcachefs: Kick devices out after too many write IO errors
We're improving our handling of write errors - we shouldn't write degraded data just because a write failed once, we should retry it (on oth
bcachefs: Kick devices out after too many write IO errors
We're improving our handling of write errors - we shouldn't write degraded data just because a write failed once, we should retry it (on other devices, if possible).
But for this to work, we need to kick devices out when they're only returning errors - otherwise those retries will loop infinitely.
This adds a configurable timeout - if writes are failing for too long, we'll set that device read-only.
In the future we should also implement more tracking and another knob for an "allowed error rate", so that we can kick out drives that are acting "unhealthy".
Another thing we'll want is a mechanism (likely in userspace) for bringing a device back in after a transient error - perhaps a cable was jiggled, or there was a controller reset.
After transient errors we also need a mechanism to walk (from the journal) recent btree updates that weren't flushed to that device and treat them as "degraded", since unflushed data may well not have been written. Out of scope for this patch, but becoming relevant.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3 |
|
| #
1ccbcd32 |
| 10-Feb-2025 |
Kent Overstreet <[email protected]> |
bcachefs: bch2_write_op_error() now prints info about data update
A user has been seeing the "error verifying existing checksum while rewriting existing data (memory corruption?)" error.
This gener
bcachefs: bch2_write_op_error() now prints info about data update
A user has been seeing the "error verifying existing checksum while rewriting existing data (memory corruption?)" error.
This generally indicates a hardware issue (and that may be the case here), but it might also indicate a bug, in which case we need more information to look for patterns.
Reported-by: Roland Vet <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc2 |
|
| #
06284963 |
| 07-Feb-2025 |
Kent Overstreet <[email protected]> |
bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts
we're starting to use error messages with paths in fsck_errors(), where we do not want nested transaction restart ha
bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts
we're starting to use error messages with paths in fsck_errors(), where we do not want nested transaction restart handling, so let's prepare for that.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
45f0e6c8 |
| 07-Feb-2025 |
Kent Overstreet <[email protected]> |
bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number
We want all error messages converted to print paths, not just inode numbers - users want this information, and it sp
bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number
We want all error messages converted to print paths, not just inode numbers - users want this information, and it speeds up debugging too.
Auditing and converting all error messages is going to be a big project, so for the moment we're just doing this incrementally.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2 |
|
| #
60558d55 |
| 08-Dec-2024 |
Kent Overstreet <[email protected]> |
bcachefs: Plumb bkey_validate_context to journal_entry_validate
This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superbl
bcachefs: Plumb bkey_validate_context to journal_entry_validate
This lets us print the exact location in the journal if it was found in the journal, or correctly print if it was found in the superblock.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc1 |
|
| #
1302eeb7 |
| 29-Nov-2024 |
Kent Overstreet <[email protected]> |
bcachefs: bkey_fsck_err now respects errors_silent
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Revision tags: v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
f7727a67 |
| 28-Sep-2024 |
Kent Overstreet <[email protected]> |
bcachefs: bch2_inum_to_path()
Add a function for walking backpointers to find a path from a given inode number, and convert various error messages to use it.
Signed-off-by: Kent Overstreet <kent.ov
bcachefs: bch2_inum_to_path()
Add a function for walking backpointers to find a path from a given inode number, and convert various error messages to use it.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
8b105909 |
| 28-Nov-2024 |
Kent Overstreet <[email protected]> |
bcachefs: do_fsck_ask_yn()
__bch2_fsck_err() is huge, and badly needs more refactoring
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
052210c3 |
| 28-Nov-2024 |
Kent Overstreet <[email protected]> |
bcachefs: Don't error out when logging fsck error
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
9963a14d |
| 27-Nov-2024 |
Kent Overstreet <[email protected]> |
bcachefs: BCH_FS_recovery_running
If we're autofixing topology errors, we shouldn't shutdown if we're still in recovery.
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
a6f4794f |
| 27-Nov-2024 |
Kent Overstreet <[email protected]> |
bcachefs: struct bkey_validate_context
Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it
bcachefs: struct bkey_validate_context
Add a new parameter to bkey validate functions, and use it to improve invalid bkey error messages: we can now print the btree and depth it came from, or if it came from the journal, or is a btree root.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
c8e58813 |
| 27-Oct-2024 |
Kent Overstreet <[email protected]> |
bcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_err
Factor out a common helper, need_discard_or_freespace_err(), which is now used by both fsck and the runtime checks, and can repair.
Si
bcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_err
Factor out a common helper, need_discard_or_freespace_err(), which is now used by both fsck and the runtime checks, and can repair.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
eb73e777 |
| 29-Oct-2024 |
Kent Overstreet <[email protected]> |
bcachefs: Kill FSCK_NEED_FSCK
If we find an error that indicates that we need to run fsck, we can specify that directly with run_explicit_recovery_pass().
These are now log_fsck_err() calls: we're
bcachefs: Kill FSCK_NEED_FSCK
If we find an error that indicates that we need to run fsck, we can specify that directly with run_explicit_recovery_pass().
These are now log_fsck_err() calls: we're just logging in the superblock that an error occurred - and possibly doing an emergency shutdown, depending on policy.
Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|
| #
eb5db64c |
| 19-Oct-2024 |
Kent Overstreet <[email protected]> |
bcachefs: Fix __bch2_fsck_err() warning
We only warn about having a btree_trans that wasn't passed in if we'll be prompting.
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
492e24d7 |
| 01-Oct-2024 |
Kent Overstreet <[email protected]> |
bcachefs: Make sure we print error that causes fsck to bail out
Signed-off-by: Kent Overstreet <[email protected]>
|
| #
658c82f4 |
| 04-Oct-2024 |
Kent Overstreet <[email protected]> |
bcachefs: bkey errors are only AUTOFIX during read
Newly generated keys, in the transaction commit path or write path, should not be AUTOFIX; those indicate bugs that we need to fail fast for.
Fixe
bcachefs: bkey errors are only AUTOFIX during read
Newly generated keys, in the transaction commit path or write path, should not be AUTOFIX; those indicate bugs that we need to fail fast for.
Fixes: 5612daafb764 ("bcachefs: Fix fsck warnings from bkey validation") Signed-off-by: Kent Overstreet <[email protected]>
show more ...
|