| 3f77c3df | 21-Apr-2025 |
Shannon Nelson <[email protected]> |
pds_core: make wait_context part of q_info
Make the wait_context a full part of the q_info struct rather than a stack variable that goes away after pdsc_adminq_post() is done so that the context is
pds_core: make wait_context part of q_info
Make the wait_context a full part of the q_info struct rather than a stack variable that goes away after pdsc_adminq_post() is done so that the context is still available after the wait loop has given up.
There was a case where a slow development firmware caused the adminq request to time out, but then later the FW finally finished the request and sent the interrupt. The handler tried to complete_all() the completion context that had been created on the stack in pdsc_adminq_post() but no longer existed. This caused bad pointer usage, kernel crashes, and much wailing and gnashing of teeth.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| f9559d81 | 21-Apr-2025 |
Brett Creeley <[email protected]> |
pds_core: Remove unnecessary check in pds_client_adminq_cmd()
When the pds_core driver was first created there were some race conditions around using the adminq, especially for client drivers. To re
pds_core: Remove unnecessary check in pds_client_adminq_cmd()
When the pds_core driver was first created there were some race conditions around using the adminq, especially for client drivers. To reduce the possibility of a race condition there's a check against pf->state in pds_client_adminq_cmd(). This is problematic for a couple of reasons:
1. The PDSC_S_INITING_DRIVER bit is set during probe, but not cleared until after everything in probe is complete, which includes creating the auxiliary devices. For pds_fwctl this means it can't make any adminq commands until after pds_core's probe is complete even though the adminq is fully up by the time pds_fwctl's auxiliary device is created.
2. The race conditions around using the adminq have been fixed and this path is already protected against client drivers calling pds_client_adminq_cmd() if the adminq isn't ready, i.e. see pdsc_adminq_post() -> pdsc_adminq_inc_if_up().
Fix this by removing the pf->state check in pds_client_adminq_cmd() because invalid accesses to pds_core's adminq is already handled by pdsc_adminq_post()->pdsc_adminq_inc_if_up().
Fixes: 10659034c622 ("pds_core: add the aux client API") Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| 2567daad | 21-Apr-2025 |
Brett Creeley <[email protected]> |
pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
If the FW doesn't support the PDS_CORE_CMD_FW_CONTROL command the driver might at the least print garbage and at the worst crash when the
pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
If the FW doesn't support the PDS_CORE_CMD_FW_CONTROL command the driver might at the least print garbage and at the worst crash when the user runs the "devlink dev info" devlink command.
This happens because the stack variable fw_list is not 0 initialized which results in fw_list.num_fw_slots being a garbage value from the stack. Then the driver tries to access fw_list.fw_names[i] with i >= ARRAY_SIZE and runs off the end of the array.
Fix this by initializing the fw_list and by not failing completely if the devcmd fails because other useful information is printed via devlink dev info even if the devcmd fails.
Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| b699bdc7 | 20-Mar-2025 |
Shannon Nelson <[email protected]> |
pds_core: specify auxiliary_device to be created
In preparation for adding a new auxiliary_device for the PF, make the vif type an argument to pdsc_auxbus_dev_add(). Pass in the address of the pade
pds_core: specify auxiliary_device to be created
In preparation for adding a new auxiliary_device for the PF, make the vif type an argument to pdsc_auxbus_dev_add(). Pass in the address of the padev pointer so that the caller can specify where to save it and keep the mutex usage within the function.
Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
show more ...
|
| 2dac60e0 | 16-Feb-2024 |
Shannon Nelson <[email protected]> |
pds_core: delete VF dev on reset
When the VF is hit with a reset, remove the aux device in the prepare for reset and try to restore it after the reset. The userland mechanics will need to recover an
pds_core: delete VF dev on reset
When the VF is hit with a reset, remove the aux device in the prepare for reset and try to restore it after the reset. The userland mechanics will need to recover and rebuild whatever uses the device afterwards.
Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| 792d36cc | 02-Feb-2024 |
Brett Creeley <[email protected]> |
pds_core: Clean up init/uninit flows to be more readable
The setup and teardown flows are somewhat hard to follow regarding pdsc_core_init()/pdsc_dev_init() and their corresponding teardown flows be
pds_core: Clean up init/uninit flows to be more readable
The setup and teardown flows are somewhat hard to follow regarding pdsc_core_init()/pdsc_dev_init() and their corresponding teardown flows being in pdsc_teardown(). Improve the readability by adding new pdsc_core_uninit()/pdsc_dev_unint() functions that mirror their init counterparts. Also, move the notify and admin qcq allocations into pdsc_core_init(), so they can be freed in pdsc_core_uninit().
Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| 247c4ed0 | 02-Feb-2024 |
Brett Creeley <[email protected]> |
pds_core: Fix up some minor issues
Running xmastree.py against the driver found some RCT issues, so fix them.
Also, if allocating pdsc->intr_info in pdsc_dev_init() fails the driver still tries to
pds_core: Fix up some minor issues
Running xmastree.py against the driver found some RCT issues, so fix them.
Also, if allocating pdsc->intr_info in pdsc_dev_init() fails the driver still tries to free pdsc->intr_info. Fix this by just returning -ENOMEM since there's nothing to free at this point of failure.
Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| bca10f2c | 02-Feb-2024 |
Brett Creeley <[email protected]> |
pds_core: Unmask adminq interrupt in work thread
Unmasking the interrupt during the pdsc_adminq_isr is a bit early and could cause unnecessary interrupts. Instead always unmask after processing the
pds_core: Unmask adminq interrupt in work thread
Unmasking the interrupt during the pdsc_adminq_isr is a bit early and could cause unnecessary interrupts. Instead always unmask after processing the adminq and notifyq in pdsc_work_thread()->pdsc_process_adminq(). Also, since we are always unmasking, there's no need for the local credits variable in pdsc_process_adminq().
Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
| bc90fbe0 | 29-Jan-2024 |
Brett Creeley <[email protected]> |
pds_core: Rework teardown/setup flow to be more common
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). One key
pds_core: Rework teardown/setup flow to be more common
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). One key piece that's missing are the calls to pci_alloc_irq_vectors() and pci_free_irq_vectors(). The pcie reset case is calling pci_free_irq_vectors() on reset_prepare, but not calling the corresponding pci_alloc_irq_vectors() on reset_done. This is causing unexpected/unwanted interrupt behavior due to the adminq interrupt being accidentally put into legacy interrupt mode. Also, the pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being called directly in probe/remove respectively.
Fix this inconsistency by making the following changes: 1. Always call pdsc_dev_init() in pdsc_setup(), which calls pci_alloc_irq_vectors() and get rid of the now unused pds_dev_reinit(). 2. Always free/clear the pdsc->intr_info in pdsc_teardown() since this structure will get re-alloced in pdsc_setup(). 3. Move the calls of pci_free_irq_vectors() to pdsc_teardown() since pci_alloc_irq_vectors() will always be called in pdsc_setup()->pdsc_dev_init() for both the probe/remove and reset flows. 4. Make sure to only create the debugfs "identity" entry when it doesn't already exist, which it will in the reset case because it's already been created in the initial call to pdsc_dev_init().
Fixes: ffa55858330f ("pds_core: implement pci reset handlers") Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| e96094c1 | 29-Jan-2024 |
Brett Creeley <[email protected]> |
pds_core: Clear BARs on reset
During reset the BARs might be accessed when they are unmapped. This can cause unexpected issues, so fix it by clearing the cached BAR values so they are not accessed u
pds_core: Clear BARs on reset
During reset the BARs might be accessed when they are unmapped. This can cause unexpected issues, so fix it by clearing the cached BAR values so they are not accessed until they are re-mapped.
Also, make sure any places that can access the BARs when they are NULL are prevented.
Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink") Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| 7e82a874 | 29-Jan-2024 |
Brett Creeley <[email protected]> |
pds_core: Prevent race issues involving the adminq
There are multiple paths that can result in using the pdsc's adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_w
pds_core: Prevent race issues involving the adminq
There are multiple paths that can result in using the pdsc's adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or a fw_down/fw_up cycle due to bad PCIe state or bad device state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens after the adminq is already destroyed.
In order to fix this, add some further state checks and implement reference counting for adminq uses. Reference counting was used because multiple threads can attempt to access the adminq at the same time via [1] or [2]. Additionally, multiple clients (i.e. pds-vfio-pci) can be using [2] at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been allocated and is ready to use. Users/clients of the adminq (i.e. [1] and [2]) will increment the refcnt when they are using the adminq. When the driver goes into a fw_down cycle it will set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent any further adminq_refcnt increments. Waiting for the adminq_refcnt to hit 1 allows for any current users of the adminq to finish before the driver frees the adminq. Once the adminq_refcnt hits 1 the driver clears the refcnt to signify that the adminq is deleted and cannot be used. On the fw_up cycle the driver will once again initialize the adminq_refcnt to 1 allowing the adminq to be used again.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|