|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5 |
|
| #
f7a70d65 |
| 14-Feb-2024 |
Tobias Waldekranz <[email protected]> |
net: bridge: switchdev: Ensure deferred event delivery on unoffload
When unoffloading a device, it is important to ensure that all relevant deferred events are delivered to it before it disassociate
net: bridge: switchdev: Ensure deferred event delivery on unoffload
When unoffloading a device, it is important to ensure that all relevant deferred events are delivered to it before it disassociates itself from the bridge.
Before this change, this was true for the normal case when a device maps 1:1 to a net_bridge_port, i.e.
br0 / swp0
When swp0 leaves br0, the call to switchdev_deferred_process() in del_nbp() makes sure to process any outstanding events while the device is still associated with the bridge.
In the case when the association is indirect though, i.e. when the device is attached to the bridge via an intermediate device, like a LAG...
br0 / lag0 / swp0
...then detaching swp0 from lag0 does not cause any net_bridge_port to be deleted, so there was no guarantee that all events had been processed before the device disassociated itself from the bridge.
Fix this by always synchronously processing all deferred events before signaling completion of unoffloading back to the driver.
Fixes: 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
dc489f86 |
| 14-Feb-2024 |
Tobias Waldekranz <[email protected]> |
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, ei
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, either from the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of br->mdb_list, the notification of their existence to switchdev event subscribers is deferred until a later point in time. So if a replay list was generated during a time that overlapped with such a window, it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally considered to be one single event. On destruction of the bridge, only a single membership deletion event was therefore sent. As a consequence of this, drivers which reference count memberships (at least DSA), would be left with orphan groups in their hardware database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events may still be pending on the deferred queue, they will already have been removed from br->mdb_list, so no duplicates can be generated in that scenario.
To a user this meant that old group memberships, from a bridge in which a port was previously attached, could be reanimated (in hardware) when the port joined a new bridge, without the new bridge's knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ > ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0 root@infix-06-0b-00:~$ mvls atu ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a DEV:0 Marvell 88E6393X 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the port (x3) is notified of the host's membership twice: once via the original event and once via a replay. Since only a single delete notification is sent, the count remains at 1 when the bridge is destroyed.
Then add the same port (or another port belonging to the same hardware domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ > ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be flooded, according to the policy of br1. But instead the old memberships are still active in the hardware database, causing the switch to only forward traffic to those groups towards the CPU (port 0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay list.
This prevents new memberships from showing up while we are generating the replay list. But it leaves the scenario in which a deferred event was already generated, but not delivered, before we grabbed the lock. Therefore:
2. Make sure that no deferred version of a replay event is already enqueued to the switchdev deferred queue, before adding it to the replay list, when replaying additions.
Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") Signed-off-by: Tobias Waldekranz <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3 |
|
| #
f2e2857b |
| 19-Jul-2023 |
Petr Machata <[email protected]> |
net: switchdev: Add a helper to replay objects on a bridge port
When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on t
net: switchdev: Add a helper to replay objects on a bridge port
When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on the bridge port. When the bridge port is offloaded by the driver for the first time, this can be achieved by passing a notifier to switchdev_bridge_port_offload(). The notifier is then invoked for the individual objects (such as VLANs) configured on the bridge, and can look for the interesting ones.
Calling switchdev_bridge_port_offload() when the second port joins the bridge lower is unnecessary, but the replay is still needed. To that end, add a new function, switchdev_bridge_port_replay(), which does only the replay part of the _offload() function in exactly the same way as that function.
Cc: Jiri Pirko <[email protected]> Cc: Ivan Vecera <[email protected]> Cc: Roopa Prabhu <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Cc: [email protected] Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
989280d6 |
| 19-Jul-2023 |
Petr Machata <[email protected]> |
net: bridge: br_switchdev: Tolerate -EOPNOTSUPP when replaying MDB
There are two kinds of MDB entries to be replayed: port MDB entries, and host MDB entries. They are both replayed by br_switchdev_m
net: bridge: br_switchdev: Tolerate -EOPNOTSUPP when replaying MDB
There are two kinds of MDB entries to be replayed: port MDB entries, and host MDB entries. They are both replayed by br_switchdev_mdb_replay(). If the driver supports one kind, but lacks the other, the first -EOPNOTSUPP returned terminates the whole replay, including any further still-supported objects in the list.
For this to cause issues, there must be MDB entries for both the host and the port being replayed. In that case, if the driver bails out from handling the host entry, the port entries are never replayed. However, the replay is currently only done when a switchdev port joins a bridge. There would be no port memberships at that point. Thus despite being erroneous, the code does not cause observable bugs.
This is not an issue with other object kinds either, because there, each function replays one object kind. If a driver does not support that kind, it makes sense to bail out early. -EOPNOTSUPP is then ignored in nbp_switchdev_sync_objs().
For MDB, suppress the -EOPNOTSUPP error code in br_switchdev_mdb_replay() already, so that the whole list gets replayed.
The reason we need this patch is that a future patch will introduce a replay that should be used when a front-panel port netdevice is enslaved to a bridge lower, in particular a LAG. The LAG netdevice can already have both host and port MDB entries. The port entries need to be replayed so that they are offloaded on the port that joins the LAG.
Cc: Jiri Pirko <[email protected]> Cc: Ivan Vecera <[email protected]> Cc: Roopa Prabhu <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Cc: [email protected] Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3 |
|
| #
927cdea5 |
| 18-Apr-2023 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: don't notify FDB entries with "master dynamic"
There is a structural problem in switchdev, where the flag bits in struct switchdev_notifier_fdb_info (added_by_user, is_local
net: bridge: switchdev: don't notify FDB entries with "master dynamic"
There is a structural problem in switchdev, where the flag bits in struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only represent a simplified / denatured view of what's in struct net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc). Each time we want to pass more information about struct net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info (here, BR_FDB_STATIC), we find that FDB entries were already notified to switchdev with no regard to this flag, and thus, switchdev drivers had no indication whether the notified entries were static or not.
For example, this command:
ip link add br0 type bridge && ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic
has never worked as intended with switchdev. It causes a struct net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has a single flag set: BR_FDB_ADDED_BY_USER.
This is further passed to the switchdev notifier chain, where interested drivers have no choice but to assume this is a static (does not age) and sticky (does not migrate) FDB entry. So currently, all drivers offload it to hardware as such, as can be seen below ("offload" is set).
bridge fdb get 00:01:02:03:04:05 dev swp0 master 00:01:02:03:04:05 dev swp0 offload master br0
The software FDB entry expires $ageing_time centiseconds after the kernel last sees a packet with this MAC SA, and the bridge notifies its deletion as well, so it eventually disappears from hardware too.
This is a problem, because it is actually desirable to start offloading "master dynamic" FDB entries correctly - they should expire $ageing_time centiseconds after the *hardware* port last sees a packet with this MAC SA - and this is how the current incorrect behavior was discovered. With an offloaded data plane, it can be expected that software only sees exception path packets, so an otherwise active dynamic FDB entry would be aged out by software sooner than it should.
With the change in place, these FDB entries are no longer offloaded:
bridge fdb get 00:01:02:03:04:05 dev swp0 master 00:01:02:03:04:05 dev swp0 master br0
and this also constitutes a better way (assuming a backport to stable kernels) for user space to determine whether the kernel has the capability of doing something sane with these or not.
As opposed to "master dynamic" FDB entries, on the current behavior of which no one currently depends on (which can be deduced from the lack of kselftests), Ido Schimmel explains that entries with the "extern_learn" flag (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev, since the spectrum driver listens to them (and this is kind of okay, because although they are treated identically to "static", they are expected to not age, and to roam).
Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Link: https://lore.kernel.org/netdev/20230327115206.jk5q5l753aoelwus@skbuf/ Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7 |
|
| #
028fb19c |
| 31-Jan-2023 |
Leon Romanovsky <[email protected]> |
netlink: provide an ability to set default extack message
In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwri
netlink: provide an ability to set default extack message
In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwrite the filled message.
Instead of adding multiple "if (!extack->_msg)" checks before any NL_SET_ERR_MSG() call, which appears after call to the driver, let's add new macro to common code.
[1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5 |
|
| #
9c0ca02b |
| 08-Nov-2022 |
Ido Schimmel <[email protected]> |
bridge: switchdev: Reflect MAB bridge port flag to device drivers
Reflect the 'BR_PORT_MAB' flag to device drivers so that:
* Drivers that support MAB could act upon the flag being toggled. * Drive
bridge: switchdev: Reflect MAB bridge port flag to device drivers
Reflect the 'BR_PORT_MAB' flag to device drivers so that:
* Drivers that support MAB could act upon the flag being toggled. * Drivers that do not support MAB will prevent MAB from being enabled.
Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
27fabd02 |
| 08-Nov-2022 |
Hans J. Schultz <[email protected]> |
bridge: switchdev: Allow device drivers to install locked FDB entries
When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize
bridge: switchdev: Allow device drivers to install locked FDB entries
When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize the hardware and software FDBs by generating switchdev events towards the bridge.
When a port is locked, the hardware must not learn autonomously, as otherwise any host will blindly gain authorization. Instead, the hardware should generate events regarding hosts that are trying to gain authorization and their MAC addresses should be notified by the device driver as locked FDB entries towards the bridge driver.
Allow device drivers to notify the bridge driver about such entries by extending the 'switchdev_notifier_fdb_info' structure with the 'locked' bit. The bit can only be set by device drivers and not by the bridge driver.
Prevent a locked entry from being installed if MAB is not enabled on the bridge port.
If an entry already exists in the bridge driver, reject the locked entry if the current entry does not have the "locked" flag set or if it points to a different port. The same semantics are implemented in the software data path.
Signed-off-by: Hans J. Schultz <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4 |
|
| #
7f40ea21 |
| 21-Apr-2022 |
Clément Léger <[email protected]> |
net: bridge: switchdev: check br_vlan_group() return value
br_vlan_group() can return NULL and thus return value must be checked to avoid dereferencing a NULL pointer.
Fixes: 6284c723d9b9 ("net: br
net: bridge: switchdev: check br_vlan_group() return value
br_vlan_group() can return NULL and thus return value must be checked to avoid dereferencing a NULL pointer.
Fixes: 6284c723d9b9 ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations") Signed-off-by: Clément Léger <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc3, v5.18-rc2 |
|
| #
c3976a3f |
| 10-Apr-2022 |
Arınç ÜNAL <[email protected]> |
net: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICAST
Add BR_HAIRPIN_MODE, BR_ISOLATED and BR_MULTICAST_TO_UNICAST port flags to BR_PORT_FLAGS_HW_OFFLOAD so that switchdev drive
net: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICAST
Add BR_HAIRPIN_MODE, BR_ISOLATED and BR_MULTICAST_TO_UNICAST port flags to BR_PORT_FLAGS_HW_OFFLOAD so that switchdev drivers which have an offloaded data plane have a chance to reject these bridge port flags if they don't support them yet.
It makes the code path go through the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS driver handlers, which return -EINVAL for everything they don't recognize.
For drivers that don't catch SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS at all, switchdev will return -EOPNOTSUPP for those which is then ignored, but those are in the minority.
Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.18-rc1, v5.17 |
|
| #
6284c723 |
| 16-Mar-2022 |
Tobias Waldekranz <[email protected]> |
net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings.
S
net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings.
Signed-off-by: Tobias Waldekranz <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc8, v5.17-rc7, v5.17-rc6 |
|
| #
fa1c8334 |
| 23-Feb-2022 |
Hans Schultz <[email protected]> |
net: bridge: Add support for offloading of locked port flag
Various switchcores support setting ports in locked mode, so that clients behind locked ports cannot send traffic through the port unless
net: bridge: Add support for offloading of locked port flag
Various switchcores support setting ports in locked mode, so that clients behind locked ports cannot send traffic through the port unless a fdb entry is added with the clients MAC address.
Signed-off-by: Hans Schultz <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc5 |
|
| #
b28d580e |
| 15-Feb-2022 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: replay all VLAN groups
The major user of replayed switchdev objects is DSA, and so far it hasn't needed information about anything other than bridge port VLANs, so this is al
net: bridge: switchdev: replay all VLAN groups
The major user of replayed switchdev objects is DSA, and so far it hasn't needed information about anything other than bridge port VLANs, so this is all that br_switchdev_vlan_replay() knows to handle.
DSA has managed to get by through replicating every VLAN addition on a user port such that the same VLAN is also added on all DSA and CPU ports, but there is a corner case where this does not work.
The mv88e6xxx DSA driver currently prints this error message as soon as the first port of a switch joins a bridge:
mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95
where a6:ef:77:c8:5f:3d vid 1 is a local FDB entry corresponding to the bridge MAC address in the default_pvid.
The -EOPNOTSUPP is returned by mv88e6xxx_port_db_load_purge() because it tries to map VID 1 to a FID (the ATU is indexed by FID not VID), but fails to do so. This is because ->port_fdb_add() is called before ->port_vlan_add() for VID 1.
The abridged timeline of the calls is:
br_add_if -> netdev_master_upper_dev_link -> dsa_port_bridge_join -> switchdev_bridge_port_offload -> br_switchdev_vlan_replay (*) -> br_switchdev_fdb_replay -> mv88e6xxx_port_fdb_add -> nbp_vlan_init -> nbp_vlan_add -> mv88e6xxx_port_vlan_add
and the issue is that at the time of (*), the bridge port isn't in VID 1 (nbp_vlan_init hasn't been called), therefore br_switchdev_vlan_replay() won't have anything to replay, therefore VID 1 won't be in the VTU by the time mv88e6xxx_port_fdb_add() is called.
This happens only when the first port of a switch joins. For further ports, the initial mv88e6xxx_port_vlan_add() is sufficient for VID 1 to be loaded in the VTU (which is switch-wide, not per port).
The problem is somewhat unique to mv88e6xxx by chance, because most other drivers offload an FDB entry by VID, so FDBs and VLANs can be added asynchronously with respect to each other, but addressing the issue at the bridge layer makes sense, since what mv88e6xxx requires isn't absurd.
To fix this problem, we need to recognize that it isn't the VLAN group of the port that we're interested in, but the VLAN group of the bridge itself (so it isn't a timing issue, but rather insufficient information being passed from switchdev to drivers).
As mentioned, currently nbp_switchdev_sync_objs() only calls br_switchdev_vlan_replay() for VLANs corresponding to the port, but the VLANs corresponding to the bridge itself, for local termination, also need to be replayed. In this case, VID 1 is not (yet) present in the port's VLAN group but is present in the bridge's VLAN group.
So to fix this bug, DSA is now obligated to explicitly handle VLANs pointing towards the bridge in order to "close this race" (which isn't really a race). As Tobias Waldekranz notices, this also implies that it must explicitly handle port VLANs on foreign interfaces, something that worked implicitly before: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#24735260
So in the end, br_switchdev_vlan_replay() must replay all VLANs from all VLAN groups: all the ports, and the bridge itself.
Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
263029ae |
| 15-Feb-2022 |
Vladimir Oltean <[email protected]> |
net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync()
There may be switchdev drivers that can add/remove a FDB or MDB entry only as long as the VLAN it's in has been notified
net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync()
There may be switchdev drivers that can add/remove a FDB or MDB entry only as long as the VLAN it's in has been notified and offloaded first. The nbp_switchdev_sync_objs() method satisfies this requirement on addition, but nbp_switchdev_unsync_objs() first deletes VLANs, then deletes MDBs and FDBs. Reverse the order of the function calls to cater to this requirement.
Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
8d23a54f |
| 15-Feb-2022 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: differentiate new VLANs from changed ones
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:
net: bridge: switchdev: differentiate new VLANs from changed ones
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:
- a struct net_bridge_vlan got created - an existing struct net_bridge_vlan was modified
This makes it impossible for switchdev drivers to properly balance PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to happen, we must provide a way for drivers to distinguish between a VLAN with changed flags and a new one.
Annotate struct switchdev_obj_port_vlan with a "bool changed" that distinguishes the 2 cases above.
Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15 |
|
| #
326b212e |
| 27-Oct-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: consistent function naming
Rename all recently imported functions in br_switchdev.c to start with a br_switchdev_* prefix.
br_fdb_replay_one() -> br_switchdev_fdb_replay_one
net: bridge: switchdev: consistent function naming
Rename all recently imported functions in br_switchdev.c to start with a br_switchdev_* prefix.
br_fdb_replay_one() -> br_switchdev_fdb_replay_one() br_fdb_replay() -> br_switchdev_fdb_replay() br_vlan_replay_one() -> br_switchdev_vlan_replay_one() br_vlan_replay() -> br_switchdev_vlan_replay() struct br_mdb_complete_info -> struct br_switchdev_mdb_complete_info br_mdb_complete() -> br_switchdev_mdb_complete() br_mdb_switchdev_host_port() -> br_switchdev_host_mdb_one() br_mdb_switchdev_host() -> br_switchdev_host_mdb() br_mdb_replay_one() -> br_switchdev_mdb_replay_one() br_mdb_replay() -> br_switchdev_mdb_replay() br_mdb_queue_one() -> br_switchdev_mdb_queue_one()
Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
9776457c |
| 27-Oct-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: mdb: move all switchdev logic to br_switchdev.c
The following functions:
br_mdb_complete br_switchdev_mdb_populate br_mdb_replay_one br_mdb_queue_one br_mdb_replay br_mdb_switchdev_hos
net: bridge: mdb: move all switchdev logic to br_switchdev.c
The following functions:
br_mdb_complete br_switchdev_mdb_populate br_mdb_replay_one br_mdb_queue_one br_mdb_replay br_mdb_switchdev_host_port br_mdb_switchdev_host br_switchdev_mdb_notify
are only accessible from code paths where CONFIG_NET_SWITCHDEV is enabled. So move them to br_switchdev.c, in order for that code to be compiled out if that config option is disabled.
Note that br_switchdev.c gets build regardless of whether CONFIG_BRIDGE_IGMP_SNOOPING is enabled or not, whereas br_mdb.c only got built when CONFIG_BRIDGE_IGMP_SNOOPING was enabled. So to preserve correct compilation with CONFIG_BRIDGE_IGMP_SNOOPING being disabled, we must now place an #ifdef around these functions in br_switchdev.c. The offending bridge data structures that need this are br->multicast_lock and br->mdb_list, these are also compiled out of struct net_bridge when CONFIG_BRIDGE_IGMP_SNOOPING is turned off.
Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
4a6849e4 |
| 27-Oct-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: move br_vlan_replay to br_switchdev.c
br_vlan_replay() is relevant only if CONFIG_NET_SWITCHDEV is enabled, so move it to br_switchdev.c.
Signed-off-by: Vladimir Oltean <vladimir.oltea
net: bridge: move br_vlan_replay to br_switchdev.c
br_vlan_replay() is relevant only if CONFIG_NET_SWITCHDEV is enabled, so move it to br_switchdev.c.
Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
| #
fab9eca8 |
| 26-Oct-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: create a common function for populating switchdev FDB entries
There are two places where a switchdev FDB entry is constructed, one is br_switchdev_fdb_notify() and the other is br_fdb_r
net: bridge: create a common function for populating switchdev FDB entries
There are two places where a switchdev FDB entry is constructed, one is br_switchdev_fdb_notify() and the other is br_fdb_replay(). One uses a struct initializer, and the other declares the structure as uninitialized and populates the elements one by one.
One problem when introducing new members of struct switchdev_notifier_fdb_info is that there is a risk for one of these functions to run with an uninitialized value.
So centralize the logic of populating such structure into a dedicated function. Being the primary location where these structures are created, using an uninitialized variable and populating the members one by one should be fine, since this one function is supposed to assign values to all its members.
Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
5cda5272 |
| 26-Oct-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: move br_fdb_replay inside br_switchdev.c
br_fdb_replay is only called from switchdev code paths, so it makes sense to be disabled if switchdev is not enabled in the first place.
As opp
net: bridge: move br_fdb_replay inside br_switchdev.c
br_fdb_replay is only called from switchdev code paths, so it makes sense to be disabled if switchdev is not enabled in the first place.
As opposed to br_mdb_replay and br_vlan_replay which might be turned off depending on bridge support for multicast and VLANs, FDB support is always on. So moving br_mdb_replay and br_vlan_replay inside br_switchdev.c would mean adding some #ifdef's in br_switchdev.c, so we keep those where they are.
The reason for the movement is that in future changes there will be some code reuse between br_switchdev_fdb_notify and br_fdb_replay.
Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5 |
|
| #
957e2235 |
| 03-Aug-2021 |
Vladimir Oltean <[email protected]> |
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let driver
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded"), we started having Ethernet switch drivers calling directly into a function exported by net/bridge/br_switchdev.c, which is a function exported by the bridge driver.
This means that drivers that did not have an explicit dependency on the bridge before, like cpsw and am65-cpsw, now do - otherwise it is not possible to call a symbol exported by a driver that can be built as module unless you are a module too.
There was an attempt to solve the dependency issue in the form of commit b0e81817629a ("net: build all switchdev drivers as modules when the bridge is a module"). Grygorii Strashko, however, says about it:
| In my opinion, the problem is a bit bigger here than just fixing the | build :( | | In case, of ^cpsw the switchdev mode is kinda optional and in many | cases (especially for testing purposes, NFS) the multi-mac mode is | still preferable mode. | | There were no such tight dependency between switchdev drivers and | bridge core before and switchdev serviced as independent, notification | based layer between them, so ^cpsw still can be "Y" and bridge can be | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE | will need to be set as "Y", or we will have to update drivers to | support build with BRIDGE=n and maintain separate builds for | networking vs non-networking testing. But is this enough? Wouldn't | it cause 'chain reaction' required to add more and more "Y" options | (like CONFIG_VLAN_8021Q)? | | PS. Just to be sure we on the same page - ARM builds will be forced | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our | automation testing will just fail with omap2plus_defconfig.
In the light of this, it would be desirable for some configurations to avoid dependencies between switchdev drivers and the bridge, and have the switchdev mode as completely optional within the driver.
Arnd Bergmann also tried to write a patch which better expressed the build time dependency for Ethernet switch drivers where the switchdev support is optional, like cpsw/am65-cpsw, and this made the drivers follow the bridge (compile as module if the bridge is a module) only if the optional switchdev support in the driver was enabled in the first place: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
but this still did not solve the fact that cpsw and am65-cpsw now must be built as modules when the bridge is a module - it just expressed correctly that optional dependency. But the new behavior is an apparent regression from Grygorii's perspective.
So to support the use case where the Ethernet driver is built-in, NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we need a framework that can handle the possible absence of the bridge from the running system, i.e. runtime bloatware as opposed to build-time bloatware.
Luckily we already have this framework, since switchdev has been using it extensively. Events from the bridge side are transmitted to the driver side using notifier chains - this was originally done so that unrelated drivers could snoop for events emitted by the bridge towards ports that are implemented by other drivers (think of a switch driver with LAG offload that listens for switchdev events on a bonding/team interface that it offloads).
There are also events which are transmitted from the driver side to the bridge side, which again are modeled using notifiers. SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with notifying the bridge that a MAC address has been dynamically learned. So there is a precedent we can use for modeling the new framework.
The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work that the bridge needs to do when a port becomes offloaded is blocking in its nature: replay VLANs, MDBs etc. The calling context is indeed blocking (we are under rtnl_mutex), but the existing switchdev notification chain that the bridge is subscribed to is only the atomic one. So we need to subscribe the bridge to the blocking switchdev notification chain too.
This patch: - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload unchanged - moves the implementation of switchdev_bridge_port_{,un}offload from the bridge module into the switchdev module. - makes everybody that is subscribed to the switchdev blocking notifier chain "hear" offload & unoffload events - makes the bridge driver subscribe and handle those events - moves the bridge driver's handling of those events into 2 new functions called br_switchdev_port_{,un}offload. These functions contain in fact the core of the logic that was previously in switchdev_bridge_port_{,un}offload, just that now we go through an extra indirection layer to reach them.
Unlike all the other switchdev notification structures, the structure used to carry the bridge port information, struct switchdev_notifier_brport_info, does not contain a "bool handled". This is because in the current usage pattern, we always know that a switchdev bridge port offloading event will be handled by the bridge, because the switchdev_bridge_port_offload() call was initiated by a NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event couldn't have happened.
Signed-off-by: Vladimir Oltean <[email protected]> Tested-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
2e19bb35 |
| 02-Aug-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: fix incorrect use of FDB flags when picking the dst device
Nikolay points out that it is incorrect to assume that it is impossible to have an fdb entry with fdb->dst == NULL
net: bridge: switchdev: fix incorrect use of FDB flags when picking the dst device
Nikolay points out that it is incorrect to assume that it is impossible to have an fdb entry with fdb->dst == NULL and the BR_FDB_LOCAL bit in fdb->flags not set. This is because there are reader-side places that test_bit(BR_FDB_LOCAL, &fdb->flags) without the br->hash_lock, and if the updating of the FDB entry happens on another CPU, there are no memory barriers at writer or reader side which would ensure that the reader sees the updates to both fdb->flags and fdb->dst in the same order, i.e. the reader will not see an inconsistent FDB entry.
So we must be prepared to deal with FDB entries where fdb->dst and fdb->flags are in a potentially inconsistent state, and that means that fdb->dst == NULL should remain a condition to pick the net_device that we report to switchdev as being the bridge device, which is what the code did prior to the blamed patch.
Fixes: 52e4bec15546 ("net: bridge: switchdev: treat local FDBs the same as entries towards the bridge") Suggested-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc4 |
|
| #
52e4bec1 |
| 28-Jul-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: treat local FDBs the same as entries towards the bridge
Currently the following script:
1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up 2. ip link set
net: bridge: switchdev: treat local FDBs the same as entries towards the bridge
Currently the following script:
1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up 2. ip link set swp2 up && ip link set swp2 master br0 3. ip link set swp3 up && ip link set swp3 master br0 4. ip link set swp4 up && ip link set swp4 master br0 5. bridge vlan del dev swp2 vid 1 6. bridge vlan del dev swp3 vid 1 7. ip link set swp4 nomaster 8. ip link set swp3 nomaster
produces the following output:
[ 641.010738] sja1105 spi0.1: port 2 failed to delete 00:1f:7b:63:02:48 vid 1 from fdb: -2
[ swp2, swp3 and br0 all have the same MAC address, the one listed above ]
In short, this happens because the number of FDB entry additions notified to switchdev is unbalanced with the number of deletions.
At step 1, the bridge has a random MAC address. At step 2, the br_fdb_replay of swp2 receives this initial MAC address. Then the bridge inherits the MAC address of swp2 via br_fdb_change_mac_address(), and it notifies switchdev (only swp2 at this point) of the deletion of the random MAC address and the addition of 00:1f:7b:63:02:48 as a local FDB entry with fdb->dst == swp2, in VLANs 0 and the default_pvid (1).
During step 7:
del_nbp -> br_fdb_delete_by_port(br, p, vid=0, do_all=1); -> fdb_delete_local(br, p, f);
br_fdb_delete_by_port() deletes all entries towards the ports, regardless of vid, because do_all is 1.
fdb_delete_local() has logic to migrate local FDB entries deleted from one port to another port which shares the same MAC address and is in the same VLAN, or to the bridge device itself. This migration happens without notifying switchdev of the deletion on the old port and the addition on the new one, just fdb->dst is changed and the added_by_user flag is cleared.
In the example above, the del_nbp(swp4) causes the "addr 00:1f:7b:63:02:48 vid 1" local FDB entry with fdb->dst == swp4 that existed up until then to be migrated directly towards the bridge (fdb->dst == NULL). This is because it cannot be migrated to any of the other ports (swp2 and swp3 are not in VLAN 1).
After the migration to br0 takes place, swp4 requests a deletion replay of all FDB entries. Since the "addr 00:1f:7b:63:02:48 vid 1" entry now point towards the bridge, a deletion of it is replayed. There was just a prior addition of this address, so the switchdev driver deletes this entry.
Then, the del_nbp(swp3) at step 8 triggers another br_fdb_replay, and switchdev is notified again to delete "addr 00:1f:7b:63:02:48 vid 1". But it can't because it no longer has it, so it returns -ENOENT.
There are other possibilities to trigger this issue, but this is by far the simplest to explain.
To fix this, we must avoid the situation where the addition of an FDB entry is notified to switchdev as a local entry on a port, and the deletion is notified on the bridge itself.
Considering that the 2 types of FDB entries are completely equivalent and we cannot have the same MAC address as a local entry on 2 bridge ports, or on a bridge port and pointing towards the bridge at the same time, it makes sense to hide away from switchdev completely the fact that a local FDB entry is associated with a given bridge port at all. Just say that it points towards the bridge, it should make no difference whatsoever to the switchdev driver and should even lead to a simpler overall implementation, will less cases to handle.
This also avoids any modification at all to the core bridge driver, just what is reported to switchdev changes. With the local/permanent entries on bridge ports being already reported to user space, it is hard to believe that the bridge behavior can change in any backwards-incompatible way such as making all local FDB entries point towards the bridge.
Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
| #
b4454bc6 |
| 28-Jul-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: switchdev: replay the entire FDB for each port
Currently when a switchdev port joins a bridge, we replay all FDB entries pointing towards that port or towards the bridge.
However, this
net: bridge: switchdev: replay the entire FDB for each port
Currently when a switchdev port joins a bridge, we replay all FDB entries pointing towards that port or towards the bridge.
However, this is insufficient in certain situations:
(a) DSA, through its assisted_learning_on_cpu_port logic, snoops dynamically learned FDB entries on foreign interfaces. These are FDB entries that are pointing neither towards the newly joined switchdev port, nor towards the bridge. So these addresses would be missed when joining a bridge where a foreign interface has already learned some addresses, and they would also linger on if the DSA port leaves the bridge before the foreign interface forgets them. None of this happens if we replay the entire FDB when the port joins.
(b) There is a desire to treat local FDB entries on a port (i.e. the port's termination MAC address) identically to FDB entries pointing towards the bridge itself. More details on the reason behind this in the next patch. The point is that this cannot be done given the current structure of br_fdb_replay() in this situation: ip link set swp0 master br0 # br0 inherits its MAC address from swp0 ip link set swp1 master br0 What is desirable is that when swp1 joins the bridge, br_fdb_replay() also notifies swp1 of br0's MAC address, but this won't in fact happen because the MAC address of br0 does not have fdb->dst == NULL (it doesn't point towards the bridge), but it has fdb->dst == swp0. So our current logic makes it impossible for that address to be replayed. But if we dump the entire FDB instead of just the entries with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC address of br0 will be replayed too, which is what we need.
A natural question arises: say there is an FDB entry to be replayed, like a MAC address dynamically learned on a foreign interface that belongs to a bridge where no switchdev port has joined yet. If 10 switchdev ports belonging to the same driver join this bridge, one by one, won't every port get notified 10 times of the foreign FDB entry, amounting to a total of 100 notifications for this FDB entry in the switchdev driver?
Well, yes, but this is where the "void *ctx" argument for br_fdb_replay is useful: every port of the switchdev driver is notified whenever any other port requests an FDB replay, but because the replay was initiated by a different port, its context is different from the initiating port's context, so it ignores those replays.
So the foreign FDB entry will be installed only 10 times, once per port. This is done so that the following 4 code paths are always well balanced: (a) addition of foreign FDB entry is replayed when port joins bridge (b) deletion of foreign FDB entry is replayed when port leaves bridge (c) addition of foreign FDB entry is notified to all ports currently in bridge (c) deletion of foreign FDB entry is notified to all ports currently in bridge
Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v5.14-rc3 |
|
| #
c5381154 |
| 23-Jul-2021 |
Vladimir Oltean <[email protected]> |
net: bridge: fix build when setting skb->offload_fwd_mark with CONFIG_NET_SWITCHDEV=n
Switchdev support can be disabled at compile time, and in that case, struct sk_buff will not contain the offload
net: bridge: fix build when setting skb->offload_fwd_mark with CONFIG_NET_SWITCHDEV=n
Switchdev support can be disabled at compile time, and in that case, struct sk_buff will not contain the offload_fwd_mark field.
To make the code in br_forward.c work in both cases, we do what is done in other places and we create a helper function, with an empty shim definition, that is implemented by the br_switchdev.o translation module. This is always compiled if and only if CONFIG_NET_SWITCHDEV is y or m.
Reported-by: kernel test robot <[email protected]> Fixes: 472111920f1c ("net: bridge: switchdev: allow the TX data plane forwarding to be offloaded") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|