|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
| #
f49da8c0 |
| 28-May-2024 |
Alexander Aring <[email protected]> |
dlm: remove unused parameter in dlm_midcomms_addr
This patch removes a parameter that is currently unused by dlm_midcomms_addr().
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
|
| #
578acf9a |
| 02-Apr-2024 |
Alexander Aring <[email protected]> |
dlm: use spin_lock_bh for message processing
Use spin_lock_bh for all spinlocks involved in message processing, in preparation for softirq message processing. DLM lock requests from user space involve dlm processing in user context, in addition to the standard kernel context, necessitating bh variants.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
98808644 |
| 02-Apr-2024 |
Alexander Aring <[email protected]> |
dlm: remove allocation parameter in msg allocation
Remove the context parameter for message allocations and always use GFP_ATOMIC. This prepares for softirq message processing.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
|
Revision tags: v6.9-rc2 |
|
| #
609ed5bd |
| 28-Mar-2024 |
Kunwu Chan <[email protected]> |
dlm: Simplify the allocation of slab caches in dlm_midcomms_cache_create
Use the new KMEM_CACHE() macro instead of calling kmem_cache_create() directly to simplify the creation of slab caches.
Signed-off-by: Kunwu Chan <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6 |
|
| #
6212e452 |
| 10-Oct-2023 |
Alexander Aring <[email protected]> |
dlm: fix no ack after final message
In case of a final DLM message, we should not send an ack out after the final message. This patch moves the ack message before the messages are transmitted. Otherwise, if it's the final message and the receiving node has switched to the DLM_CLOSED state, another ack message would be received and would turn the receiving node into DLM_ESTABLISHED again.
Fixes: 1696c75f1864 ("fs: dlm: add send ack threshold and append acks to msgs") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
e759eb3e |
| 10-Oct-2023 |
Alexander Aring <[email protected]> |
dlm: be sure we reset all nodes at forced shutdown
In case we are running a forced shutdown in either the midcomms or lowcomms implementation, make sure we reset all per-node midcomms information.
Fixes: 63e711b08160 ("fs: dlm: create midcomms nodes when configure") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
2776635e |
| 10-Oct-2023 |
Alexander Aring <[email protected]> |
dlm: fix remove member after close call
The idea of commit 63e711b08160 ("fs: dlm: create midcomms nodes when configure") is to set the midcomms node lifetime when a node joins or leaves the cluster. Currently we can hit the following warning:
[10844.611495] ------------[ cut here ]------------ [10844.615913] WARNING: CPU: 4 PID: 84304 at fs/dlm/midcomms.c:1263 dlm_midcomms_remove_member+0x13f/0x180 [dlm]
or run into a state where the midcomms node usage count goes negative:
[ 260.830782] node 2 users dec count -1
The first warning happens when a specific node does not exist; it was probably already removed by dlm_midcomms_close(), which is called when a node leaves the cluster. The second kernel log message probably occurs when dlm_midcomms_addr() is called for a node that joined the cluster but, due to fencing, left the cluster without getting removed from the lockspace. If the node rejoins the cluster after being removed because of fencing, the first call, triggered by user space, is to remove the node from the lockspaces. In both cases, if the node wasn't found or the user count is zero, we should ignore any additional midcomms handling in dlm_midcomms_remove_member().
Fixes: 63e711b08160 ("fs: dlm: create midcomms nodes when configure") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
fe9b619e |
| 10-Oct-2023 |
Alexander Aring <[email protected]> |
dlm: fix creating multiple node structures
This patch looks up existing nodes instead of always creating them when dlm_midcomms_addr() is called. The idea here is to create midcomms nodes when user space is informed that a node joins the cluster. This is the case when dlm_midcomms_addr() is called; however, it can be called multiple times by user space to add several address configurations to one node, e.g. when using SCTP. Those repeated calls need to be filtered out, and we do that by checking whether the node already exists. Due to the configfs entry, it is safe to assume this function is only called once at a time.
Fixes: 63e711b08160 ("fs: dlm: create midcomms nodes when configure") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
|
Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5 |
|
| #
63e711b0 |
| 01-Aug-2023 |
Alexander Aring <[email protected]> |
fs: dlm: create midcomms nodes when configure
This patch makes the lifetime of a midcomms node the same as a lowcomms connection. The lowcomms connection lifetime was changed by commit 6f0b0b5d7ae7 ("fs: dlm: remove dlm_node_addrs lookup list"). In the future, the midcomms node instances can be merged with the lowcomms connection structure, as the lifetime is the same and states can be controlled over values or flags.
Previously, midcomms nodes were generated during version detection. This is no longer necessary when the nodes are created as the cluster manager configures DLM via configfs. When a midcomms node is created over configfs, it will set DLM_VERSION_NOT_SET as its version. This indicates that the version of the midcomms node is still unknown and needs to be probed via certain rcom messages.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
11519351 |
| 01-Aug-2023 |
Alexander Aring <[email protected]> |
fs: dlm: constify receive buffer
The dlm receive buffer should never be manipulated, as DLM is the last parsing layer. This patch constifies the whole receive buffer so we are sure it never gets manipulated while it is being parsed.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
643f5cfa |
| 01-Aug-2023 |
Alexander Aring <[email protected]> |
fs: dlm: cleanup lock order
This patch cleans up the lock order to take the close_lock first and then the nodes_srcu read lock. It will probably never be a problem, as nodes_srcu is only a read lock preventing the node pointer from getting freed.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
|
Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5 |
|
| #
1696c75f |
| 29-May-2023 |
Alexander Aring <[email protected]> |
fs: dlm: add send ack threshold and append acks to msgs
This patch changes when we send an ack back to tell the other side it can free a message because it has arrived on the receiver node. Due to random reconnects, e.g. TCP resets, this is also handled on the application layer so that DLM does not run into a deadlock state.
The current handling has the following problems:
1. We end up in situations where we send out only an ack-back message of 16 bytes and no other messages, although DLM has logic to combine as many messages as it can in one send() socket call. This behaviour can be observed with "trace-cmd start -e dlm_recv" by watching the ret field being 16 bytes.
2. When processing of DLM messages never ends because we receive a lot of messages, we will not send an ack back, as that only happens when the processing loop ends.
This patch introduces a likely and an unlikely threshold case. The likely case sends an ack back on the transmit path when the threshold of processed upper-layer protocol messages is crossed. This solves issue 1 because the ack is sent along with another normal DLM message. It solves issue 2 because it is not part of the processing loop.
The unlikely case has a bigger threshold and is triggered when we only receive messages and do not send any message back. This case avoids the sending node keeping a lot of messages around for a long time, as we occasionally send an ack back to tell the sender to finally release messages.
The atomic cmpxchg() is there to provide an atomic ack send combined with the reset of the upper-layer protocol delivery counter.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
d00725ca |
| 29-May-2023 |
Alexander Aring <[email protected]> |
fs: dlm: handle sequence numbers as atomic
Currently, seq_next is only read on the receive side, which processes in an ordered way. The seq_send is protected by locks. To be able to read the seq_next value on the send side as well, we convert it to an atomic_t value. The atomic_cmpxchg() is probably not necessary; however, the atomic_inc() depends on an if conditional, and this should be handled in an atomic context.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
07ee3867 |
| 29-May-2023 |
Alexander Aring <[email protected]> |
fs: dlm: filter ourself midcomms calls
It makes no sense to call midcomms/lowcomms functionality for the local node, as socket functionality is only required for remote nodes. This patch filters those calls in the upper layer of lockspace membership handling instead of doing it in the midcomms/lowcomms layer, as those layers should never be aware of the local nodeid.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
c6b6d6dc |
| 29-May-2023 |
Alexander Aring <[email protected]> |
fs: dlm: revert check required context while close
This patch reverts commit 2c3fa6ae4d52 ("dlm: check required context while close"). The function dlm_midcomms_close(), which will later call dlm_lowcomms_close(), is called when the cluster manager reports that a node got fenced, which on the midcomms/lowcomms layer means disconnecting the node from the cluster communication. The node can rejoin the cluster later. The reverted patch ensured that no new messages could be triggered while we are in the close() function context. This was done by checking whether the lockspace had been stopped. However, a check is missing so that we only look at the specific lockspaces the fenced node is a member of. This is currently complicated because there is no easy way to check if a node is part of a specific lockspace without stopping the recovery. For now we just revert this commit, as it was only a check for finding possible leaks of stopping lockspaces before close() is called.
Cc: [email protected] Fixes: 2c3fa6ae4d52 ("dlm: check required context while close") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
|
Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4 |
|
| #
723b197b |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: remove unnecessary wake_up() calls
The wake_up() is already handled inside of midcomms_node_reset() when switching the state to the CLOSED state, so there is no need to call it again after midcomms_node_reset().
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
ef7ef015 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: move state change into else branch
Currently, we may switch first into the DLM_CLOSE_WAIT state and then do another state change if a condition is true. Instead of doing two state changes, we handle the other state change inside an else branch of this condition.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
31864097 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: remove newline in log_print
There is an API difference between log_print() and other printk()s as to whether a trailing newline is needed. This newline was introduced by mistake, because log_print() already adds one.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
11605353 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: reduce the shutdown timeout to 5 secs
When a shutdown is stuck, time out after 5 seconds instead of 3 minutes. After this timeout we try a forced shutdown.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
b8b750e0 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: wait until all midcomms nodes detect version
The current dlm version detection is very complex due to backwards compatibility with earlier dlm protocol versions. It takes some time to detect whether a peer node speaks a specific DLM version. If it's not detected, we just cut the socket connection. There can be cases where the local node has not detected the version yet, but the peer node has. In these cases, we try to shut down the dlm connection with a FIN/ACK message exchange to be sure the other peer is ready to shut down the connection at the dlm application level. However, this mechanism is only available in DLM protocol version 3.2, so we need to be sure the DLM version has been detected first.
To make it more robust, we introduce a "best effort" wait for the version detection before shutting down the dlm connection. This needs to be done before the recoverd kthread for recovery handling is stopped, because recovery handling will trigger enough messages to get a version detection going.
It is a corner case which was detected by loading the dlm_locktorture module with modprobe and removing it with rmmod directly afterwards, in a loop. In practice, probably nobody would leave a lockspace immediately after joining it.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
89835b06 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: ignore unexpected non dlm opts msgs
This patch ignores unexpected RCOM_NAMES/RCOM_STATUS messages. To be backwards compatible, those messages are not part of the new reliable DLM OPTS encapsulation header and have their own retransmit handling using sequence number matching. When we get unexpected non dlm opts messages, we should allow them and let the RCOM message handling filter them out using sequence numbers.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
54fbe0c1 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: bring back previous shutdown handling
This patch mostly reverts commit 4f567acb0b86 ("fs: dlm: remove socket shutdown handling"). There can be situations where the dlm midcomms node hash and the lowcomms connection hash are not equal, but we need to guarantee that all lowcomms connections are closed on the last release of a dlm lockspace, when a shutdown is invoked. This patch guarantees that we always close all sockets managed by the lowcomms connection hash, and calls shutdown for the last message sent. This ensures we don't cut the socket, which could cause the peer to get a connection reset.
In the future we should try to merge the midcomms/lowcomms hashes into one hash instead of handling both in separate hashes.
Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
00908b33 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: send FIN ack back in right cases
This patch sends an ack back for a received FIN message only when we are in valid states. In other cases, if there is a sender waiting for an ack, we just let it time out on the sender's side and hope all other cleanups will remove the FIN message from their sending queue. For example, we should never send out an ack while in the LAST_ACK state, and we cannot assume working socket communication while in the CLOSED state.
Cc: [email protected] Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
a5849636 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: move sending fin message into state change handling
This patch moves the send-FIN handling, which should occur on a specific state change, into the state change handling while the per-node state_lock is held. I experienced issues with other messages because we changed the state and a FIN message was sent out in a different state.
Cc: [email protected] Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|
| #
15c63db8 |
| 12-Jan-2023 |
Alexander Aring <[email protected]> |
fs: dlm: don't set stop rx flag after node reset
Similar to the stop tx flag, the rx flag should warn about a dlm message being received at a DLM_FIN state change, when we assume no other dlm application messages. If we receive a FIN message and we are in the state DLM_FIN_WAIT2, we call midcomms_node_reset(), which puts the midcomms node into the DLM_CLOSED state. Afterwards we should not set the DLM_NODE_FLAG_STOP_RX flag any more. This patch sets DLM_NODE_FLAG_STOP_RX only in those state changes where we receive a FIN message and assume there will be no other dlm application messages received until we hit the DLM_CLOSED state.
Cc: [email protected] Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
|