UCT/IB/UD: Fix skb use-after-free on simultaneous CREQ#11246
UCT/IB/UD: Fix skb use-after-free on simultaneous CREQ#11246sunnytiwari-e4 wants to merge 1 commit intoopenucx:masterfrom
Conversation
91bf609 to
69e0bf6
Compare
There was a problem hiding this comment.
any possibility to send ctl/creq as signaled and release its skb when receiving tx cqe (dealing only with releasing memory), and keep processing it as ack? apparently there are other possibly other cases involving an ack like:
#10423 (comment)
src/uct/ib/ud/base/ud_ep.c
Outdated
| * We restrict this to incomplete connections to prevent side-effects | ||
| * on established endpoints. | ||
| */ | ||
| if (!uct_ud_ep_is_connected(ep)) { |
There was a problem hiding this comment.
seems we still mark ep as connected when receiving simultaneous creq at line 856 or after having received crep, so I wonder if this can be executed at all, when restricted to connected ep?
There was a problem hiding this comment.
Thanks @tvegas1 for reviewing it!
any possibility to send creq as signaled and release its skb when receiving tx cqe, and keep processing it as ack?
I am thinking about the potential tx cqe polling overhead if we make every creq signaled, also we might quickly exhaust SQ/CQ slots during massive connection bursts. Second is UD needs the skb in the tx window for timeout/resends. If we free the skb right after the local tx cqe and the packet gets dropped on the wire, we lose the buffer before getting a real peer ack. We'd have to dynamically rebuild the creq payload from scratch on every timeout in that case.
seems we still mark ep as connected when receiving simultaneous creq at line 856 or after having received crep, so I wonder if this can be executed at all, when restricted to connected ep?
you are right, very first creq marks the ep as connected so it'll never get executed. This check was an "optimization" I added in the latest revision, but as you noted (and as the failing CI confirms), it's incorrect.
Does it make sense to unconditionally schedule ack here when dropping duplicate creq or crep as any duplicate creq or crep arriving at this point has a matching conn_sn which definitively means the peer is stuck waiting because our previously sent crep or ack was lost on the wire.
There was a problem hiding this comment.
i think the case mentioned in the comment where send1_a, retransmit send1_b, and ack_a from send1_a releases send1_b while it is still being sent, could be also valid. maybe we would only free skb on tx cqe poll, not adding new signaled cqe but using tx cqe poll from existing signaled and covering previous ranges? tx window would be cleaned independently, but without actually releasing the skbs?
…nection When two endpoints execute a simultaneous connection establishment, both endpoints transmit a CREQ packet. Previously, the UD transport handled this collision by incorrectly treating the incoming CREQ as an implicit ACK for the outgoing CREQ, immediately releasing the outgoing skb back to the memory pool. This triggers a use-after-free bug when the CREQ payload exceeds `max_inline`, as the skb is freed while the NIC is still performing async DMA read from that memory, causing silent data corruption. Fix this by removing the implicit ACK logic on simultaneous CREQs. The endpoints now correctly rely on the subsequent CREP packets to acknowledge their initial connection requests. Additionally, handle the resulting deadlock when a CREP is dropped. If an endpoint receives a duplicate CREQ while still establishing a connection, it now schedules an explicit ACK_REQ before discarding it. This allows the stuck peer to free its unacknowledged CREQ.
Fix premature skb release and deadlock on simultaneous connection
What?
Fixes #6040 #10423
ACK_REQscheduling when discarding a duplicateCREQ.ACK_REQto incomplete connections (!uct_ud_ep_is_connected(ep)). This handles the dropped CREP deadlock without risking ACK spam from stale endpoints on established connections.Why?
When two UD endpoints perform a simultaneous connection establishment, both transmit a CREQ packet. Historically, the UD transport handled this collision by treating the incoming CREQ as an implicit ACK for the outgoing CREQ, immediately releasing the outgoing skb back to the memory pool.
Issue 1: Memory corruption
When the CREQ payload exceeds
max_inline, the implicit ACK logic is unsafe. The skb is freed immediately while the NIC is still performing the async DMA read from that memory, causing silent data corruption.Issue 2: Deadlock and OOM
Removing the implicit ACK exposed a deadlock if one side's CREP reply is dropped by the network. The peer will indefinitely resend its CREQ. The receiver drops these resent CREQs as duplicates but never sends an ACK. The sender stays stuck waiting for an ACK, stalling its transmit window, and piling up Active Messages until it exhausts the memory pool.
How?