Skip to content

UCT/IB/UD: Fix skb use-after-free on simultaneous CREQ#11246

Draft
sunnytiwari-e4 wants to merge 1 commit intoopenucx:masterfrom
sunnytiwari-e4:master
Draft

UCT/IB/UD: Fix skb use-after-free on simultaneous CREQ#11246
sunnytiwari-e4 wants to merge 1 commit intoopenucx:masterfrom
sunnytiwari-e4:master

Conversation

@sunnytiwari-e4
Copy link

@sunnytiwari-e4 sunnytiwari-e4 commented Mar 10, 2026

Fix premature skb release and deadlock on simultaneous connection

What?

Fixes #6040 #10423

  • Removed the unsafe implicit ACK logic on simultaneous CREQs. The endpoints now safely rely on subsequent CREP packets to acknowledge initial connection requests.
  • Added explicit ACK_REQ scheduling when discarding a duplicate CREQ.
  • Restricted this explicit ACK_REQ to incomplete connections (!uct_ud_ep_is_connected(ep)). This handles the dropped CREP deadlock without risking ACK spam from stale endpoints on established connections.

Why?

When two UD endpoints perform a simultaneous connection establishment, both transmit a CREQ packet. Historically, the UD transport handled this collision by treating the incoming CREQ as an implicit ACK for the outgoing CREQ, immediately releasing the outgoing skb back to the memory pool.
Issue 1: Memory corruption
When the CREQ payload exceeds max_inline, the implicit ACK logic is unsafe. The skb is freed immediately while the NIC is still performing the async DMA read from that memory, causing silent data corruption.
Issue 2: Deadlock and OOM
Removing the implicit ACK exposed a deadlock if one side's CREP reply is dropped by the network. The peer will indefinitely resend its CREQ. The receiver drops these resent CREQs as duplicates but never sends an ACK. The sender stays stuck waiting for an ACK, stalling its transmit window, and piling up Active Messages until it exhausts the memory pool.

How?

@sunnytiwari-e4 sunnytiwari-e4 force-pushed the master branch 5 times, most recently from 91bf609 to 69e0bf6 Compare March 11, 2026 09:16
Copy link
Contributor

@tvegas1 tvegas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any possibility to send ctl/creq as signaled and release its skb when receiving tx cqe (dealing only with releasing memory), and keep processing it as ack? apparently there are other possibly other cases involving an ack like:
#10423 (comment)

* We restrict this to incomplete connections to prevent side-effects
* on established endpoints.
*/
if (!uct_ud_ep_is_connected(ep)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we still mark ep as connected when receiving simultaneous creq at line 856 or after having received crep, so I wonder if this can be executed at all, when restricted to connected ep?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tvegas1 for reviewing it!

any possibility to send creq as signaled and release its skb when receiving tx cqe, and keep processing it as ack?

I am thinking about the potential tx cqe polling overhead if we make every creq signaled, also we might quickly exhaust SQ/CQ slots during massive connection bursts. Second is UD needs the skb in the tx window for timeout/resends. If we free the skb right after the local tx cqe and the packet gets dropped on the wire, we lose the buffer before getting a real peer ack. We'd have to dynamically rebuild the creq payload from scratch on every timeout in that case.

seems we still mark ep as connected when receiving simultaneous creq at line 856 or after having received crep, so I wonder if this can be executed at all, when restricted to connected ep?

you are right, very first creq marks the ep as connected so it'll never get executed. This check was an "optimization" I added in the latest revision, but as you noted (and as the failing CI confirms), it's incorrect.
Does it make sense to unconditionally schedule ack here when dropping duplicate creq or crep as any duplicate creq or crep arriving at this point has a matching conn_sn which definitively means the peer is stuck waiting because our previously sent crep or ack was lost on the wire.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the case mentioned in the comment where send1_a, retransmit send1_b, and ack_a from send1_a releases send1_b while it is still being sent, could be also valid. maybe we would only free skb on tx cqe poll, not adding new signaled cqe but using tx cqe poll from existing signaled and covering previous ranges? tx window would be cleaned independently, but without actually releasing the skbs?

…nection

When two endpoints execute a simultaneous connection establishment,
both endpoints transmit a CREQ packet. Previously, the UD transport
handled this collision by incorrectly treating the incoming CREQ as an
implicit ACK for the outgoing CREQ, immediately releasing the outgoing
skb back to the memory pool.

This triggers a use-after-free bug when the CREQ payload exceeds
`max_inline`, as the skb is freed while the NIC is still performing
async DMA read from that memory, causing silent data corruption.

Fix this by removing the implicit ACK logic on simultaneous CREQs. The
endpoints now correctly rely on the subsequent CREP packets to
acknowledge their initial connection requests.

Additionally, handle the resulting deadlock when a CREP is dropped.
If an endpoint receives a duplicate CREQ while still establishing a
connection, it now schedules an explicit ACK_REQ before discarding it.
This allows the stuck peer to free its unacknowledged CREQ.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem treating incoming creq as ack

2 participants