
Run comms on the R thread#1075

Merged
lionel- merged 31 commits into main from task/sync-comms on Mar 8, 2026

Conversation

Contributor

@lionel- lionel- commented Mar 2, 2026

Progress towards #689
Progress towards #691
Progress towards #1074

Comm handlers (data explorer, variables, connections...) currently run on their own spawned threads and call R through r_task(), which hooks into R's polled events. In practice this makes R evaluations preemptive: a comm handler may force a promise that triggers loadNamespace() while loadNamespace() is already on the call stack. As we move more logic to comm RPCs, the surface area for these reentrancy bugs grows. The data explorer is a good example: it sorts, filters, and profiles columns by calling back into R, all from a background thread.

A second, more subtle problem is message ordering. Because comm handlers run on their own threads, IOPub events (like data-changed notifications or comm_close) race with Idle. Tests had to work around this with complicated buffering and polling in DummyArkFrontend.

This PR introduces a blocking comm path and migrates the data explorer as the first comm to demonstrate the wins:

  • No more R reentrancy risk from r_task() at interrupt time.
  • No more risk of omitting r_task() wrappers around R-related code, which allows a much nicer development experience (and easier code reviews).
  • Much simpler implementation (the dedicated thread and select! loop go away).
  • 10 data explorers used to spawn 10 comm threads; we now spawn 0 additional threads. Besides reduced complexity, this cuts memory usage, since each thread allocates 2 MB for its stack. The same reduction will apply to plot comms, so the savings could be significant in real sessions.
  • Deterministic update ordering (environment change side effects land within the Busy/Idle window of the request that caused them). The tests become deterministic and all the event buffering test infra goes away.

Blocking Shell while comms run on the R thread

The fix is to make comm message handling block Shell. When a comm_msg arrives on the Shell socket, instead of forwarding it to a channel and immediately moving on to the next request, we send the message to the R thread and wait for it to finish. This way no execute_request (or other comm RPC) can start while R code is running for a comm.

Amalthea side

  • Two new default methods on ShellHandler: handle_comm_msg() and handle_comm_close(). They return CommHandled::Handled or CommHandled::NotHandled. The latter falls back to the existing incoming_tx path, so non-migrated comms keep working unchanged. This is the only part that touches the framework; the rest is in ark.

  • In Shell::handle_comm_msg() / handle_comm_close(), the new handler is tried first. If it returns NotHandled, we fall through to the old channel send.

Ark side

The CommHandler trait. This is the new contract for comms that want to run on the R thread:

  • handle_open(): called once on the R thread when the comm is registered.
  • handle_msg(): incoming RPC or data message.
  • handle_close(): frontend-initiated close.
  • handle_environment(): notification after each top-level execution (or debug frame selection), so handlers can react to binding changes.
  • open_metadata(): metadata for the comm_open IOPub message (backend-initiated comms).

A CommHandlerContext gives each handler access to any relevant Console state or methods, to its outgoing_tx for emitting messages to the frontend, and to a close_on_exit() mechanism for self-closing (e.g. when a watched binding disappears).

Console as the comm registry:

Console owns a HashMap<String, ConsoleComm> keyed by comm ID. The comm_register() method handles backend-initiated opens (creates CommSocket, calls handle_open, sends CommEvent::Opened). Frontend-initiated opens go through KernelRequest::CommOpen with a factory closure so the handler is constructed on the R thread (important because it may hold RObjects).

Ordering change in Console::handle_execute_result(): The old code emitted environment_changed before returning to the event loop, which meant the event ran at an unpredictable time via r_task(). Now the sequence is explicit:

  1. Send execute result/error on IOPub
  2. Call comm_notify_environment_changed() synchronously
  3. Then send the execute reply (which unblocks Shell, which sends Idle)

This guarantees all comm side-effects (data explorer updates, closes) land on IOPub within the Busy/Idle window of the execute request that caused them. This is what makes test assertions deterministic.

Data explorer migration

Data explorer is the first comm fully migrated to the new path. The diff is large but the transformation is mechanical:

  • No more spawned thread: RDataExplorer::start() becomes RDataExplorer::new(). Construction returns the struct directly instead of spawning a thread with a select! loop.
  • No more r_task() calls: All the r_task(|| self.r_foo()) wrappers become direct self.foo() calls. The r_ prefix indicating a method is to be called on the R thread is dropped.
  • No more RThreadSafe: Since everything runs on the R thread, Table becomes a plain newtype over RObject (with a manual unsafe impl Send for the idle-task path). The Arc<Mutex<Option<RThreadSafe<RObject>>>> is gone.

Because data explorer messages now arrive deterministically on IOPub, the entire DataExplorerBuffer infrastructure in DummyArkFrontend is removed (~180 lines of buffering, polling, and try_buffer_msg logic). The integration tests now robustly assert comm events sequentially.

@lionel- lionel- requested review from DavisVaughan and jmcphers March 2, 2026 15:53
Contributor

@jmcphers jmcphers left a comment

I didn't review this in detail, but the structure is sound. I do think it is useful to have some async comms for cases wherein we don't actually need to talk to R at all and don't need to synchronize ourselves with the busy/idle groupings. But this pattern feels much better for things like the Data Explorer that are primarily interacting with R state.

Contributor Author

lionel- commented Mar 3, 2026

I do think it is useful to have some async comms for cases wherein we don't actually need to talk to R at all and don't need to synchronize ourselves with the busy/idle groupings.

Absolutely! I've kept a Shell thread on the Ark side as an intermediate between Amalthea Shell and the Ark Console for that reason. The Ark Shell thread will dispatch asynchronous messages to async comm threads. See also related discussion in posit-dev/positron#7447.

This setup will resemble how the DAP currently works, with a Console side running on the R thread and a server side living in its own thread. Both sides share common state via a mutex, and the server side is also able to run R code via idle tasks.

Comment on lines +219 to +220
/// Channel used to send along messages relayed on the open comms.
comm_event_tx: Sender<CommEvent>,
pub(crate) comm_event_tx: Sender<CommEvent>,
Contributor

I thought get_comm_event_tx(), which doesn't expose this as pub, was a nice abstraction :/

Contributor Author

You'd still need pub(crate) if you call it from other files.

Contributor

@DavisVaughan DavisVaughan Mar 5, 2026

Yes, but you can't modify it from other files because you only get a reference; that's the benefit of the getter, and the reason for all of the get_ methods in here.

Contributor Author

That makes sense, I've restored the method returning a reference, and as discussed on Slack used the submodule trick to remove pub(crate) for fields accessed from Console methods implemented in other files.

}
// Safety: `Table` is only accessed on the R thread (or in R idle tasks,
// which also run on the R thread).
unsafe impl Send for Table {}
Contributor

I am very uneasy about this

We still ship a Table around via an r_task::spawn_idle() (as you mentioned).

I know that we:

  • Send the task from the main R thread
  • Pick the task up and run it on the main R thread

But I am still extremely nervous about providing anything outside of RThreadSafe that can send across threads. I just don't trust us to get it right every time.

I think I would prefer to keep this wrapped in RThreadSafe, because to me that is The Way to ship across threads, even if you end up running from the main R thread on both sides.

Contributor

I would also like to note that this problem would very likely go away entirely if we had a variant of r_task::spawn_idle() that did not require a Send bound on T.

I believe we are stuck with that as long as we are using crossbeam channels, but everything happening here is all on one thread! The main R thread!

All we really want is to queue up a task within the same thread so that read_console can run it at the next idle iteration. Something like r_task::enque_idle() maybe, just spitballing.

That shouldn't require a crossbeam channel ideally (although compatibility with a crossbeam select! would make it challenging probably). We maybe just need some VecDeque to push and pop from.

Then you should be able to ship a closure around, even if it has an RObject inside, without a Send bound.

But until then, I still like RThreadSafe

Contributor Author

Yep, all task spawning should eventually be done from the R thread, and then we can use a simple queue data structure instead of the crossbeam channel.

I'm comfortable with the current setup but since it makes you uneasy I'll restore the RThreadSafe.

Contributor Author

Just a heads up that this forced me to add some IS_TESTING weirdness to RThreadSafe's drop method.

Comment on lines +194 to 195
fn update(&mut self, ctx: &CommHandlerContext) -> anyhow::Result<bool> {
// No need to check for updates if we have no binding
Contributor

Random thought. Would it be nice to have some kind of assert_r_thread!() macro we could put at the top of functions like this? Panic in debug mode and no-op in release mode? It would be self documenting and would help us with our invariants.

Contributor Author

Not a bad idea but it feels superfluous to me, now that the architecture ensures that comms consistently run on the R thread.

Comment on lines +72 to +80
/// Register a new comm handler on the R thread (frontend-initiated comms).
/// Uses a factory closure so the handler (which may hold `RObject`s) is
/// created on the R thread rather than sent across threads.
CommOpen {
comm_id: String,
comm_name: String,
factory: CommHandlerFactory,
ctx: CommHandlerContext,
done_tx: Sender<()>,
Contributor

This is the part that didn't look used to me, and the factory stuff just felt confusing if we don't have a use for it...

Contributor

Even if you think you'll use it for Variables, I'd be interested in delaying the addition of this to that PR so we can see / justify that we really do need this weird factory thing

Contributor Author

Ok, let's see in the Variables PR how that comes along.

Comment on lines +88 to +95
if reg.ctx.is_closed() {
closed_ids.push(comm_id.clone());
}
}
for comm_id in closed_ids {
if let Some(reg) = self.comms.remove(&comm_id) {
self.comm_notify_closed(&comm_id, &reg);
}
Contributor

This whole is_closed() thing feels a bit wrong to me.

It feels like after any kind of generic message we have to check if the backend decided to close the comm? Like we do this here and in comm_handle_msg.

Should something else be handling this in a more consistent manner?

Otherwise it feels like if we add any other comm_notify_*() helper to this, then we are going to also need to check is_closed there too, and that feels so easy to forget

Contributor Author

I had the same feeling as you but couldn't find a better way to structure this.

Comment on lines +81 to +86
pub enum EnvironmentChanged {
/// A top-level execution completed (user code, debug eval, etc.).
Execution,
/// The user selected a different frame in the call stack during debugging.
FrameSelected,
}
Contributor

I like this framing!

@lionel- lionel- force-pushed the task/sync-comms branch 2 times, most recently from b7a45f2 to ae67c85 Compare March 6, 2026 21:10
@lionel- lionel- merged commit ef09c81 into main Mar 8, 2026
8 checks passed
@lionel- lionel- deleted the task/sync-comms branch March 8, 2026 12:16
@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2026