[Option] Parallelize preconditioners across ranks; multi-node FSDP #100

luciaquirke · 2025-12-19T06:33:58Z

A more VRAM-efficient variant in which preconditioners can be spread across an arbitrary number of nodes to compute large outer products. This is useful because a preconditioner is often applied to a query once, and the preconditioned query is then run across a large dataset. Preconditioner computation and application that is slow but VRAM-efficient is therefore a scalable pattern.

Because the preconditioners don't necessarily fit on a single GPU, we use the Gloo backend to do distributed CPU operations.
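A minimal sketch of the row-sharding idea, assuming the preconditioner is built from the outer product H = G^T G of a gradient matrix G and that every rank can see all of G. Ranks are simulated in one process with NumPy. The helper names (`shard_bounds`, `outer_product_shard`, `apply_sharded`) are hypothetical and not taken from this PR, and any inversion or eigendecomposition step applied to H is omitted.

```python
import numpy as np


def shard_bounds(dim: int, world_size: int, rank: int) -> tuple[int, int]:
    """Return the contiguous row range [start, end) of a dim x dim matrix owned by `rank`."""
    base, rem = divmod(dim, world_size)
    # The first `rem` ranks get one extra row so every row has exactly one owner.
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end


def outer_product_shard(grads: np.ndarray, world_size: int, rank: int) -> np.ndarray:
    """Rows [start, end) of H = G^T G for a gradient matrix G of shape (num_examples, dim).

    Each simulated rank materializes only an (end - start, dim) block instead of
    the full (dim, dim) matrix.
    """
    start, end = shard_bounds(grads.shape[1], world_size, rank)
    return grads[:, start:end].T @ grads


def apply_sharded(shards: list[np.ndarray], query: np.ndarray) -> np.ndarray:
    """Compute H @ query from row shards: each shard yields its slice of the result.

    Across real processes the final concatenation would be a collective such as
    torch.distributed.all_gather over the Gloo backend; here it is a plain concatenate.
    """
    return np.concatenate([shard @ query for shard in shards])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.standard_normal((32, 10))  # 32 examples, 10-dimensional gradients
    query = rng.standard_normal(10)
    world_size = 3
    shards = [outer_product_shard(grads, world_size, r) for r in range(world_size)]
    print([s.shape for s in shards])
    print(np.allclose(apply_sharded(shards, query), grads.T @ grads @ query))
```

Each rank holds only about dim / world_size rows, so peak memory per rank falls roughly in proportion to the number of ranks, at the cost of a collective whenever the result is assembled.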

LouisYRYJ · 2025-12-27T13:40:02Z

bergson/collector/multi_node_gradient_collector.py

+
+
+@dataclass(kw_only=True)
+class MultiNodeGradientCollector(HookCollectorBase):


Is this going to replace GradientCollector? It seems like we don't need it if we have this one.

luciaquirke · Jan 6, 2026

Yes, I'll merge this as a separate class for dogfooding, then replace GradientCollector once we're convinced it's stable.

LouisYRYJ · 2025-12-27T13:41:53Z

bergson/build.py


 def build_worker(
    rank: int,
+    local_rank: int,


Add to the docstring what this does.
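One way the requested docstring could read, assuming `rank` is the global process index across all nodes and `local_rank` is the process index within its own node. The three-argument signature is a trimmed, hypothetical stand-in for the real `build_worker`, and this stub only validates its inputs and returns a device string.

```python
def build_worker(rank: int, local_rank: int, world_size: int) -> str:
    """Set up one worker process of a possibly multi-node run (hypothetical stub).

    Args:
        rank: Global index of this process across all nodes, in ``[0, world_size)``.
            Used for anything that must be unique per process, such as choosing
            which shard of the work this process owns.
        local_rank: Index of this process among the processes on its own node.
            Used only to select the local GPU, e.g. ``cuda:{local_rank}``. On a
            single node it is assumed to equal ``rank``.
        world_size: Total number of processes across all nodes.

    Returns:
        The device string this worker should use. The real function does more;
        this stub returns only the device so the documented contract can be tested.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    if local_rank < 0:
        raise ValueError(f"local_rank must be non-negative, got {local_rank}")
    return f"cuda:{local_rank}"
```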

luciaquirke changed the title ~~[Option] Parallelize preconditioners across ranks #94~~ [Option] Parallelize preconditioners across ranks Dec 19, 2025

luciaquirke force-pushed the multi-node branch from 8066d72 to fa7f1b3 December 19, 2025 06:35

luciaquirke requested a review from LouisYRYJ December 21, 2025 00:44

luciaquirke force-pushed the multi-node branch from 9277080 to a9d1531 December 21, 2025 00:50

luciaquirke added 2 commits December 21, 2025 01:11

save

e8fb7d8

Enable FSDP across nodes with START_RANK

4061982
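A sketch of how per-node launches might map onto global ranks, assuming (hypothetically; the PR does not spell this out) that `START_RANK` holds the global rank of the first process on each node. The functions `global_rank` and `rank_layout` are illustrative only and do not come from bergson.

```python
import os
from typing import Mapping, Optional


def global_rank(local_rank: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Global rank of a local process, assuming START_RANK is the global rank of the
    first process on this node (hypothetical semantics; defaults to 0 for one node)."""
    env = os.environ if env is None else env
    start_rank = int(env.get("START_RANK", "0"))
    return start_rank + local_rank


def rank_layout(num_nodes: int, procs_per_node: int) -> list[list[int]]:
    """Global ranks per node when node n is launched with START_RANK = n * procs_per_node."""
    return [
        [
            global_rank(lr, {"START_RANK": str(node * procs_per_node)})
            for lr in range(procs_per_node)
        ]
        for node in range(num_nodes)
    ]


if __name__ == "__main__":
    # Two nodes with four GPUs each: node 1 would be launched with START_RANK=4.
    for node, ranks in enumerate(rank_layout(num_nodes=2, procs_per_node=4)):
        print(f"node {node}: global ranks {ranks}")
```

Under this assumption every global rank is unique across nodes, while `local_rank` still indexes GPUs from 0 on each node.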

luciaquirke force-pushed the multi-node branch from a9d1531 to 4061982 December 21, 2025 01:12

luciaquirke changed the title ~~[Option] Parallelize preconditioners across ranks~~ [Option] Parallelize preconditioners across ranks; multi-node FSDP Dec 21, 2025

LouisYRYJ reviewed Dec 27, 2025

luciaquirke reacted with eyes emoji

3 participants


