@RagnarGrootKoerkamp RagnarGrootKoerkamp commented Oct 8, 2025

Fix #15:

  • Instead of re-initialising a Box<[u32x8; 8]> on each call, we use a thread-local buffer.
  • Instead of going through read_slice_32, which does a bounds check on every read, we propagate the padding of the PackedSeqVec via as_slice, so that all reads can be unchecked. This gives big speedups for short sequences.
  • Tried gather instead of transpose, but it is only marginally faster. (See the gather_instead_of_transpose branch.)
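
For readers who have not seen the diff, here is a rough sketch of the first two ideas in plain Rust. It is not the code from this PR. `Scratch` is a scalar stand-in for `[u32x8; 8]`, and `with_scratch`, `column_sums`, `PaddedBytes`, `read_32_unchecked` and the `PADDING` value are hypothetical names and numbers chosen for illustration. The real padding amount and read width in packed-seq may differ.

```rust
use std::cell::RefCell;

// Scalar stand-in for the `[u32x8; 8]` buffer: 8 rows of 8 lanes.
type Scratch = [[u32; 8]; 8];

thread_local! {
    // One heap buffer per thread, created lazily on first use and then reused.
    static SCRATCH: RefCell<Box<Scratch>> = RefCell::new(Box::new([[0u32; 8]; 8]));
}

/// Hypothetical helper: run `f` with mutable access to this thread's buffer.
fn with_scratch<R>(f: impl FnOnce(&mut Scratch) -> R) -> R {
    SCRATCH.with(|cell| {
        let mut guard = cell.borrow_mut();
        let buf: &mut Scratch = &mut **guard;
        f(buf)
    })
}

/// Example user: transpose into the reused buffer, then sum each column.
fn column_sums(rows: &[[u32; 8]; 8]) -> [u32; 8] {
    with_scratch(|buf| {
        for r in 0..8 {
            for c in 0..8 {
                buf[c][r] = rows[r][c];
            }
        }
        let mut sums = [0u32; 8];
        for c in 0..8 {
            sums[c] = buf[c].iter().sum();
        }
        sums
    })
}

// Assumed padding for this sketch: one full 32-byte read past the end.
const PADDING: usize = 32;

/// Hypothetical padded byte container: `len` real bytes followed by
/// `PADDING` zero bytes, so a 32-byte read starting at any `idx <= len`
/// stays inside the allocation.
struct PaddedBytes {
    data: Vec<u8>,
    len: usize,
}

impl PaddedBytes {
    fn new(bytes: &[u8]) -> Self {
        let mut data = bytes.to_vec();
        data.resize(bytes.len() + PADDING, 0);
        Self { data, len: bytes.len() }
    }

    /// Read 32 bytes starting at `idx` without a bounds check.
    ///
    /// # Safety
    /// The caller must guarantee `idx <= self.len`.
    unsafe fn read_32_unchecked(&self, idx: usize) -> [u8; 32] {
        debug_assert!(idx <= self.len);
        // In bounds because data.len() == len + PADDING >= idx + 32.
        unsafe { self.data.as_ptr().add(idx).cast::<[u8; 32]>().read_unaligned() }
    }
}
```

Note that `with_scratch` panics if it is called re-entrantly on the same thread, since the `RefCell` is already mutably borrowed. A real implementation would need to rule that out or handle it.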

@RagnarGrootKoerkamp RagnarGrootKoerkamp merged commit 8c8b3a8 into master Oct 8, 2025
2 checks passed
@RagnarGrootKoerkamp RagnarGrootKoerkamp deleted the faster-par-iter-bp branch October 8, 2025 21:52
Development

Successfully merging this pull request may close these issues.

par_iter_bp for short inputs