
perf(super-converter): replace xml-js with direct sax tree builder for DOCX import (SD-2291) #2514

Draft
caio-pizzol wants to merge 1 commit into main from caio/sd-2291-replace-xml-js-with-native-domparser-for-import

Conversation

@caio-pizzol
Contributor

Bypass xml-js's overhead (options validation, generic dispatch, JSON stringify+parse round-trip) by using sax.js directly with a purpose-built tree builder that produces the same non-compact JSON format.
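The idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `ElementNode`, `TextNode`, and `createTreeBuilder` are hypothetical, the handlers only mirror the shape of sax.js's `onopentag` / `ontext` / `onclosetag` callbacks, and events are fed in by hand so the snippet runs without any dependency. Declarations, comments, CDATA, and whitespace handling are left out.

```typescript
// Hypothetical sketch of a stack-based tree builder that emits
// xml-js-style non-compact nodes. All names here are illustrative.
type TextNode = { type: 'text'; text: string };
type ElementNode = {
  type: 'element';
  name: string;
  attributes?: Record<string, string>;
  elements?: XmlNode[];
};
type XmlNode = ElementNode | TextNode;
type RootNode = { elements: XmlNode[] };

function createTreeBuilder() {
  const root: RootNode = { elements: [] };
  // Stack of currently open containers; the top one receives new children.
  const stack: Array<RootNode | ElementNode> = [root];

  const append = (node: XmlNode): void => {
    const parent = stack[stack.length - 1];
    if (!parent.elements) parent.elements = [];
    parent.elements.push(node);
  };

  return {
    // Same argument shape as a sax-style open-tag event.
    onopentag(tag: { name: string; attributes: Record<string, string> }): void {
      const el: ElementNode = { type: 'element', name: tag.name };
      // Only attach `attributes` when the tag actually has some.
      if (Object.keys(tag.attributes).length > 0) {
        el.attributes = { ...tag.attributes };
      }
      append(el);
      stack.push(el);
    },
    ontext(text: string): void {
      if (text.length > 0) append({ type: 'text', text });
    },
    onclosetag(_name: string): void {
      stack.pop();
    },
    result(): RootNode {
      return root;
    },
  };
}

// Hand-fed events for: <w:p><w:t xml:space="preserve">Hi</w:t></w:p>
const builder = createTreeBuilder();
builder.onopentag({ name: 'w:p', attributes: {} });
builder.onopentag({ name: 'w:t', attributes: { 'xml:space': 'preserve' } });
builder.ontext('Hi');
builder.onclosetag('w:t');
builder.onclosetag('w:p');
console.log(JSON.stringify(builder.result(), null, 2));
```

Because the handlers write plain objects straight into the tree, there is no intermediate JSON string to serialize and parse again, which is the round-trip the description refers to.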

Benchmarks on a customer's 11 MB document.xml:

  • Old (JSON.parse(xml2json())): ~1,980ms (browser)
  • New (direct sax builder): ~810ms (browser)
  • ~2x faster across all document sizes
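For anyone wanting to reproduce timings like these, a small harness along the following lines could be used. It is only a sketch: `median` and `timeRuns` are hypothetical helper names, the workload below is a stand-in rather than either parser, and it assumes a runtime with a global `performance.now()` (modern browsers and recent Node versions).

```typescript
// Hypothetical timing helpers; not part of this PR.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Runs `fn` `runs` times and returns each wall-clock duration in milliseconds.
function timeRuns(fn: () => unknown, runs: number): number[] {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    durations.push(performance.now() - start);
  }
  return durations;
}

// Stand-in workload; swap in the old and new parse calls to compare them.
const sample = '<a>' + 'x'.repeat(100_000) + '</a>';
const durations = timeRuns(() => sample.split('x').length, 5);
console.log(`median: ${median(durations).toFixed(2)} ms`);
```

Reporting the median over several runs, rather than a single measurement, helps dampen JIT warm-up and garbage-collection noise.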

The new parser produces identical output, verified by 13 unit tests, including a head-to-head comparison against xml-js on realistic DOCX fragments.

@caio-pizzol caio-pizzol self-assigned this Mar 22, 2026
@linear

linear bot commented Mar 22, 2026


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9caeb608c8


@caio-pizzol caio-pizzol force-pushed the caio/sd-2291-replace-xml-js-with-native-domparser-for-import branch from 9caeb60 to b6f254a Compare March 22, 2026 12:10
@caio-pizzol caio-pizzol marked this pull request as draft March 23, 2026 14:46