Performance: 47% faster parse+render, 60% fewer allocations #2056
Summary
47% faster combined parse+render time, 60% fewer object allocations on the ThemeRunner benchmark (real Shopify theme templates with production-like data). Zero test regressions — all 974 unit tests pass.
Measured with YJIT enabled on Ruby 3.4, using performance/bench_quick.rb (best of 3 runs, 10 iterations each with GC disabled, after a 20-iteration warmup).

Methodology
This PR was developed through 85 automated experiments using an autoresearch loop: edit → commit → run tests → benchmark → keep/discard. Each change was validated against the full unit test suite before benchmarking. Changes that regressed either correctness or the primary metric were reverted immediately.
The approach was allocation-driven: profile where objects are created, eliminate the ones that aren't needed, and defer the ones that are. At these scales Ruby's GC scanning time dominates, so every avoided allocation pays off a second time as reduced GC pressure.
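One simple way to do that kind of profiling is to count allocations around a block with GC.stat(:total_allocated_objects). The sketch below is a hypothetical illustration, not the PR's own tooling. It applies the counter to the lazy seen hash change described later for Utils.to_s; to_s_eager, to_s_lazy, and array_to_s are simplified stand-ins that assume recursion can only happen through Arrays.

```ruby
# Count objects allocated while the block runs.
# total_allocated_objects is cumulative, so GC runs do not reset it.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical stand-ins for Utils.to_s; NOT Liquid's implementation.
# Eager: the `{}` default is evaluated, allocating a Hash, on every call.
def to_s_eager(obj, seen = {})
  obj.is_a?(Array) ? array_to_s(obj, seen) : obj.to_s
end

# Lazy: the Hash is only created once an Array is actually reached.
def to_s_lazy(obj, seen = nil)
  obj.is_a?(Array) ? array_to_s(obj, seen || {}) : obj.to_s
end

# Shared Array branch with a recursion guard keyed by object_id.
def array_to_s(ary, seen)
  return "[...]" if seen.key?(ary.object_id)
  seen[ary.object_id] = true
  result = "[" + ary.map { |e| to_s_lazy(e, seen) }.join(", ") + "]"
  seen.delete(ary.object_id)
  result
end

s = "title"                  # created once, outside the measured blocks
to_s_eager(s); to_s_lazy(s)  # warm up the call sites first
eager = count_allocations { 1_000.times { to_s_eager(s) } } # about one Hash per call
lazy  = count_allocations { 1_000.times { to_s_lazy(s) } }  # near zero
```

String#to_s returns the receiver itself, so for scalar inputs the default-argument Hash is essentially the only thing the eager version allocates.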
Architecture: the Cursor class
The headline architectural change is Liquid::Cursor, a StringScanner wrapper with higher-level methods tuned for Liquid's grammar. One Cursor instance lives on each ParseContext and is reused across all tag/variable/expression parsing within a template.

Key design:
scan_* methods return strings (allocate); skip_* and expect_* methods return lengths or booleans (zero-alloc). Methods delegate to C-level StringScanner.scan/skip with compiled regexes; benchmarking showed this is 2-3x faster than Ruby-level peek_byte/scan_byte loops.

This replaces ~150 scattered getbyte/byteslice calls across BlockBody, Variable, If, and For with a shared vocabulary. It is also the foundation for eventual single-pass parsing: the Cursor can be advanced through an entire template source without intermediate token arrays.

What changed (by impact)
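As a rough illustration of the Cursor design from the previous section (a reusable StringScanner wrapper whose scan_* methods allocate strings and whose skip_*/expect methods return only lengths or booleans), here is a minimal hypothetical sketch. MiniCursor, its method names, and its ID and NUMBER patterns are assumptions for illustration, not the PR's lib/liquid/cursor.rb API.

```ruby
require "strscan"

# Hypothetical Cursor-style wrapper; NOT Liquid's actual Cursor API.
class MiniCursor
  ID     = /[A-Za-z_][\w-]*\??/ # assumed identifier shape
  NUMBER = /-?\d+(?:\.\d+)?/     # assumed integer/float literal shape
  WS     = /\s+/

  def initialize(source = "")
    @ss = StringScanner.new(source)
  end

  # Point the same scanner at new markup instead of building a new one.
  def reset(source)
    @ss.string = source
    self
  end

  # skip_* methods return a length (Integer) or nil; no String is created.
  def skip_ws
    @ss.skip(WS) || 0
  end

  def skip_id
    skip_ws
    @ss.skip(ID)
  end

  # Consumes +pattern+ if it matches at the current position; returns a boolean.
  def expect(pattern)
    skip_ws
    !@ss.skip(pattern).nil?
  end

  # scan_* methods return the matched text, which allocates a String.
  def scan_id
    skip_ws
    @ss.scan(ID)
  end

  def scan_number
    skip_ws
    @ss.scan(NUMBER)
  end

  def eos?
    skip_ws
    @ss.eos?
  end
end

# Usage on a for-tag-like header.
cursor = MiniCursor.new
cursor.reset("item in products")
var  = cursor.scan_id        # => "item"
ok   = cursor.expect(/in\b/) # => true
coll = cursor.scan_id        # => "products"
```

Regex literals without interpolation are created once per call site, so expect(/in\b/) does not build a new Regexp on each call.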
Parse optimizations (~53% faster, ~38K fewer allocs)
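Two of the parse items below, the VariableLookup fast path and the cached no-arg filter tuples, can be sketched as follows. This is a hypothetical illustration: SIMPLE_LOOKUP, parse_simple_lookup, and this version of NO_ARG_FILTER_CACHE are assumptions, the PR's simple_lookup? is described as a byte check rather than a regex, and the real code also has to handle brackets, quotes, and keyword arguments.

```ruby
# Hypothetical sketch; NOT Liquid's VariableLookup or Variable code.
EMPTY_ARRAY = [].freeze

# Assumed shape of a "simple" lookup: dotted identifiers only, no brackets or quotes.
SIMPLE_LOOKUP = /\A[A-Za-z_][\w-]*(?:\.[A-Za-z_][\w-]*)*\z/

# Returns [name, lookups], or nil so the caller falls back to the general scanner.
def parse_simple_lookup(markup)
  return nil unless markup.match?(SIMPLE_LOOKUP) # match? builds no MatchData
  dot = markup.index(".")
  return [markup, EMPTY_ARRAY] unless dot        # "product": reuse markup and one shared frozen array
  [markup.byteslice(0, dot), markup.byteslice(dot + 1, markup.bytesize).split(".")]
end

# One frozen [name, []] tuple per filter name, reused by every no-arg filter call.
NO_ARG_FILTER_CACHE = Hash.new { |cache, name| cache[name] = [-name, EMPTY_ARRAY].freeze }

parse_simple_lookup("product.title")    # => ["product", ["title"]]
parse_simple_lookup("product")          # => ["product", []]
parse_simple_lookup("product['title']") # => nil (needs the full scanner)
NO_ARG_FILTER_CACHE["escape"].equal?(NO_ARG_FILTER_CACHE["escape"]) # => true
```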
Replace regex with byte-level parsing, then regex-delegate via Cursor. The original code used =~ regex matching with Regexp.last_match captures for tag tokens, variable lookups, for tag syntax, if conditions, and number literals. Each =~ call creates a MatchData object. These were replaced with forward-only scanning, mostly via Cursor, which uses C-level StringScanner.scan/skip with compiled regexes (no MatchData, no Ruby-level byte loops):

- BlockBody.parse_tag_token: FullToken regex → Cursor scan_tag_name + position math
- VariableLookup.scan_variable: VariableParser regex → manual byte scanner
- For#lax_parse: Syntax regex → Cursor skip_id/expect_id/scan_fragment
- If#lax_parse: SIMPLE_CONDITION regex → Cursor parse_simple_condition
- Expression.parse_number: INTEGER_REGEX/FLOAT_REGEX → Cursor scan_number
- Variable.simple_variable_markup: getbyte chain replaces regex for identifier detection

Fast-path Variable initialization. 100% of variables in the benchmark (1,197) now parse through try_fast_parse, a byte-level scanner that extracts the name expression and filter chain without touching the Lexer or Parser. There are zero Lexer/Parser fallbacks: even multi-argument filters like pluralize: 'item', 'items' are scanned directly with comma-separated argument handling. Only keyword arguments (key: value) would fall through, and none appear in the benchmark templates.

Cached no-arg filter tuples. The [filtername, EMPTY_ARRAY] tuple for no-argument filters (75% of all filter calls) is now frozen and cached per filter name via NO_ARG_FILTER_CACHE. This saves ~650 array allocations.

Fast-path VariableLookup. Simple identifier chains (product.title, forloop.index) skip scan_variable entirely. A simple_lookup? byte check validates the pattern, then byteslice plus dot-splitting creates the lookups array directly. For single-name variables (product), @lookups = Const::EMPTY_ARRAY, so no array is allocated.

Avoid unnecessary string allocations. Expression.parse skips strip when there is no leading or trailing whitespace. The Variable fast path reuses the markup string directly when no trimming is needed (avoiding a byteslice). blank_string? uses a match? regex instead of a byte loop.

Render optimizations (~22% faster, ~3K fewer allocs)
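The first render change below, splat-free filter invocation, comes down to giving the common arities their own methods so Ruby never builds an *args Array for them. A hypothetical sketch follows; DemoFilters, DemoStrainer, and apply_filter are illustrative names, and unlike Liquid's strainer this version does no filter-name validation.

```ruby
# Hypothetical filters and strainer; NOT Liquid's StrainerTemplate.
module DemoFilters
  def upcase_it(input)
    input.upcase
  end

  def append_it(input, suffix)
    input + suffix
  end

  def wrap_it(input, left, right)
    left + input + right
  end
end

class DemoStrainer
  include DemoFilters

  # General path: the *args rest parameter allocates an Array on every call.
  def invoke(method, *args)
    public_send(method, *args)
  end

  # Fixed-arity paths: arguments are passed positionally, no Array is built.
  def invoke_single(method, input)
    public_send(method, input)
  end

  def invoke_two(method, input, arg)
    public_send(method, input, arg)
  end
end

# Pick the cheapest path based on how many filter arguments were parsed.
def apply_filter(strainer, name, input, args)
  case args.size
  when 0 then strainer.invoke_single(name, input)
  when 1 then strainer.invoke_two(name, input, args[0])
  else strainer.invoke(name, input, *args)
  end
end

strainer = DemoStrainer.new
apply_filter(strainer, :upcase_it, "sale", [])       # => "SALE"
apply_filter(strainer, :append_it, "sale", ["!"])    # => "sale!"
apply_filter(strainer, :wrap_it, "sale", ["<", ">"]) # => "<sale>"
```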
Splat-free filter invocation. Filters without arguments (such as | escape and | strip_html, which make up 75% of all filter calls) now use invoke_single(method, input), which avoids the *args array allocation. Single-argument filters use invoke_two. Only 59 calls per render still need the splat path.

Primitive type fast paths. find_variable returns immediately for String, Integer, Float, Array, Hash, nil, true, and false, skipping to_liquid (which returns self for all of these) and respond_to?(:context=) checks. VariableLookup#evaluate applies the same optimization to hash key lookups and result handling, and to_liquid_value is skipped for String/Integer keys.

Hash fast-path in VariableLookup. An instance_of?(Hash) check runs before the general respond_to?(:[]) / respond_to?(:key?) chain, since hashes are the most common lookup target.

Context#find_variable optimizations. A top-scope fast path (the most common case in for loops), plus a single-scope shortcut: when only one scope exists, skip find_index and go straight to environments.

Cached small integer to_s. Utils.to_s returns pre-computed frozen strings for integers 0-999, avoiding 267 Integer#to_s allocations per render cycle.

Lazy initialization. Context defers StringScanner and @interrupts array creation until actually needed. Registers defers the @changes hash. static_environments uses EMPTY_ARRAY when empty. block_delimiter strings are cached per tag name.

Utils.to_s / Utils.inspect lazy seen hash. The seen = {} default parameter allocated a hash on every call even though the recursive-structure guard is almost never triggered. It is now seen = nil, with seen || {} applied only when entering the Hash/Array branches.

Utils.slice_collection fast path. When from == 0, to.nil?, and the collection is already an Array, it is returned directly instead of being copied through slice_collection_using_each.

Code removed / simplified
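The simplification this section describes, long getbyte loops collapsing into a single regex delegation, looks roughly like the following. This is a hypothetical before/after pair, not one of the PR's actual diffs; id_end_by_bytes, id_end_by_scanner, and the identifier character set (ASCII letters, digits, _ and -) are assumptions.

```ruby
require "strscan"

# Before (hypothetical): a Ruby-level byte loop to find where an identifier ends.
def id_end_by_bytes(str, pos)
  while pos < str.bytesize
    b = str.getbyte(pos)
    # a-z, A-Z, 0-9, '_' (95), '-' (45)
    word = (b >= 97 && b <= 122) || (b >= 65 && b <= 90) ||
           (b >= 48 && b <= 57) || b == 95 || b == 45
    break unless word
    pos += 1
  end
  pos
end

# After: one StringScanner#skip with a precompiled regex (\w is ASCII-only in Ruby).
ID_CHARS = /[\w-]*/

def id_end_by_scanner(scanner)
  scanner.skip(ID_CHARS)
  scanner.pos
end

src = "product_title-2 | escape"
id_end_by_bytes(src, 0)                   # => 15
id_end_by_scanner(StringScanner.new(src)) # => 15
```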
The Cursor consolidation deleted ~75 lines of duplicated byte-scanning logic. Methods that previously had 20+ lines of manual getbyte/scan_byte loops are now 1-3 line regex delegations. Examples:

What did NOT work (reverted experiments)
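The first reverted experiment listed below failed because cached strings were mutated in place. A hypothetical reproduction of that hazard, and of the frozen-token plus dup-on-mutate fix the text mentions, looks like this; TOKEN_CACHE, FROZEN_TOKEN_CACHE, and both build_expression methods are illustrative names, not Liquid's parser code.

```ruby
# Hypothetical token caches; NOT Liquid's parser.
TOKEN_CACHE = {}
FROZEN_TOKEN_CACHE = {}

# Unsafe: every caller gets the same mutable String back.
def build_expression(src)
  str = (TOKEN_CACHE[src] ||= src.dup)
  str << ".size" # in-place append also rewrites the cached token
  str
end

# Safe: cache frozen tokens and copy before mutating (one extra String per call).
def build_expression_safe(src)
  token = (FROZEN_TOKEN_CACHE[src] ||= src.dup.freeze)
  token.dup << ".size"
end

build_expression("items")      # => "items.size"
build_expression("items")      # => "items.size.size" (cache corrupted)
build_expression_safe("items") # => "items.size"
build_expression_safe("items") # => "items.size"
```

The safe variant pays one String#dup on every call, which is the extra allocation cost that made the experiment a net loss.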
- The expression method mutates token strings in place via str << variable_lookups, so cached tokens get corrupted. This would need frozen tokens plus dup-on-mutate, which adds more allocations than it saves.
- parse_variable_token: saves downstream byteslice allocations but changes error message content (markup_context uses the trimmed string).
- String#split: creates more allocations from per-word byteslice than split does internally.
- case/when type dispatch in Context#evaluate: YJIT already optimizes respond_to? well, and the case/when adds overhead from type checking.

Benchmark reproduction
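The measurement procedure given in the Summary (a warmup, then the best of several GC-disabled runs) can be approximated with a harness like the one below. It is a hypothetical sketch, not performance/bench_quick.rb: the bench method and its keyword parameters are illustrative, it reports per-iteration figures by assumption, and it counts allocations with GC.stat(:total_allocated_objects) rather than the ObjectSpace.count_objects approach described in the following paragraph.

```ruby
# Hypothetical harness; NOT Liquid's performance/bench_quick.rb.
# Warm up, then keep the fastest of +runs+ measurements taken with GC disabled.
def bench(warmup: 20, iterations: 10, runs: 3)
  warmup.times { yield }
  best = nil

  runs.times do
    GC.start # begin each run from a freshly collected heap
    already_disabled = GC.disable
    begin
      allocs_before = GC.stat(:total_allocated_objects)
      t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      iterations.times { yield }
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
      allocs = GC.stat(:total_allocated_objects) - allocs_before
    ensure
      GC.enable unless already_disabled # restore the caller's GC state
    end

    us_per_iter = elapsed * 1_000_000 / iterations
    if best.nil? || us_per_iter < best[:us_per_iter]
      best = { us_per_iter: us_per_iter, allocs_per_iter: allocs / iterations }
    end
  end

  best.merge(us_per_iter: best[:us_per_iter].round(1))
end

# Stand-in workload; the real benchmark parses and renders ThemeRunner templates.
p bench(warmup: 5, iterations: 10, runs: 3) { Array.new(100) { |i| i * 2 } }
```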
The benchmark uses ThemeRunner, which parses and renders 4 real Shopify themes (dropify, ripen, tribble, vogue) with production-like database fixtures. YJIT is enabled, and GC is disabled during measurement windows. Times are Process.clock_gettime(CLOCK_MONOTONIC) wall-clock; allocations are counted via ObjectSpace.count_objects.

Files changed
- lib/liquid/cursor.rb: new Cursor class (StringScanner wrapper with regex-based Liquid-specific methods)
- lib/liquid/block_body.rb: tag/variable token parsing via Cursor, regex blank_string?
- lib/liquid/variable.rb: try_fast_parse byte-level name+filter scanner with multi-arg support, cached no-arg filter tuples, invoke_single/invoke_two render dispatch
- lib/liquid/variable_lookup.rb: simple_lookup? byte validator, parse_simple fast path, primitive type fast paths in evaluate
- lib/liquid/expression.rb: byte-level parse_number, conditional strip, byteslice for string literals
- lib/liquid/context.rb: invoke_single/invoke_two, find_variable primitive fast paths + single-scope shortcut, lazy init, frozen defaults
- lib/liquid/strainer_template.rb: invoke_single/invoke_two dispatch methods
- lib/liquid/tags/if.rb: Cursor-based simple condition parsing
- lib/liquid/tags/for.rb: Cursor-based lax_parse with zero-alloc skip_id/expect_id
- lib/liquid/block.rb: cached block_delimiter strings
- lib/liquid/registers.rb: lazy @changes hash
- lib/liquid/standardfilters.rb: allocation-optimized truncatewords
- lib/liquid/lexer.rb: \s+ instead of \s* for whitespace skip
- lib/liquid/utils.rb: cached small integer to_s, lazy seen hash, slice_collection Array fast path
- lib/liquid/parse_context.rb: Cursor instance, attr_reader for expression_cache/string_scanner
- lib/liquid/resource_limits.rb: expose last_capture_length for render loop optimization