
Performance: 47% faster parse+render, 60% fewer allocations#2056

Open
tobi wants to merge 81 commits into main from autoresearch/liquid-perf-2026-03-11

Conversation

@tobi tobi commented Mar 11, 2026

Summary

47% faster combined parse+render time, 60% fewer object allocations on the ThemeRunner benchmark (real Shopify theme templates with production-like data). Zero test regressions — all 974 unit tests pass.

Metric                    Main       This PR    Change
Combined (parse+render)   7,488µs    3,967µs    -47%
Parse time                5,928µs    2,803µs    -53%
Render time               1,481µs    1,161µs    -22%
Object allocations        62,620     24,881     -60%

Measured with YJIT enabled on Ruby 3.4, using performance/bench_quick.rb (best of 3 runs, 10 iterations each with GC disabled, after 20-iteration warmup).

Methodology

This PR was developed through 85 automated experiments using an autoresearch loop: edit → commit → run tests → benchmark → keep/discard. Each change was validated against the full unit test suite before benchmarking. Changes that regressed either correctness or the primary metric were reverted immediately.

The approach was allocation-driven: profile where objects are created, eliminate the ones that aren't needed, and defer the ones that are. Ruby's GC scanning time dominates at these scales, so every avoided allocation also reduces GC pressure, and the savings compound.

Architecture: the Cursor class

The headline architectural change is Liquid::Cursor — a StringScanner wrapper with higher-level methods tuned for Liquid's grammar. One Cursor instance lives on each ParseContext and is reused across all tag/variable/expression parsing within a template.

cursor = parse_context.cursor
cursor.reset(markup)
cursor.skip_ws
tag_name = cursor.scan_tag_name   # C-level regex scan
cursor.expect_id("in")            # zero-alloc: regex skip + byte compare
cursor.skip_fragment              # zero-alloc: regex skip

Key design: scan_* methods return strings (allocate), skip_* / expect_* methods return lengths or booleans (zero-alloc). Methods delegate to C-level StringScanner.scan/skip with compiled regexes — benchmarking showed this is 2-3x faster than Ruby-level peek_byte/scan_byte loops.

This replaces ~150 scattered getbyte/byteslice calls across BlockBody, Variable, If, For with a shared vocabulary. It's also the foundation for eventual single-pass parsing — the Cursor can be advanced forward through an entire template source without intermediate token arrays.
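The shape of the class is roughly the following — a minimal sketch assuming a StringScanner-backed implementation. The method names (`reset`, `skip_ws`, `scan_id`, `expect_id`) follow the PR's described vocabulary, but the bodies here are illustrative, not the PR's exact code:

```ruby
require "strscan"

# Minimal Cursor sketch: scan_* methods return strings (allocate),
# skip_*/expect_* methods return lengths or booleans (zero-alloc).
class Cursor
  ID_REGEX = /[a-zA-Z_][\w-]*\??/
  WS_REGEX = /\s+/

  def initialize
    @ss = StringScanner.new("")
  end

  def reset(source)
    @ss.string = source   # reuse the one scanner across all parsing
    self
  end

  def skip_ws
    @ss.skip(WS_REGEX)    # C-level regex skip, no MatchData, no string
  end

  def scan_id
    @ss.scan(ID_REGEX)    # C-level regex scan; allocates only the result
  end

  # Zero-alloc: skip past an identifier, then byte-compare against the
  # expected word; rewind on mismatch.
  def expect_id(word)
    start = @ss.pos
    len = @ss.skip(ID_REGEX)
    return true if len == word.bytesize && @ss.string.byteslice(start, len) == word
    @ss.pos = start
    false
  end
end
```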

What changed (by impact)

Parse optimizations (~53% faster, ~38K fewer allocs)

Replace regex with byte-level parsing, then regex-delegate via Cursor. The original code used =~ regex matching with Regexp.last_match captures for tag tokens, variable lookups, for tag syntax, if conditions, and number literals. Each =~ call creates a MatchData object. Replaced with forward-only scanning via Cursor, which uses C-level StringScanner.scan/skip with compiled regexes — no MatchData, no Ruby-level byte loops:

  • BlockBody.parse_tag_token: FullToken regex → Cursor scan_tag_name + position math
  • VariableLookup.scan_variable: VariableParser regex → manual byte scanner
  • For#lax_parse: Syntax regex → Cursor skip_id/expect_id/scan_fragment
  • If#lax_parse: SIMPLE_CONDITION regex → Cursor parse_simple_condition
  • Expression.parse_number: INTEGER_REGEX/FLOAT_REGEX → Cursor scan_number
  • Variable.simple_variable_markup: getbyte chain replaces regex for identifier detection

Fast-path Variable initialization. 100% of variables in the benchmark (1,197) now parse through try_fast_parse — a byte-level scanner that extracts the name expression and filter chain without touching the Lexer or Parser. Zero Lexer/Parser fallbacks — even multi-argument filters like pluralize: 'item', 'items' are scanned directly with comma-separated arg handling. Only keyword arguments (key: value) would fall through (none appear in the benchmark templates).
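As a rough illustration of the fast-path shape (structure only, not the allocation behavior): `try_fast_parse` is the PR's name, but this sketch is an assumption — it handles only no-argument filters and signals fallback by returning nil, whereas the PR's version also scans comma-separated positional arguments:

```ruby
# Hypothetical simplified fast path: extract "name | filter | filter" without
# invoking a full Lexer/Parser. Returns nil when the markup is too complex,
# so the caller can fall back to the general parse path.
def try_fast_parse(markup)
  parts = markup.split("|")
  name = parts.shift.strip
  return nil unless name.match?(/\A[a-zA-Z_][\w.]*\z/)

  filters = parts.map do |f|
    f = f.strip
    # This sketch bails on any filter argument (the real fast path does not).
    return nil unless f.match?(/\A[a-z_]\w*\z/)
    [f, []]
  end
  [name, filters]
end
```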

Cached no-arg filter tuples. The [filtername, EMPTY_ARRAY] tuple for no-argument filters (75% of all filter calls) is now frozen and cached per filter name via NO_ARG_FILTER_CACHE. Saves ~650 array allocations.
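The cache idea can be sketched like this — `NO_ARG_FILTER_CACHE` is the PR's constant name; the surrounding code is an assumed illustration:

```ruby
EMPTY_ARRAY = [].freeze

# Lazily build one frozen [name, EMPTY_ARRAY] tuple per filter name and
# hand out the same object on every subsequent no-arg use of that filter.
NO_ARG_FILTER_CACHE = Hash.new do |cache, name|
  cache[name] = [name, EMPTY_ARRAY].freeze
end

def filter_tuple(name, args)
  args.empty? ? NO_ARG_FILTER_CACHE[name] : [name, args]
end
```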

Fast-path VariableLookup. Simple identifier chains (product.title, forloop.index) skip scan_variable entirely. A simple_lookup? byte check validates the pattern, then byteslice + dot-splitting creates the lookups array directly. For single-name variables (product), @lookups = Const::EMPTY_ARRAY — zero-alloc.
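A hedged sketch of the dotted-identifier fast path — the validation and splitting logic here is illustrative (the PR validates with a byte check rather than a regex):

```ruby
EMPTY_ARRAY = [].freeze

# Matches plain dotted identifier chains like "product.title".
SIMPLE_LOOKUP_REGEX = /\A[a-zA-Z_][\w-]*(\.[a-zA-Z_][\w-]*)*\z/

# Returns [name, lookups] for simple chains, nil for anything needing the
# general scan_variable path. Single names share a frozen empty array.
def parse_simple(markup)
  return nil unless SIMPLE_LOOKUP_REGEX.match?(markup)
  return [markup, EMPTY_ARRAY] unless markup.include?(".")

  parts = markup.split(".")
  [parts.first, parts.drop(1)]
end
```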

Avoid unnecessary string allocations. Expression.parse skips strip when no leading/trailing whitespace. Variable fast-path reuses the markup string directly when no trimming is needed (avoids byteslice). blank_string? uses match? regex instead of byte loop.

Render optimizations (~22% faster, ~3K fewer allocs)

Splat-free filter invocation. Filters without arguments (| escape, | strip_html — 75% of all filter calls) now use invoke_single(method, input) which avoids the *args array allocation. Single-arg filters use invoke_two. Only 59 calls per render still need the splat path.
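`invoke_single`/`invoke_two` are the PR's method names; the `Filters` module here is a stand-in to make the dispatch shape concrete:

```ruby
# Stand-in filter module (real filters live in StandardFilters).
module Filters
  module_function

  def upcase(input) = input.to_s.upcase
  def truncate(input, len) = input.to_s[0, len]
end

# Arity-specialized dispatch: no *args array is allocated for the
# common 0- and 1-argument filter calls.
def invoke_single(method, input)
  Filters.public_send(method, input)
end

def invoke_two(method, input, arg)
  Filters.public_send(method, input, arg)
end

# General splat path, kept for filters with two or more arguments.
def invoke(method, input, *args)
  Filters.public_send(method, input, *args)
end
```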

Primitive type fast paths. find_variable returns immediately for String, Integer, Float, Array, Hash, nil, true, false — skipping to_liquid (which returns self for all of these) and respond_to?(:context=) checks. Same optimization in VariableLookup#evaluate for hash key lookups and result handling. to_liquid_value skipped for String/Integer keys.
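The fast path can be sketched as a type check ahead of the general drop-handling logic — a simplified assumption of what a `find_variable`-style conversion does, not the PR's exact code:

```ruby
# Primitives are returned immediately: to_liquid returns self for all of
# them, so the respond_to? checks below would be pure overhead.
def liquify(value, context)
  case value
  when String, Integer, Float, Array, Hash, NilClass, TrueClass, FalseClass
    value
  else
    v = value.respond_to?(:to_liquid) ? value.to_liquid : value
    v.context = context if v.respond_to?(:context=)
    v
  end
end

# A minimal drop-like object to exercise the slow path.
class FakeDrop
  attr_accessor :context
  def to_liquid = self
end
```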

Hash fast-path in VariableLookup. instance_of?(Hash) check before the general respond_to?(:[]) / respond_to?(:key?) chain — hashes are the most common lookup target.

Context#find_variable optimizations. Top-scope fast path (most common in for loops). Single-scope shortcut — when only one scope exists, skip find_index and go straight to environments.

Cached small integer to_s. Utils.to_s returns pre-computed frozen strings for integers 0-999, avoiding 267 Integer#to_s allocations per render cycle.
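The cache is straightforward to sketch (constant and method names here are assumptions):

```ruby
# One frozen string per integer 0..999, computed once at load time.
SMALL_INT_STRINGS = (0..999).map { |i| i.to_s.freeze }.freeze

def int_to_s(obj)
  if obj.is_a?(Integer) && obj >= 0 && obj <= 999
    SMALL_INT_STRINGS[obj]   # zero-alloc: same frozen string every call
  else
    obj.to_s
  end
end
```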

Lazy initialization. Context defers StringScanner and @interrupts array creation until actually needed. Registers defers @changes hash. static_environments uses EMPTY_ARRAY when empty. block_delimiter strings cached per tag name.

Utils.to_s / Utils.inspect lazy seen hash. The seen = {} default parameter allocated a hash on every call even though the recursive-structure guard is almost never triggered. Changed to seen = nil with seen || {} only when entering Hash/Array branches.
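The pattern looks like this in a simplified recursive stringifier (illustrative, not the actual `Utils` code):

```ruby
# `seen` defaults to nil; the hash is allocated only on the Hash/Array
# branches, where the recursive-structure guard can actually matter.
def deep_to_s(obj, seen = nil)
  case obj
  when Hash
    seen ||= {}
    return "{...}" if seen[obj.object_id]
    seen[obj.object_id] = true
    "{" + obj.map { |k, v| "#{k}=>#{deep_to_s(v, seen)}" }.join(", ") + "}"
  when Array
    seen ||= {}
    return "[...]" if seen[obj.object_id]
    seen[obj.object_id] = true
    "[" + obj.map { |v| deep_to_s(v, seen) }.join(", ") + "]"
  else
    obj.to_s   # the common case: no hash ever allocated
  end
end
```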

Utils.slice_collection fast path. When from == 0, to.nil?, and collection is already an Array, returns it directly instead of copying through slice_collection_using_each.
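`slice_collection_using_each` is an existing Liquid method name; its body here is a simplified sketch to show where the fast path short-circuits:

```ruby
# Fast path: a full-range slice of something that is already an Array
# needs no copy at all.
def slice_collection(collection, from, to)
  return collection if from == 0 && to.nil? && collection.is_a?(Array)
  slice_collection_using_each(collection, from, to)
end

# Simplified general path: walk the collection with each and keep the
# items inside [from, to).
def slice_collection_using_each(collection, from, to)
  segments = []
  index = 0
  collection.each do |item|
    break if to && index >= to
    segments << item if index >= from
    index += 1
  end
  segments
end
```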

Code removed / simplified

The Cursor consolidation deleted ~75 lines of duplicated byte-scanning logic. Methods that previously had 20+ lines of manual getbyte/scan_byte loops are now 1-3 line regex delegations. Examples:

# Before: 15 lines of manual byte scanning
def scan_id
  start = @ss.pos
  b = @ss.peek_byte
  return unless b && ((b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == USCORE)
  @ss.scan_byte
  while (b = @ss.peek_byte)
    break unless (b >= 97 && b <= 122) || ...
    @ss.scan_byte
  end
  @ss.scan_byte if @ss.peek_byte == QMARK
  @source.byteslice(start, @ss.pos - start)
end

# After: C-level regex is 2-3x faster
ID_REGEX = /[a-zA-Z_][\w-]*\??/
def scan_id = @ss.scan(ID_REGEX)

What did NOT work (reverted experiments)

  • Lexer output caching. 93% cache hit rate across templates, but the Parser's expression method mutates token strings in-place via str << variable_lookups. Cached tokens get corrupted. Would need frozen tokens + dup-on-mutate, which adds more allocs than it saves.
  • Shared expression cache across templates. Only 70 unique expressions across all templates, but a global cache leaks state between parses and grows unboundedly. Per-template caches are the right tradeoff.
  • Whitespace trimming in parse_variable_token. Saves downstream byteslice allocs but changes error message content (markup_context uses the trimmed string).
  • Manual truncatewords. Byte-level word scanning to avoid String#split — creates more allocs from per-word byteslice than split does internally.
  • case/when type dispatch in Context#evaluate. YJIT already optimizes respond_to? well — the case/when adds overhead from type checking.

Benchmark reproduction

cd performance
bundle exec ruby bench_quick.rb   # single run
# or
./auto/autoresearch.sh            # tests + 3-run best-of

The benchmark uses ThemeRunner which parses/renders 4 real Shopify themes (dropify, ripen, tribble, vogue) with production-like database fixtures. YJIT is enabled. GC is disabled during measurement windows. Times are Process.clock_gettime(CLOCK_MONOTONIC) wall-clock, allocations via ObjectSpace.count_objects.
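A minimal harness in the same spirit might look like this — not the actual bench_quick.rb; for a self-contained sketch it counts allocations with GC.stat(:total_allocated_objects) rather than diffing ObjectSpace.count_objects:

```ruby
# Measure wall-clock microseconds and object allocations for a block,
# with GC disabled during the measurement window.
def measure
  GC.disable
  allocs_before = GC.stat(:total_allocated_objects)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
  yield
  elapsed_us = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond) - t0
  [elapsed_us, GC.stat(:total_allocated_objects) - allocs_before]
ensure
  GC.enable
end
```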

Files changed

  • lib/liquid/cursor.rb — new Cursor class (StringScanner wrapper with regex-based Liquid-specific methods)
  • lib/liquid/block_body.rb — tag/variable token parsing via Cursor, regex blank_string?
  • lib/liquid/variable.rb — try_fast_parse byte-level name+filter scanner with multi-arg support, cached no-arg filter tuples, invoke_single/invoke_two render dispatch
  • lib/liquid/variable_lookup.rb — simple_lookup? byte validator, parse_simple fast path, primitive type fast paths in evaluate
  • lib/liquid/expression.rb — byte-level parse_number, conditional strip, byteslice for string literals
  • lib/liquid/context.rb — invoke_single/invoke_two, find_variable primitive fast paths + single-scope shortcut, lazy init, frozen defaults
  • lib/liquid/strainer_template.rb — invoke_single/invoke_two dispatch methods
  • lib/liquid/tags/if.rb — Cursor-based simple condition parsing
  • lib/liquid/tags/for.rb — Cursor-based lax_parse with zero-alloc skip_id/expect_id
  • lib/liquid/block.rb — cached block_delimiter strings
  • lib/liquid/registers.rb — lazy @changes hash
  • lib/liquid/standardfilters.rb — allocation-optimized truncatewords
  • lib/liquid/lexer.rb — \s+ instead of \s* for whitespace skip
  • lib/liquid/utils.rb — cached small integer to_s, lazy seen hash, slice_collection Array fast path
  • lib/liquid/parse_context.rb — Cursor instance, attr_reader for expression_cache/string_scanner
  • lib/liquid/resource_limits.rb — expose last_capture_length for render loop optimization

@tobi tobi changed the title Performance: 35% faster parse+render, 53% fewer allocations Performance: 47% faster parse+render, 60% fewer allocations Mar 11, 2026
@tobi tobi requested a review from ianks March 11, 2026 14:56
