
Major lljson rework to make round-trippable serialization easier #61

Open
HaroldCindy wants to merge 2 commits into main from harold/add_json_replacer

Conversation

@HaroldCindy
Contributor

Of note, there's now a notion of replacer and reviver callbacks as in JS' JSON APIs. An example of their use is in the tests folder as lljson_typedjson.lua.

We went with this form because it lets you construct a different representation of each object just before it's serialized, without requiring you to build an entire serializable copy before calling lljson.encode(). That saves memory, since the serializable version of each object only needs to stay alive while we're still traversing it.

Additionally, an empty table is now encoded as [] by default. This is probably the most common meaning for an empty table, but you can also apply object_mt as a metatable or add __jsontype="object" to your own metatable to force serialization as an object. Similarly, array_mt or __jsontype="array" will act as a hint to treat your object as an array.
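The metatable rules described above might be sketched like this (a minimal illustration, assuming the module is required as `lljson` and behaves as stated; not verified against the actual build):

```lua
local lljson = require("lljson")  -- assumed module name

-- An empty table now defaults to an array:
assert(lljson.encode({}) == "[]")

-- Force object encoding, either with the provided sentinel metatable...
assert(lljson.encode(setmetatable({}, lljson.object_mt)) == "{}")
-- ...or via the __jsontype field on your own metatable:
assert(lljson.encode(setmetatable({}, {__jsontype = "object"})) == "{}")

-- array_mt / __jsontype = "array" hint array encoding the same way:
assert(lljson.encode(setmetatable({}, lljson.array_mt)) == "[]")
```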

__len should no longer be used as a hint that the object should be treated as an array; that's what __jsontype is for.

Also added a new options table format to lljson.encode() and friends. The table now allows you to specify that __tojson hooks should be skipped, so you can manually invoke them at your leisure in your replacer hooks.
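Put together, a hedged sketch of the hooks described above (the three-argument `(key, value, parent)` shape is inferred from the test transcripts later in this thread; `data` is a placeholder):

```lua
local lljson = require("lljson")  -- assumed module name

-- replacer runs on each value before it is serialized; returning a
-- different value substitutes it in the output without mutating `data`.
local json = lljson.encode(data, {
    replacer = function(key, value, parent)
        if typeof(value) == "uuid" then return tostring(value) end
        return value
    end,
    skip_tojson = true,  -- new option: don't auto-invoke __tojson hooks
})

-- reviver runs on each decoded value (leaves before parents); returning
-- a different value substitutes it into the decoded result.
local decoded = lljson.decode(json, function(key, value, parent)
    return touuid(value) or value  -- touuid() as used elsewhere in this thread
end)
```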

// Helper to append a tagged vector value: !v<x,y,z> or tight: !v1,2,3
// Helper to append a tagged vector value: !v<x,y,z> or tight: !v1,,3
// ZERO_VECTOR in tight mode -> "!v"
static void json_append_tagged_vector(lua_State *l, strbuf_t *json, const float *a, bool tight = false) {
@tapple tapple Mar 16, 2026

I think this should result in an error, but it doesn't:

> lljson.sldecode('"!v1,,1,,"')
<1, 0, 1>

@tapple tapple Mar 17, 2026

Maybe this too. Less sure:

> lljson.sldecode('"!v    000000000000000003.14,,"')
<3.14, 0, 0>

Probably not:

> tonumber("    000000000000000003.14")
3.14

// ZERO_ROTATION (0,0,0,1) in tight mode -> "!q"
static void json_append_tagged_quaternion(lua_State *l, strbuf_t *json, const float *a, bool tight = false) {
strbuf_append_string(json, tight ? "\"!q" : "\"!q<");
if (tight && memcmp(a, DEFAULT_QUATERNION, sizeof(DEFAULT_QUATERNION)) == 0) {
@tapple tapple Mar 16, 2026

I think this should generate an error, but doesn't:

> lljson.sldecode('"!q2e3,,0x16,,xyzzz"')
<2000, 0, 22, 0>

if (max > items * JSON_SPARSE_RATIO &&
max > JSON_SPARSE_SAFE) {
if (!cfg->encode_sparse_convert)
json_encode_exception(l, cfg, json, -1, "excessively sparse array");
@tapple tapple Mar 16, 2026

dunno if this is meant to be fixed:

> lljson.encode({[10]=4})
[null,null,null,null,null,null,null,null,null,4]
> lljson.encode({[11]=4})
stdin:1: Cannot serialise table: excessively sparse array
stack backtrace:
[C] function encode
stdin:1

> lljson.encode({[11.1]=4})
{"11.1":4}

Contributor Author

Should be better with the latest commit: a __jsontype="array" metatable field forces array encoding even if the table is technically sparse, and you can also allow sparse arrays wholesale via an option (a52935a).

I thought about adding an option to silently convert them to objects as well, but that seems like something you'd rarely want. If you don't care about the object / array distinction, then you're probably best off using slencode() which doesn't try to maintain that distinction at the serialization layer.

@tapple tapple Mar 16, 2026

yeah, I find either of these acceptable:

> lljson.encode(setmetatable({[11]=4}, lljson.object_mt))
{"11":4}
> lljson.encode({["11"]=4})
{"11":4}
> lljson.encode({[11]=4}, {allow_sparse=true})
[null,null,null,null,null,null,null,null,null,null,4]

Contributor Author

@Suzanna-Linn Any thoughts on this approach? Tapple's comment shows the current behavior as of this PR, with the addition of the following:

> lljson.encode(setmetatable({[11]=4}, lljson.array_mt))
[null,null,null,null,null,null,null,null,null,null,4]


I like it. With the new allow_sparse, we have enough options to shape the table in any way.

I would rather convert to an object instead of throwing the "excessively sparse array" error, but having to explicitly say how we want the table is good too, to avoid a table being encoded sometimes as an array and sometimes as an object depending on its contents.

lua_pushvalue(l, (int)DecodeStack::REVIVER);
lua_pushinteger(l, i); // key (1-based source index)
lua_pushvalue(l, -3); // value
lua_pushvalue(l, -5); // parent (the array table)
@tapple tapple Mar 16, 2026

it might be handy if the reviver received more context about what it's parsing, like, full path to the current value (in the style of llJsonGetValue)

something like this:

lljson.decode('{"apples": [{}, {}, {}], "oranges": [{}, {}]}', function(path, value)
    if path[#path-1] == "apples" then return setmetatable(value, Apple)
    elseif path[#path-1] == "oranges" then return setmetatable(value, Orange)
    else return value end
end)

resulting in:

{
    apples={
        setmetatable({}, Apple),
        setmetatable({}, Apple),
        setmetatable({}, Apple),
    },
    oranges={
        setmetatable({}, Orange),
        setmetatable({}, Orange),
    },
}

I dunno if that's too much work to put into this. It can be done already in 2 ways:

  1. post-process after decode runs
  2. post-process during revival of the element whose keys are "apples" and "oranges", by looping over its children

Also, my example is not a great data structure; a tagged one like TypedJSON generates is more "proper" JSON.
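For the record, option 1 above might look like this (a hedged sketch reusing the hypothetical `Apple`/`Orange` classes from the example):

```lua
-- Post-process after decode runs: walk the known keys and attach metatables.
local decoded = lljson.decode('{"apples": [{}, {}, {}], "oranges": [{}, {}]}')
for _, item in ipairs(decoded.apples) do setmetatable(item, Apple) end
for _, item in ipairs(decoded.oranges) do setmetatable(item, Orange) end
```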

Contributor Author

I didn't add a path tracking system because it can be done pretty trivially by having the replacer / reviver close over a tracking table that the replacers and revivers themselves update as they visit things, and using parent to figure out what path key and value are relative to. That's a pretty reasonable use-case though.
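On the encode side, the closure idea might look something like this (an unverified sketch: it assumes the replacer sees parents before children, which holds for serialization but, as noted later in the thread, not for the leaf-first reviver; `table.clone` is Luau):

```lua
local paths = {}  -- maps each visited table to its path from the root

local function replacer(key, value, parent)
    if type(value) == "table" then
        -- Extend the parent's path (empty for the root) with this key.
        local path = table.clone((parent and paths[parent]) or {})
        if key ~= nil then table.insert(path, key) end
        paths[value] = path
    end
    return value
end

lljson.encode(data, {replacer = replacer})
-- During the traversal, paths[t] gives t's location in the document,
-- in the style of llJsonGetValue's path lists.
```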

@tapple tapple Mar 17, 2026

I'm failing to think of a way to accomplish the above example without a for-loop, especially for revival, where the leaves are visited before the parents, so you can't really know the leaf address while it's being visited

Contributor Author

Hmmm, yeah I forgot about visitation order being depth-first, leaf-first for the reviver....

I'm kind of torn here. Most (de)serializers don't provide a path and need a post-processing step for cases where position within the JSON document determines how something should be treated rather than some property of the data itself. It's not an unreasonable use-case though, so I'll see how complicated it would be to add optional path tracking, so it doesn't impact the common case where the bookkeeping overhead would be unnecessary.

@HaroldCindy HaroldCindy marked this pull request as ready for review March 16, 2026 23:08
} else if (keytype == LUA_TSTRING) {
json_append_string(l, json, -2);
strbuf_append_char(json, ':');
} else {


Could it convert uuid keys to strings?

uuids are often used as keys in tables of visitors, subscribers, tippers, buyers... which can be exported to external resources.

It could also convert anything to a string, as slencode() does, but that would not be as useful.

@tapple tapple Mar 17, 2026

a replacer/reviver could convert uuid's to something else. I'm quite happy with the default behavior of lljson with uuid's now, and replacers/revivers allow further easy customization:

string to uuid:

> us = "12345678-9012-3456-7890-123456789012"
> lljson.slencode({1, us, 2}, {tight=true})
[1,"12345678-9012-3456-7890-123456789012",2]
> lljson.slencode({1, us, 2}, {replacer=function(k, v) return touuid(v) or v end, tight=true})
[1,"!uEjRWeJASNFZ4kBI0VniQEg",2]

uuid to string:

> u = uuid(us)
> lljson.slencode({1, u, 2}, {tight=true})
[1,"!uEjRWeJASNFZ4kBI0VniQEg",2]
> lljson.slencode({1, u, 2}, {replacer=function(k, v) return if typeof(v) == "uuid" then tostring(v) else v end, tight=true})
[1,"12345678-9012-3456-7890-123456789012",2]

Contributor Author

Converting UUIDs seems reasonable to me!

@tapple tapple Mar 17, 2026

what does "converting uuid's" mean? make one or both of these return "uuid" rather than "string"?

> print(typeof(lljson.decode('"12345678-9012-3456-7890-123456789012"')))
string
> print(typeof(lljson.sldecode('"12345678-9012-3456-7890-123456789012"')))
string

without a reviver? I don't have a strong objection to it for lljson.decode, but I'd rather lljson.sldecode did not do that. If I put a uuid as a string into lljson.slencode, I had to do that intentionally and probably had a good reason to. Not that I can think of why I'd do that.

Still, it's a really easy reviver to write. I don't feel like much is gained or lost either way:

> j = '[4, "12345678-9012-3456-7890-123456789012", "not uuid"]'
> function reviver(k, v) return touuid(v) or v end
> for k, v in lljson.decode(j, reviver) do print(k, typeof(v), v) end
1       number  4
2       uuid    12345678-9012-3456-7890-123456789012
3       string  not uuid


My request is only for encode() to convert uuid keys to string, instead of throwing a "table key must be a number or string" error.

It would be useful to export, for external storage, tables of visitors, subscribers, tippers, buyers, etc., that have the userId as the key.

replacer doesn't change individual keys, only their values. Currently, we need to provide a new table with __tojson or replacer, which is a memory problem since these tables can be big.

I'm happy with how slencode() / sldecode() deal with uuid, and with decode() decoding the JSON types as they are without trying to "interpret" them.


oh. keys:

> lljson.encode({[uuid("12345678-9012-3456-7890-123456789012")]= "yo"})
stdin:1: Cannot serialise userdata: table key must be a number or string
stack backtrace:
[C] function encode
stdin:1

Yeah. that would be great. no objection

-- by setting the array_mt metatable
assert(lljson.encode(setmetatable({}, lljson.array_mt)) == "[]")
-- `lljson.empty_array` is the same.
-- Sentinel frozen tables for empty array/object
@monty-linden monty-linden Mar 17, 2026

At L8, you now need the opposite test - that you can make it an object.

if type(value) == "table" then
local mt = getmetatable(value)
if mt then
-- Call __tojson manually on the value, if we have one.


Does it call __tojson also with tj:encode(myTab, { skip_tojson = true }) ?

@tapple tapple Mar 17, 2026

I don't think harold implemented encode options in this TypedJSON wrapper:

> TypedJSON = require("./TypedJSON"); Vec2 = require("./Vec2"); Player = require("./Player"); Color = require("./Color")
> tj = TypedJSON.new({ Vec2 = Vec2, Player = Player, [3] = Color })
> tojson = { __tojson = function(self, ...) print("tojson", self, ...); return self end }
> lljson.encode(setmetatable({}, tojson))
tojson  table: 0x0000000039798a00       table: 0x00000000397989c8
[]
> lljson.encode(setmetatable({}, tojson), {skip_tojson=true})
[]
> lljson.slencode(setmetatable({}, tojson))
tojson  table: 0x0000000039854e88       table: 0x0000000039854e50
[]
> lljson.slencode(setmetatable({}, tojson), {skip_tojson=true})
[]
> tj:encode(setmetatable({}, tojson))
tojson  table: 0x0000000039854cc8
[]
> tj:encode(setmetatable({}, tojson), {skip_tojson=true})
tojson  table: 0x0000000039854be8
[]
> tj:slencode(setmetatable({}, tojson))
tojson  table: 0x0000000039854b08
[]
> tj:slencode(setmetatable({}, tojson), {skip_tojson=true})
tojson  table: 0x0000000039854a28
[]

It doesn't even take a second argument: https://github.com/secondlife/slua/pull/61/changes/BASE..222e486a4b6a53c70d164ee6ec2f6c25e701b189#diff-63bfb54906b6f05adc459307c72ee6eef607d41c6674ccd28007eb75a1769e8dR137 . I guess it could have taken a second argument

Contributor Author

Indeed, skip_tojson is (ab)used by TypedJSON strictly to allow it to call __tojson itself in the replacer, not to actually skip calling __tojson. TypedJSON really is just a trivial example of how you could write such a thing, to prove that the API is workable for common serialization situations; it's not necessarily intended to be used by anything.

static void json_next_number_token(json_parse_t *json, json_token_t *token)
{
char *endptr;
long long tmpval = strtoll(json->ptr, &endptr, 10);


This is not related to the current improvement; it happens in the version in-world:

print(lljson.decode("100000000000000000000"))  -- 1e20
-- > 9223372036854776000


still happens:

> lljson.decode("100000000000000000000")
9223372036854776000

and it doesn't seem the native integer cutoff is quite that low:

> 100000000000000000000
100000000000000000000
> lljson.decode("100000000000000000000") == 100000000000000000000
false

@@ -1 +1 @@
/* Lua CJSON - JSON support for Lua
@Suzanna-Linn Suzanna-Linn Mar 17, 2026

I'm learning how it works; is it like this, in the most metatable-heavy case?

table1
if replacer then
table2 = table1.__tojson
table3 = replacer
end
shape = table3.__jsontype
table4 = table3.__tojson
if array then
len = table4.__len
end

  • A table can be replaced 3 times (__tojson called in replacer, the replacer and __tojson in serialization).
  • If replacer returns the same table, its __tojson is called twice (in replacer and in serialization).
  • In serialization, __tojson returning a table with __jsontype has no effect.

@tapple tapple Mar 17, 2026

it looks like you are correct. __tojson can be called both before and after the replacer:

> tojson = { __tojson = function(self, ...) print("tojson", self, ...); return self end }
> function replacer(...) print("replacer", ...); return setmetatable({}, tojson) end
> lljson.encode(setmetatable({}, tojson), {replacer=replacer})
tojson  table: 0x00000000398548a0       table: 0x0000000039854830
replacer        nil     table: 0x00000000398548a0       nil
tojson  table: 0x0000000039854788       table: 0x0000000039854830
[]


it looks like __jsontype is indeed ignored on tables returned from __tojson:

> lljson.encode(setmetatable({}, {__jsontype="object"}))
{}
> lljson.encode(setmetatable({}, {__tojson = function() return setmetatable({}, {__jsontype="object"}) end }))
[]

Contributor Author

Correct: __jsontype is what determines the encoding semantics for the object, while __tojson() is a hook that objects can use to present their own view of the data to the serializer. Will look at that double __tojson call though.


Regarding __jsontype and __tojson, a theoretical (and probably non-existent case) would be a __tojson that returns, depending on the data, any one of:

  • a JSON object
  • a JSON array
  • an SLua array to be encoded as a JSON object

Then, we would need to set __jsontype in __tojson and use a replacer that does nothing ( replacer = function(_, v) return v end ) to get __tojson called and __jsontype set before it is used.

char *end;
double num = fpconv_strtod(payload, &end);
if (end == payload)
luaL_error(l, "malformed tagged float: %s", str);
@tapple tapple Mar 17, 2026

I feel like this should give an error, but doesn't:

> lljson.sldecode('"!f    000000000000000003.14$$$$$$$$"')
3.14
> tonumber("    000000000000000003.14$$$$$$$$")
nil
> tonumber("    000000000000000003.14")
3.14

lua_pushvalue(l, -2); // self
lua_pushvalue(l, (int)EncodeStack::CTX); // ctx
YIELD_CHECK(l, TOJSON_CHECK, LUA_INTERRUPT_LLLIB);
YIELD_CALL(l, 2, 1, TOJSON_CALL);


I'm wondering what happens if __tojson returns a table that also has a __tojson.

Should the new __tojson be called too, until no more __tojson are returned?

Otherwise, unless replacer replaces the table again, will the changes made by replacer be overwritten when __tojson is called later?


My __tojson example above returns self, which has a __tojson, and it only ran __tojson once.
