Major lljson rework to make round-trippable serialization easier#61
HaroldCindy wants to merge 2 commits into main
Conversation
Of note, there's now a notion of replacer and reviver callbacks as in JS' `JSON` APIs. An example of their use is in the tests folder as `lljson_typedjson.lua`. We went with this form since it allows constructing a different representation of the object before serializing, without requiring you to construct an entire, serializable copy before calling `lljson.encode()`. That saves memory, since the serializable version of each object only needs to be alive as long as we're still traversing the object.

Additionally, an empty table is now encoded as `[]` by default. This is probably the most common meaning for an empty table, but you can also apply `object_mt` as a metatable, or add `__jsontype="object"` to your own metatable, to force serialization as an object. Similarly, `array_mt` or `__jsontype="array"` will act as a hint to treat your table as an array. `__len` should no longer be used as a hint that the table should be treated as an array; that's what `__jsontype` is for.

Also added a new options table format for `lljson.encode()` and friends. The table now allows you to specify that `__tojson` hooks should be skipped, so you can manually invoke them at your leisure in your replacer hooks.
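Since the callbacks are modeled on JS' `JSON` APIs, the intended semantics can be sketched with the JS originals (the `lljson` Lua equivalents are analogous; the data here is made up for illustration):

```javascript
// JSON.stringify calls the replacer on each key/value pair just before
// serializing it, so the alternate representation never has to exist as
// a whole copy; JSON.parse calls the reviver on each decoded value.
const encoded = JSON.stringify({ n: 3, secret: "hide me" }, (key, value) =>
  key === "secret" ? undefined : value // returning undefined drops the key
);
// encoded === '{"n":3}'

const decoded = JSON.parse('{"n":3}', (key, value) =>
  typeof value === "number" ? value * 2 : value
);
// decoded.n === 6
```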
```diff
-// Helper to append a tagged vector value: !v<x,y,z> or tight: !v1,2,3
+// Helper to append a tagged vector value: !v<x,y,z> or tight: !v1,,3
 // ZERO_VECTOR in tight mode -> "!v"
 static void json_append_tagged_vector(lua_State *l, strbuf_t *json, const float *a, bool tight = false) {
```
I think this should result in an error, but it doesn't:
> lljson.sldecode('"!v1,,1,,"')
<1, 0, 1>
Maybe this too. Less sure:
> lljson.sldecode('"!v 000000000000000003.14,,"')
<3.14, 0, 0>
Probably not:
> tonumber(" 000000000000000003.14")
3.14
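A strict parser for the tight form would consume exactly three comma-separated fields (empty meaning 0) and reject any trailing characters. A sketch of that rule, with an illustrative name and a simplified number pattern (no exponents), not lljson's actual code:

```javascript
// Accept "!v", "!v1,2,3", "!v1,,3"; reject trailing garbage like "!v1,,1,,".
function parseTightVector(s) {
  const m = /^!v(?:(-?\d*\.?\d+)?,(-?\d*\.?\d+)?,(-?\d*\.?\d+)?)?$/.exec(s);
  if (m === null) throw new Error("malformed tagged vector: " + s);
  return [m[1], m[2], m[3]].map((f) => (f === undefined ? 0 : parseFloat(f)));
}
```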
```cpp
// ZERO_ROTATION (0,0,0,1) in tight mode -> "!q"
static void json_append_tagged_quaternion(lua_State *l, strbuf_t *json, const float *a, bool tight = false) {
    strbuf_append_string(json, tight ? "\"!q" : "\"!q<");
    if (tight && memcmp(a, DEFAULT_QUATERNION, sizeof(DEFAULT_QUATERNION)) == 0) {
```
I think this should generate an error, but doesn't:
> lljson.sldecode('"!q2e3,,0x16,,xyzzz"')
<2000, 0, 22, 0>

```cpp
if (max > items * JSON_SPARSE_RATIO &&
    max > JSON_SPARSE_SAFE) {
    if (!cfg->encode_sparse_convert)
        json_encode_exception(l, cfg, json, -1, "excessively sparse array");
```
dunno if this is meant to be fixed:
> lljson.encode({[10]=4})
[null,null,null,null,null,null,null,null,null,4]
> lljson.encode({[11]=4})
stdin:1: Cannot serialise table: excessively sparse array
stack backtrace:
[C] function encode
stdin:1
> lljson.encode({[11.1]=4})
{"11.1":4}There was a problem hiding this comment.
Should be better with the latest commit: a `__jsontype="array"` metatable forces array encoding even if the table is technically sparse, and you have the option to allow sparse arrays wholesale via an option, a52935a .
I thought about adding an option to silently convert them to objects as well, but that seems like something you'd rarely want. If you don't care about the object / array distinction, then you're probably best off using slencode(), which doesn't try to maintain that distinction at the serialization layer.
yeah, I find either of these acceptable:
> lljson.encode(setmetatable({[11]=4}, lljson.object_mt))
{"11":4}
> lljson.encode({["11"]=4})
{"11":4}
> lljson.encode({[11]=4}, {allow_sparse=true})
[null,null,null,null,null,null,null,null,null,null,4]
@Suzanna-Linn Any thoughts on this approach? Tapple's comment shows the current behavior as of this PR, with the addition of the following:
> lljson.encode(setmetatable({[11]=4}, lljson.array_mt))
[null,null,null,null,null,null,null,null,null,null,4]
I like it. With the new allow_sparse, we have enough options to shape the table in any way.
I would rather convert to an object instead of throwing the "excessively sparse array" error, but having to explicitly say how we want the table encoded is good too: it avoids a table being encoded sometimes as an array and sometimes as an object depending on its contents.
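For reference, the sparseness test quoted above matches the classic Lua CJSON heuristic. Assuming the default constants (ratio 2, safe threshold 10 — values assumed here, not confirmed by this diff), the observed behavior falls out:

```javascript
// maxIndex: highest integer key; itemCount: number of entries.
// A table is "excessively sparse" only if BOTH thresholds are exceeded.
function isExcessivelySparse(maxIndex, itemCount, ratio = 2, safe = 10) {
  return maxIndex > itemCount * ratio && maxIndex > safe;
}
// {[10]=4}: 10 > 2 but not > 10 -> encoded with padding nulls
// {[11]=4}: 11 > 2 and 11 > 10 -> error unless allow_sparse is set
```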
```cpp
lua_pushvalue(l, (int)DecodeStack::REVIVER);
lua_pushinteger(l, i);   // key (1-based source index)
lua_pushvalue(l, -3);    // value
lua_pushvalue(l, -5);    // parent (the array table)
```
it might be handy if the reviver received more context about what it's parsing, like, full path to the current value (in the style of llJsonGetValue)
something like this:

```lua
lljson.decode('{"apples": [{}, {}, {}], "oranges": [{}, {}]}', function(path, value)
    if path[#path-1] == "apples" then return setmetatable(value, Apple)
    elseif path[#path-1] == "oranges" then return setmetatable(value, Orange)
    else return value
    end
end)
```

resulting in:
```lua
{
    apples = {
        setmetatable({}, Apple),
        setmetatable({}, Apple),
        setmetatable({}, Apple),
    },
    oranges = {
        setmetatable({}, Orange),
        setmetatable({}, Orange),
    },
}
```

I dunno if that's too much work to put into this. It can be done already in 2 ways:

- post-process after `decode` runs
- post-process during revival of the element whose keys are `"apples"` and `"oranges"`, by looping over its children
Also, my example is not a great data structure; a tagged one like TypedJSON generates is more "proper" JSON.
I didn't add a path tracking system because it can be done pretty trivially by having the replacer / reviver close over a tracking table that the replacers and revivers themselves update as they visit things, and using parent to figure out what path key and value are relative to. That's a pretty reasonable use-case though.
I'm failing to think of a way to accomplish the above example without a for-loop, especially for revival, where the leaves are visited before the parents, so you can't really know the leaf address while it's being visited
Hmmm, yeah I forgot about visitation order being depth-first, leaf-first for the reviver....
I'm kind of torn here. Most (de)serializers don't provide a path and need a post-processing step for cases where position within the JSON document determines how something should be treated, rather than some property of the data itself. It's not an unreasonable use-case though, so I'll see how complicated it would be to add optional path tracking without impacting the common case, where the bookkeeping overhead would be unnecessary.
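The post-processing step mentioned above can be a small top-down walk over the decoded tree: the path is accumulated on the way down, so it is complete at every node even though values are rewritten bottom-up. A sketch; the `walk` helper and the tagging rule are illustrative, not part of lljson:

```javascript
// Recursively visit every node, building the path on the way down so the
// full address is known at each value, unlike a leaf-first reviver.
function walk(value, visit, path = []) {
  if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      value[key] = walk(value[key], visit, path.concat(key));
    }
  }
  return visit(path, value);
}

const doc = JSON.parse('{"apples":[{},{}],"oranges":[{}]}');
const tagged = walk(doc, (path, value) =>
  path[path.length - 2] === "apples" ? { kind: "Apple", ...value } : value
);
// tagged.apples[*].kind === "Apple"; oranges entries are untouched
```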
```cpp
} else if (keytype == LUA_TSTRING) {
    json_append_string(l, json, -2);
    strbuf_append_char(json, ':');
} else {
```
Could it convert uuids to strings?
Uuids are often used as keys in tables of visitors, subscribers, tippers, buyers..., which can be exported to external resources.
It could also convert anything to a string, as slencode() does, but that would not be as useful.
a replacer/reviver could convert uuids to something else. I'm quite happy with the default behavior of lljson with uuids now, and replacers/revivers allow further easy customization:
string to uuid:
> us = "12345678-9012-3456-7890-123456789012"
> lljson.slencode({1, us, 2}, {tight=true})
[1,"12345678-9012-3456-7890-123456789012",2]
> lljson.slencode({1, us, 2}, {replacer=function(k, v) return touuid(v) or v end, tight=true})
[1,"!uEjRWeJASNFZ4kBI0VniQEg",2]uuid to string:
> u = uuid(us)
> lljson.slencode({1, u, 2}, {tight=true})
[1,"!uEjRWeJASNFZ4kBI0VniQEg",2]
> lljson.slencode({1, u, 2}, {replacer=function(k, v) return if typeof(v) == "uuid" then tostring(v) else v end, tight=true})
[1,"12345678-9012-3456-7890-123456789012",2]There was a problem hiding this comment.
Converting UUIDs seems reasonable to me!
what does "converting uuid's" mean? make one or both of these return "uuid" rather than "string"?
> print(typeof(lljson.decode('"12345678-9012-3456-7890-123456789012"')))
string
> print(typeof(lljson.sldecode('"12345678-9012-3456-7890-123456789012"')))
string

without a reviver? I don't have a strong objection to it for lljson.decode, but I'd rather lljson.sldecode did not do that. If I put a uuid as a string into lljson.slencode, I had to do that intentionally and probably had a good reason to. Not that I can think of why I'd do that.
Still, it's a really easy reviver to write. I don't feel like much is gained nor lost either way:
> j = '[4, "12345678-9012-3456-7890-123456789012", "not uuid"]'
> function reviver(k, v) return touuid(v) or v end
> for k, v in lljson.decode(j, reviver) do print(k, typeof(v), v) end
1 number 4
2 uuid 12345678-9012-3456-7890-123456789012
3 string not uuid
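The reviver above translates directly to the JS `JSON.parse` reviver this design mirrors; here a hypothetical `Uuid` class and a regex stand in for SLua's `touuid`:

```javascript
// Wrap any canonical-form UUID string in Uuid, pass everything else through.
class Uuid { constructor(s) { this.value = s; } }
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const result = JSON.parse(
  '[4, "12345678-9012-3456-7890-123456789012", "not uuid"]',
  (key, value) =>
    typeof value === "string" && UUID_RE.test(value) ? new Uuid(value) : value
);
// result: [4, Uuid, "not uuid"]
```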
My request is only for encode() to convert uuid keys to string, instead of throwing a "table key must be a number or string" error.
It would be useful to export, for external storage, tables of visitors, subscribers, tippers, buyers, etc., that have the userId as the key.
replacer doesn't change individual keys, only their values. Currently, we need to provide a new table with __tojson or replacer, which is a memory problem since these tables can be big.
I'm happy with how slencode() / sldecode() deal with uuid, and with decode() decoding the JSON types as they are without trying to "interpret" them.
oh. keys:
> lljson.encode({[uuid("12345678-9012-3456-7890-123456789012")]= "yo"})
stdin:1: Cannot serialise userdata: table key must be a number or string
stack backtrace:
[C] function encode
stdin:1
Yeah, that would be great. No objection.
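In JS terms, the request amounts to stringifying keys before encoding instead of erroring; here a Map keyed by an object with a `toString()` stands in for a Lua table keyed by uuid userdata (all names are illustrative, not lljson API):

```javascript
// Convert non-string keys to strings instead of failing to serialize.
const userId = { toString: () => "12345678-9012-3456-7890-123456789012" };
const visitors = new Map([[userId, "yo"]]);
const json = JSON.stringify(
  Object.fromEntries([...visitors].map(([k, v]) => [String(k), v]))
);
// json === '{"12345678-9012-3456-7890-123456789012":"yo"}'
```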
```lua
-- by setting the array_mt metatable
assert(lljson.encode(setmetatable({}, lljson.array_mt)) == "[]")
-- `lljson.empty_array` is the same.
-- Sentinel frozen tables for empty array/object
```
At l8, now you need the opposite test - you can make it an object.
```lua
if type(value) == "table" then
    local mt = getmetatable(value)
    if mt then
        -- Call __tojson manually on the value, if we have one.
```
Does it call __tojson also with tj:encode(myTab, { skip_tojson = true }) ?
I don't think harold implemented encode options in this TypedJSON wrapper:
> TypedJSON = require("./TypedJSON"); Vec2 = require("./Vec2"); Player = require("./Player"); Color = require("./Color")
> tj = TypedJSON.new({ Vec2 = Vec2, Player = Player, [3] = Color })
> tojson = { __tojson = function(self, ...) print("tojson", self, ...); return self end }
> lljson.encode(setmetatable({}, tojson))
tojson table: 0x0000000039798a00 table: 0x00000000397989c8
[]
> lljson.encode(setmetatable({}, tojson), {skip_tojson=true})
[]
> lljson.slencode(setmetatable({}, tojson))
tojson table: 0x0000000039854e88 table: 0x0000000039854e50
[]
> lljson.slencode(setmetatable({}, tojson), {skip_tojson=true})
[]
> tj:encode(setmetatable({}, tojson))
tojson table: 0x0000000039854cc8
[]
> tj:encode(setmetatable({}, tojson), {skip_tojson=true})
tojson table: 0x0000000039854be8
[]
> tj:slencode(setmetatable({}, tojson))
tojson table: 0x0000000039854b08
[]
> tj:slencode(setmetatable({}, tojson), {skip_tojson=true})
tojson table: 0x0000000039854a28
[]
It doesn't even take a second argument: https://github.com/secondlife/slua/pull/61/changes/BASE..222e486a4b6a53c70d164ee6ec2f6c25e701b189#diff-63bfb54906b6f05adc459307c72ee6eef607d41c6674ccd28007eb75a1769e8dR137 . I guess it could have taken a second argument
Indeed, skip_tojson is (ab)used by TypedJSON strictly to allow it to call __tojson itself in the replacer, not to actually skip calling __tojson. TypedJSON really is just a trivial example of how you could write such a thing, to prove that the API is workable for common serialization situations; it's not necessarily intended to be used by anything.
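If the wrapper were to take a second argument, one sketch is to merge caller options under the flags the wrapper reserves for itself. Every name below is an assumption, with a stub standing in for lljson.encode:

```javascript
// Build an encode() that forwards caller options but always wins on the
// flags the wrapper depends on (skip_tojson and its own replacer).
function makeWrapper(encodeImpl, replacer) {
  return (value, opts = {}) =>
    encodeImpl(value, { ...opts, skip_tojson: true, replacer });
}

// Stub encoder that records the options it receives.
let seen = null;
const encode = makeWrapper((value, opts) => { seen = opts; return "[]"; },
                           (k, v) => v);
encode({}, { tight: true });
// seen.tight === true and seen.skip_tojson === true
```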
```cpp
static void json_next_number_token(json_parse_t *json, json_token_t *token)
{
    char *endptr;
    long long tmpval = strtoll(json->ptr, &endptr, 10);
```
This is not related to the current improvement, it happens in the version in-world:
print(lljson.decode("100000000000000000000")) -- 1e20
-- > 9223372036854776000
still happens:
> lljson.decode("100000000000000000000")
9223372036854776000
and it doesn't seem the native integer cutoff is quite that low:
> 100000000000000000000
100000000000000000000
> lljson.decode("100000000000000000000") == 100000000000000000000
false
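A likely mechanism, given the quoted `strtoll` call: on overflow `strtoll` saturates at `LLONG_MAX`, and that saturated value, not the original digits, becomes the Lua number. Rounded to a double, `LLONG_MAX` is exactly 2^63, whose shortest round-trip decimal form is the observed output:

```javascript
// Reproduce the observed constant from strtoll's saturation value.
const llongMax = 2n ** 63n - 1n;   // strtoll returns this on overflow
const asDouble = Number(llongMax); // nearest double is exactly 2^63
// String(asDouble) === "9223372036854776000", matching the REPL output
```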
```diff
@@ -1 +1 @@
 /* Lua CJSON - JSON support for Lua
```
I'm learning how it works, is it like this? In the most metatable-heavy case:

```lua
-- table1
if replacer then
    table2 = table1.__tojson
    table3 = replacer
end
shape = table3.__jsontype
table4 = table3.__tojson
if array then
    len = table4.__len
end
```
- A table can be replaced 3 times (`__tojson` called in replacer, the replacer, and `__tojson` in serialization).
- If replacer returns the same table, its `__tojson` is called twice (in replacer and in serialization).
- In serialization, `__tojson` returning a table with `__jsontype` has no effect.
it looks like you are correct. __tojson can be called both before and after the replacer:
> tojson = { __tojson = function(self, ...) print("tojson", self, ...); return self end }
> function replacer(...) print("replacer", ...); return setmetatable({}, tojson) end
> lljson.encode(setmetatable({}, tojson), {replacer=replacer})
tojson table: 0x00000000398548a0 table: 0x0000000039854830
replacer nil table: 0x00000000398548a0 nil
tojson table: 0x0000000039854788 table: 0x0000000039854830
[]
it looks like __jsontype is indeed ignored on tables returned from __tojson:
> lljson.encode(setmetatable({}, {__jsontype="object"}))
{}
> lljson.encode(setmetatable({}, {__tojson = function() return setmetatable({}, {__jsontype="object"}) end }))
[]
Correct: __jsontype is what determines the encoding semantics for the object; __tojson() is a hook that objects can use to present their own view of the data to the serializer. Will look at that double __tojson call, though.
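For comparison, in the JS APIs this design is modeled on, the order is fixed and single-shot: `toJSON` runs first, the replacer then sees its return value, and `toJSON` is not re-invoked afterwards:

```javascript
// Count toJSON invocations under an identity replacer.
let calls = 0;
const obj = { toJSON() { calls += 1; return { fromToJSON: true }; } };
const out = JSON.stringify(obj, (key, value) => value);
// out === '{"fromToJSON":true}' with calls === 1
```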
Regarding __jsontype and __tojson, a theoretical (and probably non-existent) case would be a __tojson that returns, depending on the data, any one of:
- a JSON object
- a JSON array
- an SLua array to be encoded as a JSON object
Then, we would need to set __jsontype in __tojson and use a replacer that does nothing ( replacer = function(_, v) return v end ) to get __tojson called and __jsontype set before it is used.
```cpp
char *end;
double num = fpconv_strtod(payload, &end);
if (end == payload)
    luaL_error(l, "malformed tagged float: %s", str);
```
I feel like this should give an error, but doesn't:
> lljson.sldecode('"!f 000000000000000003.14$$$$$$$$"')
3.14
> tonumber(" 000000000000000003.14$$$$$$$$")
nil
> tonumber(" 000000000000000003.14")
3.14
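`strtod`-family functions only fail when nothing at all was consumed (`end == payload`); requiring the whole payload to be numeric, as `tonumber` does, would catch the trailing `$`s. A sketch of that stricter rule (JS' `Number()` has whole-string semantics similar to `tonumber`):

```javascript
// Reject the payload unless the entire (trimmed) string is a number.
function strictParseFloat(payload) {
  const n = Number(payload); // NaN on any trailing garbage
  if (payload.trim() === "" || Number.isNaN(n)) {
    throw new Error("malformed tagged float: " + payload);
  }
  return n;
}
```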
```cpp
lua_pushvalue(l, -2);                    // self
lua_pushvalue(l, (int)EncodeStack::CTX); // ctx
YIELD_CHECK(l, TOJSON_CHECK, LUA_INTERRUPT_LLLIB);
YIELD_CALL(l, 2, 1, TOJSON_CALL);
```
I'm wondering what happens if __tojson returns a table that also has a __tojson.
Should the new __tojson be called too, until no more __tojson are returned?
Otherwise, unless replacer replaces the table again, will the changes made by replacer be overwritten when __tojson is called later?
There was a problem hiding this comment.
My __tojson example above returns self, which has a __tojson, and it only ran __tojson once.