@bnoordhuis
Contributor

Refs: #992


I'm kind of starting to hate that whole TwoByte name scheme. @chqrlie suggested 16Bits, I'm thinking of renaming to UCS myself. Thoughts?

@Sytten

Sytten commented Jan 2, 2026

TwoByte is not ideal, agreed. I like UCS, but does that imply some guarantees for missing surrogate pairs?
That's mainly the issue we have with the current UTF-8 conversion.
The best test case would be "🌍🌎🌏".slice(1); we should have a way to retrieve the bytes 0d df 3c d8 0e df 3c d8 0f df.
If I remember correctly, you guys do "lazy" UTF-16 too, right?
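
For illustration, the byte sequence in that test case can be reproduced in plain JavaScript. This is only a sketch: `utf16leBytes` is a hypothetical helper, not part of QuickJS, and it assumes the string holds the three distinct globe emoji U+1F30D, U+1F30E, and U+1F30F (which is what the listed bytes correspond to) serialized as little-endian code units.

```javascript
// Hypothetical helper: dump each UTF-16 code unit of a JS string
// as two little-endian bytes, formatted as lowercase hex.
function utf16leBytes(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one 16-bit code unit, surrogates included
    out.push(unit & 0xff, unit >> 8); // low byte first, then high byte
  }
  return out.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

// slice(1) drops the high surrogate of the first emoji, so the result
// starts with a lone low surrogate (0xDF0D).
const sliced = "\u{1F30D}\u{1F30E}\u{1F30F}".slice(1);
console.log(utf16leBytes(sliced)); // 0d df 3c d8 0e df 3c d8 0f df
```

The first two bytes (0d df) are the lone low surrogate, which is why the result cannot be treated as well-formed UTF-16.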

@bnoordhuis
Contributor Author

The failing test was this (on meson on ubuntu-latest (default debug, mimalloc)):

 /home/runner/work/quickjs/quickjs/build-debug/../tests/test_std.js:253: Error: assertion failed: got |0|, expected |15|

That's the assert(status & 0x7f, os.SIGTERM) that's also flaky under cygwin.

@bnoordhuis
Contributor Author

> I like UCS, but does that imply some guarantees for missing surrogate pairs?

Nothing that JS strings don't already promise (i.e., not much.)

Contributor

@saghul saghul left a comment

+1 to UCS

@chqrlie
Collaborator

chqrlie commented Jan 3, 2026

Why not simply JS_ToCStringUTF16, more explicit and self-explanatory?

Or possibly JS_ToWCString with obvious and less obvious connotations...

@bnoordhuis
Contributor Author

> Why not simply JS_ToCStringUTF16, more explicit and self-explanatory?

Unpaired surrogates. JS strings that contain them are not well-formed UTF-16. That was the reason for the TwoByte moniker: to convey to users that they can't assume well-formed UTF-16.
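
To make the point concrete, here is a minimal sketch of a well-formedness check in plain JavaScript. `findLoneSurrogate` is a hypothetical name used only for illustration; it is not a QuickJS API.

```javascript
// Hypothetical helper: return the index of the first unpaired surrogate
// code unit in a JS string, or -1 if the string is well-formed UTF-16.
function findLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
        continue;
      }
      return i;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return i; // low surrogate with no preceding high surrogate
    }
  }
  return -1;
}

console.log(findLoneSurrogate("\u{1F30D}"));          // -1
console.log(findLoneSurrogate("\u{1F30D}".slice(1))); // 0
```

Any consumer of the two-byte representation would need a check along these lines before handing the buffer to code that expects strict UTF-16.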

> Or possibly JS_ToWCString with obvious and less obvious connotations...

To seasoned C programmers that name suggests it's using wchar_t when it isn't. There's probably a special circle in Hell for programmers who commit such crimes against humanity. I don't know about you but I'd like to go to Heaven.

Half tongue in cheek (half, because I do like to go to Heaven); wdyt, @saghul?

@saghul
Contributor

saghul commented Jan 4, 2026

🤣

I think either UTF16 with docs, or UCS as that part of the name, works for me.

@bnoordhuis
Contributor Author

Okay, here's the plan: I'm going to merge this PR and then open a new one that s/TwoByte/UTF16/g

@bnoordhuis bnoordhuis merged commit 1d47486 into quickjs-ng:master Jan 4, 2026
233 of 234 checks passed
@bnoordhuis bnoordhuis deleted the two-byte-strings branch January 4, 2026 09:40
bnoordhuis added a commit to bnoordhuis/quickjs that referenced this pull request Jan 4, 2026
bnoordhuis added a commit that referenced this pull request Jan 4, 2026
4 participants