Commit 1f3fe93
authored
Add GB18030-2022 to default encoding list for zh-CN (#20604)
GB18030-2022 is the current official standard, superseding the previous 2005 and 2000 versions. It is essential for modern Chinese text processing for the following reasons:
1. Superset Relationship: GB18030 is a strict superset of CP936 (GBK) and EUC-CN (GB2312). Using GB18030 as the detection target covers all characters in these older encodings while enabling support for a much wider range of characters.
2. Extended Character Coverage: The 2022 standard includes significant updates, covering over 87,000 characters. It adds support for CJK Extensions (C, D, E, F, G) and updates mappings for rare characters that were previously mapped to the Private Use Area (PUA) in the 2005 version. This is critical for correctly handling names containing rare characters (e.g., in banking or government data).
3. Backward Compatibility: It is safe to promote GB18030-2022 as the preferred encoding. Files encoded in EUC-CN or CP936 are valid GB18030 streams.
This PR adds GB18030-2022 to the default encoding list for CN.1 parent 19deb91 commit 1f3fe93
File tree
4 files changed
+30
-1
lines changed- ext/mbstring
- tests
4 files changed
+30
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
64 | 64 | | |
65 | 65 | | |
66 | 66 | | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
67 | 70 | | |
68 | 71 | | |
69 | 72 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
116 | 116 | | |
117 | 117 | | |
118 | 118 | | |
119 | | - | |
| 119 | + | |
| 120 | + | |
120 | 121 | | |
121 | 122 | | |
122 | 123 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
0 commit comments