Bug 258251

Summary: REGRESSION(264918@main): GB18030 encoding isn't hooked up correctly
Product: WebKit
Reporter: Myles C. Maxfield <mmaxfield>
Component: Text
Assignee: Alex Christensen <achristensen>
Status: RESOLVED FIXED
Severity: Normal
CC: mmaxfield, webkit-bug-importer
Priority: P2
Keywords: InRadar
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Attachments:
Description            Flags
encoder observation    none
decoder observation    none

Description Myles C. Maxfield 2023-06-17 16:04:59 PDT
We're encoding U+E78D to 0x83 0x36 0xCB 0x32, which seems totally wrong. That is neither 0xA6 0xD9 nor 0x84 0x31 0x82 0x36 (the sequences listed in L2/23-003R[1]). If we round-trip our byte sequence back to a code point, it decodes to U+E82E, which is a completely different PUA character.

[1] https://www.unicode.org/L2/L2023/23003r-gb18030-recommendations.pdf
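
To make the mismatch easier to see, here is a minimal sketch that converts a four-byte GB18030 sequence into its linear pointer, using the arithmetic from the gb18030 decoder in the WHATWG Encoding Standard. The function name `gb18030_four_byte_pointer` is hypothetical and this is not WebKit's code. It only shows how far apart the observed and expected sequences sit in the four-byte space, and it does not attempt the pointer-to-code-point ranges lookup.

```python
def gb18030_four_byte_pointer(seq: bytes) -> int:
    """Return the linear pointer of a four-byte GB18030 sequence.

    Hypothetical helper for illustration; the formula follows the
    WHATWG Encoding Standard's gb18030 decoder.
    """
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    # Lead and third bytes are 0x81..0xFE; second and fourth are 0x30..0x39.
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte GB18030 sequence")
    return ((b1 - 0x81) * 12600      # 10 * 126 * 10 pointers per lead byte
            + (b2 - 0x30) * 1260     # 126 * 10 pointers per second byte
            + (b3 - 0x81) * 10       # 10 pointers per third byte
            + (b4 - 0x30))


observed = bytes([0x83, 0x36, 0xCB, 0x32])  # what we currently emit for U+E78D
expected = bytes([0x84, 0x31, 0x82, 0x36])  # four-byte sequence from L2/23-003R

print(gb18030_four_byte_pointer(observed))  # 33502
print(gb18030_four_byte_pointer(expected))  # 39076
```

The two pointers differ by 5574, so the observed output is not an off-by-one in a single byte.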
Comment 1 Radar WebKit Bug Importer 2023-06-17 16:05:16 PDT
<rdar://problem/110952885>
Comment 2 Myles C. Maxfield 2023-06-17 16:06:41 PDT
Created attachment 466741
encoder observation
Comment 3 Myles C. Maxfield 2023-06-17 16:06:50 PDT
Created attachment 466742
decoder observation
Comment 4 Alex Christensen 2023-06-29 10:32:28 PDT
Pull request: https://github.com/WebKit/WebKit/pull/15413
Comment 5 EWS 2023-06-29 15:38:17 PDT
Committed 265633@main (0a2e4563bda0): <https://commits.webkit.org/265633@main>

Reviewed commits have been landed. Closing PR #15413 and removing active labels.