Bug 256716 - DOMXPath- failing (fn-lang.html) due to U+212A handling
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: BrowserCompat, InRadar, WPTImpact
Depends on:
Blocks:
 
Reported: 2023-05-12 10:02 PDT by Ahmad Saleem
Modified: 2024-03-29 16:36 PDT

See Also:


Attachments
CSS selector test (137 bytes, text/html)
2023-05-14 12:05 PDT, Alexey Proskuryakov

Description Ahmad Saleem 2023-05-12 10:02:25 PDT
Hi Team,

While going through WPT test cases, I came across one sub-test failure:

Test Case - https://wpt.fyi/results/domxpath/fn-lang.html?label=experimental&label=master&aligned

Link - http://wpt.live/domxpath/fn-lang.html

___________

After researching a bit on when this was added and against which bugs, it seems to be a U+212A handling issue at either the CSS or the DOM level.

Just raising this so we can fix it.

Thanks!
Comment 1 Alexey Proskuryakov 2023-05-14 12:05:37 PDT
// U+212A should match to ASCII 'k'.
// XPath 1.0 says:
// ... such that the attribute value is equal to the argument ignoring that suffix
// of the attribute value and ignoring case.
// XPath 3.1 says:
// ... true if and only if, based on a caseless default match as specified in
// section 3.13 of The Unicode Standard,
testFirstChild('lang("ko")', '<root><match xml:lang="&#x212A;o"/></root>');

-------

So this test says that U+212A KELVIN SIGN should match the letter "k" in the lang attribute, which seems weird, at least initially.

The Unicode Standard reference here is obsolete, but it is almost certainly about "caseless matching" in https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf. That definition indeed has this behavior, but the rest of the Web platform does not, AFAIK. Notably, even Chrome doesn't have this behavior for CSS selector matching.

I think that this is a bug in the standard, and Chrome is internally inconsistent, so we shouldn't be following it. CC'ing some folks who've done more recent work in this area to weigh in.
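
For illustration, the difference between the two kinds of matching can be sketched in a few lines of JavaScript. This is only a sketch: the helper names asciiLower and unicodeLower are made up for this example and are not WebKit or WPT code, and toLowerCase() is used as a rough stand-in for Unicode caseless matching (it is not full case folding, but it does map U+212A to "k").

```javascript
// Hypothetical helpers for illustration; not taken from WebKit or the WPT test.

// ASCII case-insensitive comparison key: only A-Z are mapped to a-z.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

// Approximation of a Unicode caseless comparison key (assumption: toLowerCase()
// is close enough for this one character; real caseless matching uses case folding).
function unicodeLower(s) {
  return s.toLowerCase();
}

const kelvinO = "\u212Ao"; // the xml:lang value used in the test

console.log(asciiLower(kelvinO) === asciiLower("ko"));     // false
console.log(unicodeLower(kelvinO) === unicodeLower("ko")); // true
console.log(asciiLower("KO") === asciiLower("ko"));        // true
```

Under the ASCII rule the KELVIN SIGN stays distinct from "k", which is why the sub-test fails in engines that compare language values that way.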
Comment 2 Alexey Proskuryakov 2023-05-14 12:05:56 PDT
Created attachment 466349
CSS selector test
Comment 3 Anne van Kesteren 2023-05-14 23:47:22 PDT
Looking at the source of the test, it seems this was done based on an XPath 3.1 clarification, but indeed we use ASCII case-insensitive comparison for language matching across the web platform. Let's change this test and add a clarification to DOM/HTML with regard to what kind of matching the XPath lang() function uses.
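
A minimal sketch of what lang() matching could look like with ASCII case-insensitive comparison, following the XPath 1.0 wording quoted in comment 1 (equal to the argument, or equal once a suffix starting with "-" is ignored). The function names langMatches and asciiLower are hypothetical, the lookup of the nearest xml:lang value is not shown, and this is not the WebKit implementation.

```javascript
// Hypothetical sketch; names and structure are assumptions, not WebKit code.

// Map only A-Z to a-z, leaving every non-ASCII character untouched.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

// langValue: the xml:lang value in scope for the context node.
// arg: the string passed to lang().
function langMatches(langValue, arg) {
  const value = asciiLower(langValue);
  const wanted = asciiLower(arg);
  // Exact match, or a match once a "-..." suffix of the attribute value is ignored.
  return value === wanted || value.startsWith(wanted + "-");
}

console.log(langMatches("en-US", "en"));    // true
console.log(langMatches("KO", "ko"));       // true
console.log(langMatches("\u212Ao", "ko"));  // false
```

With this rule the test's expectation for xml:lang="&#x212A;o" would flip from a match to a non-match.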
Comment 4 Radar WebKit Bug Importer 2023-05-19 10:03:20 PDT
<rdar://problem/109571692>
Comment 5 Ahmad Saleem 2024-03-29 16:36:09 PDT
Modifying the test upstream in WPT - https://github.com/web-platform-tests/wpt/pull/45436