Bug 260938

Summary: Create a helper for targeting domain names in Quirks.cpp
Product: WebKit Reporter: Karl Dubost <karlcow>
Component: WebCore Misc. Assignee: Karl Dubost <karlcow>
Status: RESOLVED FIXED    
Severity: Normal CC: webkit-bug-importer, zalan
Priority: P2 Keywords: InRadar
Version: Safari 17   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=267623
https://bugs.webkit.org/show_bug.cgi?id=276709
Bug Depends on:    
Bug Blocks: 258603    

Description Karl Dubost 2023-08-30 19:11:24 PDT
Quirks.cpp currently targets domain names in many different ways: around 16 patterns that are more or less equivalent.
Probably not every quirk would benefit from a helper function, but a big chunk of them would.
The goal is to be able to pass a domain name string and get back true when the Document matches that string.
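
A minimal sketch of what such a helper could look like, written against the standard library so it compiles on its own. The names hostMatchesDomain and equalIgnoringASCIICase are hypothetical and are not the helper that eventually landed; a real version would take the Document and use the WTF string functions shown in the patterns below. The domain argument is assumed to be non-empty.

```cpp
#include <algorithm>
#include <string_view>

// Hypothetical helper (not WebKit API): lowercase a single ASCII letter.
char toASCIILowerSketch(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical helper (not WebKit API): ASCII case-insensitive equality.
bool equalIgnoringASCIICase(std::string_view a, std::string_view b)
{
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
            return toASCIILowerSketch(x) == toASCIILowerSketch(y);
        });
}

// Hypothetical helper: true when host is exactly `domain` or one of its subdomains.
bool hostMatchesDomain(std::string_view host, std::string_view domain)
{
    if (equalIgnoringASCIICase(host, domain))
        return true;
    // A subdomain must end with "." + domain, so "notexample.com"
    // does not match "example.com".
    if (host.size() <= domain.size())
        return false;
    auto suffixStart = host.size() - domain.size();
    return host[suffixStart - 1] == '.'
        && equalIgnoringASCIICase(host.substr(suffixStart), domain);
}
```

A quirk could then reduce to something like hostMatchesDomain(host, "example.com"), with the choice of which host string to pass in left to the caller.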
Comment 1 Radar WebKit Bug Importer 2023-08-30 19:12:03 PDT
<rdar://problem/114737751>
Comment 2 Karl Dubost 2023-08-30 19:12:59 PDT
Some of the current patterns:

// 1
// match example.com OR end with .example.com
auto host = m_document->url().host();
return equalLettersIgnoringASCIICase(host, "example.com"_s) || host.endsWithIgnoringASCIICase(".example.com"_s);


// 2
// match example.com only but with a topDocument()?
auto& url = m_document->topDocument().url();
auto host = url.host();
if (equalLettersIgnoringASCIICase(host, "example.com"_s))
    return true;


// 3
// same as above, just organizing differently
auto& url = m_document->topDocument().url();
return equalLettersIgnoringASCIICase(url.host(), "example.com"_s);


// 4
// same, different wrapping
return equalLettersIgnoringASCIICase(m_document->topDocument().url().host(), "example.com"_s);


// 5
// testing only for *.example.com
auto host = m_document->topDocument().url().host();
return host.endsWithIgnoringASCIICase(".example.com"_s);


// 6
// not using comparison function, but just strict equality with strings
// so I guess fails if different case, such as ExAmple.com
auto& topDocument = m_document->topDocument();
auto host = topDocument.url().host();
auto isExample = host.endsWith(".example.com"_s) || host == "example.com"_s;


// 7
// converted to lowercase, then string comparison AND a path
auto& url = m_document->topDocument().url();
auto host = url.host().convertToASCIILowercase();
if (host == "example.com"_s || host.endsWith(".example.com"_s)) {
    return startsWithLettersIgnoringASCIICase(url.path(), "/somewhere/"_s);
}

// 8
// using the quirk function name 
// and a domain function to match on something.example.*
if (!m_quirkName)
    m_quirkName = isDomain(*m_document);
return m_quirkName.value();
// with  isDomain()
static inline bool isDomain(Document& document)
{
    auto host = document.topDocument().url().host();
    return startsWithLettersIgnoringASCIICase(host, "something."_s) && topPrivatelyControlledDomain(host.toString()).startsWith("example."_s);
}

// 9 
// another case of matching a domain function
static bool isExampleDomain(const URL& url)
{
    static NeverDestroyed exampleDomain = RegistrableDomain { URL { "https://example.com"_s } };
    return exampleDomain->matches(url);
}


// 10
// RegistrableDomain and using the function name
if (m_quirkName)
    return m_quirkName.value();
auto domain = RegistrableDomain(m_document->url()).string();
m_quirkName = domain == "example.com"_s;
return m_quirkName.value();

// 11
// Same, just another variation
auto topURL = m_document->topDocument().url();
auto host = topURL.host();
RegistrableDomain registrableDomain { topURL };
if (registrableDomain == "example.com"_s) {}


// 12
// using topPrivatelyControlledDomain and then simple function
// https://searchfox.org/wubkat/source/Source/WebCore/platform/soup/PublicSuffixSoup.cpp#63-103
topPrivatelyControlledDomain(m_document->topDocument().url().host().toString()).startsWith("example."_s)


// 13
// using topPrivatelyControlledDomain AND a path
auto& url = m_document->topDocument().url();
return topPrivatelyControlledDomain(url.host().toString()).startsWith("example."_s) && startsWithLettersIgnoringASCIICase(url.path(), "/somewhere/"_s);

// 14
// another variation on 11, with parentheses around the domain matching
auto domain = RegistrableDomain { m_document->topDocument().url() };
m_quirkName = (domain == "example.com"_s);


// 15
// using securityOrigin()
auto host = m_document->securityOrigin().host();
m_quirkName = host == "www.example.com"_s;

// 16 
// variation on 15
auto domain = m_document->securityOrigin().domain().convertToASCIILowercase();
m_quirkName = domain == "example.com"_s || domain.endsWith(".example.com"_s);
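
Setting aside where the string comes from, the string tests in the patterns above seem to fall into a handful of modes. The sketch below is one possible reading of that grouping, not something taken from Quirks.cpp: the enum, the function names, and the mapping of pattern numbers to modes are all assumptions. Inputs are assumed to be lowercase already, and the registrable domain is passed in as a plain string because computing it requires the public suffix list.

```cpp
#include <string_view>

// Hypothetical grouping of the patterns above; names are illustrative only.
enum class DomainMatch {
    ExactHost,         // patterns 2, 3, 4, 15
    HostOrSubdomain,   // patterns 1, 6, 7, 16
    SubdomainOnly,     // pattern 5
    RegistrableEquals, // patterns 9, 10, 11, 14
    RegistrablePrefix, // patterns 12, 13 (and part of 8)
};

// True when host ends with "." + domain.
bool isSubdomainOf(std::string_view host, std::string_view domain)
{
    if (host.size() <= domain.size())
        return false;
    auto prefixLength = host.size() - domain.size();
    return host[prefixLength - 1] == '.' && host.substr(prefixLength) == domain;
}

// `pattern` is a full domain ("example.com") for every mode except
// RegistrablePrefix, where it is a prefix such as "example.".
bool matchesDomain(DomainMatch mode, std::string_view host,
    std::string_view registrableDomain, std::string_view pattern)
{
    switch (mode) {
    case DomainMatch::ExactHost:
        return host == pattern;
    case DomainMatch::HostOrSubdomain:
        return host == pattern || isSubdomainOf(host, pattern);
    case DomainMatch::SubdomainOnly:
        return isSubdomainOf(host, pattern);
    case DomainMatch::RegistrableEquals:
        return registrableDomain == pattern;
    case DomainMatch::RegistrablePrefix:
        // Matches example.com, example.co.uk, ... regardless of public suffix.
        return registrableDomain.substr(0, pattern.size()) == pattern;
    }
    return false;
}
```

If this reading holds, a helper would need at most a few entry points rather than 16 hand-written variants.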
Comment 3 Karl Dubost 2023-08-30 19:18:44 PDT
We probably need to be extra careful about why some quirks chose to use each of these:

* m_document->topDocument().url().host()
* m_document->url().host()
* m_document->securityOrigin().host()
* m_document->securityOrigin().domain()
* RegistrableDomain(m_document->url()).string()
* topPrivatelyControlledDomain(url.host().toString())
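
One difference that matters here is the top document versus the current document: inside an iframe, the two can have different hosts, so a quirk keyed on one will not necessarily fire for the other. The toy model below only illustrates that distinction. ToyDocument is a stand-in invented for this sketch, not WebCore::Document, and it says nothing about the securityOrigin() or RegistrableDomain variants.

```cpp
#include <string>

// Toy stand-in for a frame tree; not WebCore::Document.
struct ToyDocument {
    std::string host;
    const ToyDocument* parent { nullptr };

    // Walk up the parent chain until reaching the top-level document.
    const ToyDocument& topDocument() const
    {
        const ToyDocument* current = this;
        while (current->parent)
            current = current->parent;
        return *current;
    }
};
```

With a top-level document on example.com embedding a frame from ads.example.net, frame.host is "ads.example.net" while frame.topDocument().host is "example.com", so the two lookups would select different quirks.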
Comment 5 Karl Dubost 2023-08-31 23:05:36 PDT
Pull request: https://github.com/WebKit/WebKit/pull/17329
Comment 6 EWS 2023-09-12 10:25:30 PDT
Committed 267907@main (2d5d6f169a10): <https://commits.webkit.org/267907@main>

Reviewed commits have been landed. Closing PR #17329 and removing active labels.
Comment 7 Karl Dubost 2023-11-23 14:39:59 PST
*** Bug 211607 has been marked as a duplicate of this bug. ***