Bug 252269 - Match upstream WPT semantics for testharness tests
Summary: Match upstream WPT semantics for testharness tests
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on: 161693 252268 161310
Blocks:
Reported: 2023-02-14 15:09 PST by Sam Sneddon [:gsnedders]
Modified: 2023-02-15 13:13 PST
CC List: 5 users

See Also:



Description Sam Sneddon [:gsnedders] 2023-02-14 15:09:54 PST
We know that a disproportionate amount of flakiness in LayoutTests comes from WPT; some of this is simply a function of the rate at which tests are added there compared with the rest of LayoutTests.

But we also break the testharness API's guarantees by expecting more from these tests than it promises by default: we expect console output to be constant (bug 161310), we expect assertion messages to be constant (bug 161693), and we expect the order of subtests to be constant (bug 252268).
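
To make the three failure modes concrete, here is a minimal sketch of a testharness test that trips each expectation. The file contents, endpoints, and values are hypothetical, not taken from any actual WPT test:

<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
// 1. Console output (bug 161310): the logged value varies per run, but the
//    text baked into -expected.txt records one specific value.
console.log("setup finished at " + performance.now());

// 2. Assertion messages (bug 161693): on failure, the message embeds a
//    run-dependent number, so the failure text differs between runs.
test(() => {
    const elapsed = performance.now();
    assert_less_than(elapsed, 10000, "took " + elapsed + "ms");
}, "timing-dependent assertion message");

// 3. Subtest order (bug 252268): subtests registered from racing async
//    callbacks land in the results in whichever order the callbacks ran,
//    while -expected.txt records one fixed ordering. (/a and /b are
//    placeholder URLs.)
promise_test(() => Promise.all([
    fetch("/a").then(() => test(() => {}, "a responded")),
    fetch("/b").then(() => test(() => {}, "b responded")),
]), "two racing fetches");
</script>

Upstream wptrunner matches subtests by name and compares only their statuses, so it tolerates all three; WebKit's full-text diff against -expected.txt tolerates none of them.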
Comment 1 Radar WebKit Bug Importer 2023-02-14 15:11:16 PST
<rdar://problem/105469744>
Comment 2 Sam Sneddon [:gsnedders] 2023-02-14 15:18:24 PST
From bug 161693, though all of these are really relevant to the wider issue:

(In reply to Darin Adler from comment #5)
> I’m not going to say review+ because I am not sure this is the best solution.
> 
> The cost of this is that when a test does fail, we won’t be able to see
> which assertion failed. And if we turn the real failures back on then we
> will get failures because of expected files that expect the failure log to
> be disabled.
> 
> So is that cost worth the benefit of less flakiness? Is there some other way
> of achieving that goal? Maybe we can adjust things so that the code that
> compares the results with expected results can understand what may vary?
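
One shape that last suggestion could take, purely as an illustrative sketch (this is not webkitpy's comparison code, and the line formats are assumed): normalize the run-dependent parts of the testharness text output on both sides before diffing.

// Hypothetical normalization pass over testharness text results; assumes
// WebKit-style lines such as "CONSOLE MESSAGE: ..." and failure lines of
// the form "FAIL <name> assert_...: ...".
function normalizeTestharnessOutput(text) {
    return text
        .split("\n")
        // Console output is not part of what testharness guarantees.
        .filter(line => !line.startsWith("CONSOLE MESSAGE:"))
        // Keep the subtest status and name; drop the free-form assertion
        // message, which may embed run-dependent values.
        .map(line => line.replace(/^(FAIL .*?) assert_.*$/, "$1"))
        // Treat subtests as an unordered set rather than a sequence.
        .sort()
        .join("\n");
}

// Two runs that differ only in console noise, message text, or subtest
// order then compare equal:
//   normalizeTestharnessOutput(actual) === normalizeTestharnessOutput(expected)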

(In reply to Chris Dumez from comment #7)
> r=me with comments. For the record, I do think the best solution would be to
> avoid such flaky asserts upstream in web-platform-tests. However, based on
> feedback, I think this is unlikely to happen. As a result, I think this
> patch is the second best option and we can do this easily and quickly. This
> is MUCH better than skipping tests.

(In reply to Alex Christensen from comment #8)
> I think ideally we wouldn't need this because all the tests would always
> have deterministic output.  Since that isn't the case, we need something
> like this.  We should bring it up when we discuss this in a few weeks.
Comment 3 Sam Sneddon [:gsnedders] 2023-02-15 13:13:34 PST
When it comes to flakiness, there's also the extreme solution that Chromium took:

If the harness status is OK, and all subtests are PASS, and there's no console output, don't bother with an expectation file whatsoever and just treat that test as passing.
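
For reference, that rule reduces to a small check. This is a hypothetical sketch with an assumed result shape, not Chromium's actual implementation:

// result is assumed to look like:
//   { harnessStatus: "OK", subtests: [{ name, status }], consoleMessages: [] }
function needsExpectationFile(result) {
    const allQuietPasses =
        result.harnessStatus === "OK" &&
        result.subtests.every(subtest => subtest.status === "PASS") &&
        result.consoleMessages.length === 0;
    // A quiet all-pass run is simply "passing"; only any other outcome
    // needs a checked-in -expected.txt to compare against.
    return !allQuietPasses;
}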