Bug 254889 - Support all of HTML's character entities in WebVTT
Summary: Support all of HTML's character entities in WebVTT
Status: RESOLVED DUPLICATE of bug 176225
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL: http://wpt.live/webvtt/parsing/cue-te...
Keywords: WPTImpact
Depends on:
Blocks:
 
Reported: 2023-04-02 08:15 PDT by Ahmad Saleem
Modified: 2023-04-02 18:32 PDT

Description Ahmad Saleem 2023-04-02 08:15:32 PDT
Hi Team,

While going through Blink's commits, I came across another one, which can be explored in WebKit.

Blink Commit - https://chromium.googlesource.com/chromium/src.git/+/80ccfaf557f5ad07e5de8bcc08e1aba84190b2a0

WPT Test Link - http://wpt.live/webvtt/parsing/cue-text-parsing/tests/entities.html

Just wanted to raise it so we can track it.

Thanks!

____

@ap - could you let me know who should be informed and CC'd on this? It would be good for me to know who looks after WebVTT in WebKit.
Comment 1 Alexey Proskuryakov 2023-04-02 18:12:53 PDT

*** This bug has been marked as a duplicate of bug 176225 ***
Comment 2 Karl Dubost 2023-04-02 18:32:36 PDT
Ahmad, 

Darin seems to have been the "recent" (2015) editor of this piece of code 
https://searchfox.org/wubkat/rev/64453e226bbd56f49b248f0f8816a72e5547e456/Source/WebCore/html/track/WebVTTTokenizer.cpp#120

The latest improvements to HTML tokenization were done in Bug 140166.

The spec is not obviously clear about it. Here's an example which shows that, yes, HTML entities are possible:
https://www.w3.org/TR/webvtt1/#example-4a66a3ef

> To change that line to left-to-right base direction, start the line with an U+200E LEFT-TO-RIGHT MARK character (it can be escaped as "&lrm;").

But it's only an example.

The test at
http://wpt.live/webvtt/parsing/cue-text-parsing/tests/entities.html
has results at
https://wpt.fyi/results/webvtt/parsing/cue-text-parsing/tests/entities.html?label=master&label=experimental&aligned

which also show Firefox failing the same test.

Let's find the commit that added the test; maybe there is more information.
https://github.com/web-platform-tests/wpt/commit/3c01711d2b0dffe60bea034340a83a40dbf17cc1

Ah yes, it's in the spec. I was searching for "HTML entities" instead of "HTML character reference".

> HTML character reference in data state
> Attempt to consume an HTML character reference, with no additional allowed character.
> 
> If nothing is returned, append a U+0026 AMPERSAND character (&) to result.
> 
> Otherwise, append the data of the character tokens that were returned to result.
> 
> Then, in any case, set tokenizer state to the WebVTT data state, and jump to the step labeled next.
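
For illustration, the quoted steps could be sketched in Python roughly as follows. This is only a sketch under assumptions, not WebKit's or Blink's code: the function names `consume_named_reference` and `decode_cue_text` are made up for this example, Python's `html.entities.html5` table stands in for HTML's list of named character references, numeric references such as `&#8206;` are not handled (they are left as literal text), and tag parsing is omitted entirely.

```python
from html.entities import html5  # maps names such as "lrm;" to their characters

# Longest name in the table, used to bound the lookahead.
_MAX_NAME_LEN = max(len(name) for name in html5)


def consume_named_reference(text, pos):
    """Try to match a named reference at text[pos], the index just after '&'.

    Returns (characters, new_pos) on a match, or (None, pos) if nothing matches.
    Hypothetical helper: named references only, no numeric references.
    """
    # Try the longest candidate first so that "notin;" wins over "not".
    for length in range(min(_MAX_NAME_LEN, len(text) - pos), 0, -1):
        candidate = text[pos:pos + length]
        if candidate in html5:
            return html5[candidate], pos + length
    return None, pos


def decode_cue_text(text):
    """Decode named character references in cue text (data state only)."""
    result = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch != "&":
            result.append(ch)
            continue
        chars, pos = consume_named_reference(text, pos)
        # If nothing is returned, append a literal U+0026 AMPERSAND;
        # otherwise append the characters that were returned.
        result.append(chars if chars is not None else "&")
    return "".join(result)
```

With this sketch, `decode_cue_text("&lrm;Hello")` yields the string starting with U+200E, while an unknown name such as `&xyzzy;` passes through unchanged because only the ampersand is appended and the rest is read as ordinary data.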