Bug 256825

Summary: [SOUP] HTML pages with broken or missing content type not displayed as HTML
Product: WebKit Reporter: Guilaume Ayoub <guillaume.webkit>
Component: WebKitGTK Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: bugs-noreply, karlcow, mcatanzaro, piolet.y
Priority: P2    
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
browser and devtools for France inter none

Description Guilaume Ayoub 2023-05-15 20:55:25 PDT
Some websites from Radio France are not displayed as HTML, but as plain text files. They work on other browsers (tested with Chrome and Firefox).

They used to work with the same version of WebKitGTK, so something has probably changed on the server. Maybe there’s something wrong with the content-type header (text/html, text/html; charset=UTF-8)?

For example: https://www.radiofrance.fr/franceinter
Comment 1 Karl Dubost 2023-05-15 22:30:35 PDT
Guillaume, what was your User-Agent string?
Comment 2 Guilaume Ayoub 2023-05-15 22:39:19 PDT
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15
Comment 3 Karl Dubost 2023-05-15 23:01:52 PDT
Created attachment 466359 [details]
browser and devtools for France inter

No luck.

I can't reproduce with Guillaume's provided UA string on a macOS MacBook. 
No extensions. clean profile.
Comment 4 Guilaume Ayoub 2023-05-16 00:25:41 PDT
(In reply to Karl Dubost from comment #3)
> Created attachment 466359 [details]
> browser and devtools for France inter
> 
> No luck.

:)

> I can't reproduce with Guillaume's provided UA string on a macOS MacBook. 
> No extensions. clean profile.

I have the problem with Epiphany.

Note that for me, in the web inspector’s network tab, the page’s MIME type is "text/plain" (but the content is HTML).
Comment 5 Michael Catanzaro 2023-05-16 05:35:30 PDT
(In reply to Guilaume Ayoub from comment #0)
> Maybe there’s something wrong with the content-type
> header (text/html, text/html; charset=UTF-8)?

Seems like a very good guess. That looks pretty messed up. I'm not sure what we should do about it though. I guess we could fall back to text/html if the page does not have any valid content type, but I'm not sure if that's actually safe to do. Depends on how other browsers behave. We don't want to start processing text files as HTML, for example.

Another page with a similar problem is https://doctors.bjc.org/wlp2/bjc/doctors/search but this one apparently just doesn't have any content type header at all.
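
To make the trade-off above concrete, here is a minimal sketch of a conservative fallback: trust the header when it is a single well-formed media type, otherwise treat the response as HTML only if the body visibly starts like an HTML document, and as plain text in every other case. The function names, the validity regex, and the list of HTML prefixes are all illustrative assumptions; this is not what WebKitGTK or libsoup does, and other browsers may decide differently.

```python
import re

# Assumption: a "valid" header is one type/subtype pair, optionally followed
# by ;parameters. A comma-joined duplicate value does not match this.
_VALID_TYPE_RE = re.compile(
    r"^[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+\s*(;.*)?$"
)

# Assumption: a short, illustrative list of prefixes that suggest HTML.
_HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head", b"<body")


def looks_like_html(body_prefix):
    """Return True if the first bytes of the body resemble an HTML document."""
    head = body_prefix.lstrip(b" \t\r\n\x0c").lower()
    return head.startswith(_HTML_PREFIXES)


def effective_mime_type(content_type, body_prefix):
    """Pick a MIME type from the header, sniffing only when it is unusable."""
    if content_type is not None:
        value = content_type.strip()
        if _VALID_TYPE_RE.match(value):
            # Well-formed header: use its type/subtype and ignore parameters.
            return value.split(";", 1)[0].strip().lower()
    # Missing or malformed header: fall back to HTML only on clear evidence.
    if looks_like_html(body_prefix):
        return "text/html"
    return "text/plain"


print(effective_mime_type("text/html, text/html; charset=UTF-8", b"<!DOCTYPE html><html>"))
print(effective_mime_type(None, b"just some notes\n"))
```

Under these assumptions a text file is only treated as HTML if it happens to begin with one of the listed prefixes, which is the kind of edge case that would need checking against other browsers.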
Comment 6 Karl Dubost 2023-05-16 05:44:10 PDT
Content-Type: text/html, text/html; charset=UTF-8
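
To illustrate why this value can trip up a parser, here is a small sketch comparing a strict parse of the whole header with a lenient one that splits on commas and keeps the last element that parses. The lenient variant is loosely modelled on the "extract a MIME type" steps in the Fetch Standard, heavily simplified (for instance, it does not handle commas inside quoted parameter values). The function names are hypothetical and this is not WebKit or libsoup code.

```python
import re

# Characters allowed in a media type token (RFC 7230 "tchar").
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_ESSENCE_RE = re.compile(rf"^({_TOKEN})/({_TOKEN})$")


def parse_mime_type(value):
    """Strict: return (essence, params) for one media type, or None if invalid."""
    essence, _, rest = value.strip().partition(";")
    if _ESSENCE_RE.match(essence.strip()) is None:
        return None
    params = {}
    for part in rest.split(";"):
        name, sep, val = part.partition("=")
        if sep and name.strip():
            params[name.strip().lower()] = val.strip().strip('"')
    return essence.strip().lower(), params


def extract_mime_type(header_value):
    """Lenient: split on commas and keep the last element that parses."""
    result = None
    for element in header_value.split(","):
        parsed = parse_mime_type(element)
        if parsed is not None and parsed[0] != "*/*":
            result = parsed
    return result


header = "text/html, text/html; charset=UTF-8"
print(parse_mime_type(header))    # strict parse of the whole value fails
print(extract_mime_type(header))  # lenient parse recovers text/html
```

With the header above, the strict parse rejects the value because "text/html, text/html" is not a single type/subtype pair, while the lenient parse yields text/html with charset UTF-8.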

First of all, they should be contacted. 
Instead of a fallback, maybe it's possible to do a Quirk for this specific case. 
I just realized that Safari is receiving the same thing. 

This is a typical bug for https://webcompat.com/

Firefox is also receiving the same bogus Content-Type. So it's at least not based on User-Agent sniffing.

If the bug is opened on webcompat.com (aka https://github.com/webcompat/web-bugs/),
it will be easier to contact people at Radio France:
https://github.com/orgs/radiofrance/people

Let's try first on Mastodon. I will send a message.
Comment 7 Guilaume Ayoub 2023-05-16 05:53:18 PDT
(In reply to Karl Dubost from comment #6)
> Let's try first on Mastodon. I will send a message.

OK. Thanks!

> If the bug is opened on webcompat.com aka
> (https://github.com/webcompat/web-bugs/)
> it will be easier to contact people on radiofrance
> https://github.com/orgs/radiofrance/people

Can I do this now, or should we wait for an answer on Mastodon?
Comment 8 Karl Dubost 2023-05-16 05:56:09 PDT
First attempt
https://mastodon.cloud/@Karlcow/110378453711258132
Comment 9 Karl Dubost 2023-05-16 05:56:49 PDT
It can still be opened on webcompat.com, and I can continue there to contact Radio France.
Comment 10 Guilaume Ayoub 2023-05-16 06:03:57 PDT
(In reply to Karl Dubost from comment #9)
> It can still be opened on webcompat.com And I can continue there to contact
> radiofrance.

Reported here: https://github.com/webcompat/web-bugs/issues/122352
Comment 11 Youenn Piolet 2023-05-16 08:42:29 PDT
Hi Karl, hi Guillaume,

Thanks for the notification, the dev teams have been made aware and we can reproduce.
I'm on holiday but I think a fix is on its way :)

Cheers,
Comment 12 Youenn Piolet 2023-05-16 09:10:31 PDT
It should be fixed now :)
Thanks again for the notification
Comment 13 Michael Catanzaro 2023-05-16 10:08:09 PDT
Thanks Youenn!

I'd like to keep this bug open though, since to be web compatible we'll need to figure out what to do about other pages with this problem. I've retitled the bug to reflect that Radio France is no longer broken.