| Summary: | REGRESSION (iOS 15.4): Poor game render performance with "WebGL via Metal" | | |
|---|---|---|---|
| Product: | WebKit | Reporter: | Ocean <41802399> |
| Component: | WebGL | Assignee: | Nobody <webkit-unassigned> |
| Status: | NEW --- | | |
| Severity: | Major | CC: | dino, jonahr, karlcow, kbr, kkinnunen, kpiddington, paulrhomberg01, sihui_liu, webkit-bug-importer |
| Priority: | P2 | Keywords: | InRadar |
| Version: | Safari 16 | | |
| Hardware: | iPhone / iPad | | |
| OS: | iOS 16 | | |
| Bug Depends on: | 256087, 256088, 254912, 255860 | | |
| Bug Blocks: | | | |
Description
Ocean
2023-04-26 08:04:35 PDT
Created attachment 466097 [details]
macOS comparison of webgl via metal
Created attachment 466099 [details]
iOS_comparison_webgl_via_metal
iOS comparison of webgl_via_metal
I can see a performance difference when running in Chromium as well: ~105 FPS on ANGLE/Metal and ~120 FPS on ANGLE/OpenGL. A profile of Chromium's GPU process shows a lot of time spent waiting for the backpressure fence. If similar behavior is seen in WebKit, then this might be a duplicate of Bug 254912.
Thank you for the report. Bug 254912 and bug 255860 have improved Safari's behavior with this content. However, the OpenGL backend still seems to perform considerably better. Added some blocking bugs to be solved for this.
The content is essentially GPU bound on Metal when run on big screens, but is not GPU bound when run on OpenGL. With the above fixes, an MBP M1 Max with Metal runs at 60 fps at some window sizes. However, at window sizes such as 5120x2880, the Metal rate is 30 fps. OpenGL is able to run at full rate. These numbers are also consistent with current Chrome Canary: at 5120x2880, Metal runs at ~30 fps; with partial-screen windows, Metal runs at the full rate of 120 fps; and OpenGL runs consistently at 120 fps.
Thanks for the quick reply. Since we have a large number of games that will run in WKWebView on iOS, we are very concerned about the situation on mobile. Is the cause of the performance difference when Metal is turned on in iOS Safari (32 fps vs. 60 fps on an iPhone 13) the same as on desktop? Do these optimizations also apply to iOS Safari, and how long will it take for the fixes to ship on iOS? Thanks!
Hello, thanks for the detailed report! I've begun taking a look at your shaders. One thing that jumps out to me from the start is that the shaders in this sample declare a lot of global variables. Here's a snippet from the declarations of one of the fragment shaders.
#version 100
precision highp float;
precision highp int;
uniform vec4 _MainLightPosition;
... /* rest of uniforms are declared*/
/* Global variables */
vec4 u_xlat0;
bool u_xlatb0;
vec4 u_xlat1;
vec3 u_xlat2;
vec3 u_xlat3;
vec3 u_xlat4;
float u_xlat5;
vec4 u_xlat6;
bool u_xlatb6;
vec2 u_xlat7;
vec3 u_xlat8;
vec3 u_xlat9;
vec3 u_xlat10;
vec3 u_xlat11;
vec3 u_xlat12;
vec3 u_xlat14;
float u_xlat18;
vec3 u_xlat20;
float u_xlat31;
bool u_xlatb31;
float u_xlat39;
bool u_xlatb39;
float u_xlat41;
int u_xlati41;
float u_xlat42;
int u_xlati42;
float u_xlat43;
int u_xlati43;
bool u_xlatb43;
float u_xlat44;
bool u_xlatb44;
float u_xlat45;
...
/* Other helper functions*/
void main()
{
...
}
In the shaders this demo runs, the global variables are only used in 'main', which makes me wonder if we can optimize this case a bit. Non-constant global variables are expensive in the Metal backend and tend to add a lot of memory pressure to a shader. We assume, perhaps naively, that they could be modified in any function at any time.
As a potential solution, we could write a translator pass for ANGLE that could lower a global variable used in one function to a local variable. While we continue to investigate this performance hit, can you tell us more about these global variables? Are these generated by Unity, or part of your own custom shader code?
Thanks again for the reproduction and your quick responses!
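To illustrate the analysis such a pass would need, here is a minimal sketch in Python. It is not ANGLE's implementation (ANGLE's translator works on a shader AST in C++); it assumes a hypothetical, precomputed table of which functions reference which names. The function `find_lowerable_globals`, the `helper` function, and the `shared_tmp` variable are invented for illustration; the `u_xlat*` names come from the snippet above.

```python
from collections import defaultdict


def find_lowerable_globals(global_names, function_refs):
    """Map each global referenced by exactly one function to that function.

    global_names:  set of non-constant global variable names (assumed input).
    function_refs: dict of function name -> set of names it references
                   (assumed to come from an earlier traversal of the shader).
    """
    users = defaultdict(set)
    for func, refs in function_refs.items():
        for name in refs:
            if name in global_names:
                users[name].add(func)
    # A global touched by a single function is a candidate for becoming a
    # local of that function. A real pass would also have to check that the
    # value never needs to persist across repeated calls of that function.
    return {
        name: next(iter(funcs))
        for name, funcs in users.items()
        if len(funcs) == 1
    }


# Hypothetical reference table for a shader shaped like the one above.
globals_declared = {"u_xlat0", "u_xlatb0", "u_xlat1", "shared_tmp"}
refs = {
    "main": {"u_xlat0", "u_xlatb0", "u_xlat1", "shared_tmp", "_MainLightPosition"},
    "helper": {"shared_tmp"},
}
lowerable = find_lowerable_globals(globals_declared, refs)
```

In this toy input, `u_xlat0`, `u_xlatb0`, and `u_xlat1` map to `main` and could be declared inside it, while `shared_tmp` stays global because two functions use it. The uniform `_MainLightPosition` is ignored because it is not in the set of plain globals.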
This is a demo of Unity's official render pipeline (URP). All shaders are built in and generated by Unity. I will try to consult the engine side about the role of these global variables.
Created attachment 466513 [details]
GPU fragment time
Why is the GPU fragment time of the WebKit.GPU process so long? Is it related to shader complexity or read/write bandwidth? We found that these two performance metrics are much higher for the WebGL application in Safari than for the app.
Phones that render web pages with this kind of WebGL content get hot after a few minutes.
Submitted https://chromium-review.googlesource.com/c/angle/angle/+/4771215 to ANGLE, which should hopefully fix this issue once it lands downstream.
(In reply to Ocean from comment #9)
> Created attachment 466513 [details]
> GPU fragment time
>
> Why is the GPU fragment time of the WebKit.GPU process so long? Is it related to shader complexity or read/write bandwidth? We found that these two performance metrics are much higher for the WebGL application in Safari than for the app.
> Phones that render web pages with this kind of WebGL content get hot after a few minutes.
I'm seeing similar behavior in my test project. Any updates to this issue?