Bug 252981

Summary: useConcurrentJIT=true makes JS2-wasm slower on higher core counts
Product: WebKit    Reporter: Angelos Oikonomopoulos <angelos>
Component: JavaScriptCore    Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal    CC: keith_miller, mark.lam, webkit-bug-importer, ysuzuki
Priority: P2    Keywords: InRadar
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Attachments:
Plot of richards-wasm score vs number of worklist threads, --useBBQJIT=false --useOMGJIT=false

Description Angelos Oikonomopoulos 2023-02-27 02:58:49 PST
Created attachment 465195
Plot of richards-wasm score vs number of worklist threads, --useBBQJIT=false --useOMGJIT=false

Running the wasm parts of JS2 (wasm-cli.js) with useConcurrentJIT=true on an 80-core ARM64 box results in a performance slowdown. Specifically, with --useBBQJIT=false --useOMGJIT=false, the total score for wasm-cli.js hovers around 10 with useConcurrentJIT=true and between 13-14 for useConcurrentJIT=false.

As far as I can tell, this is because of contention when claiming functions from the Wasm::LLintPlan in the Wasm::Worklist. I can easily bring it back to parity with the useConcurrentJIT=false case by changing the worklist code to spawn a single thread or by only notifying one of the threads for the compilation phase.

The test most affected seems to be richards-wasm. Experimenting just with richards-wasm, I see performance start dropping off when the number of threads goes over 12 or so.

This may not currently be a big problem on end user devices, but it does affect benchmarking and is probably going to become more of an issue in the future.

I've experimented with simply batching the workload: have each worklist thread claim consecutive functions (up to functionBytes/batchSize, where functionBytes is the sum of the function sizes in the module and batchSize is a tunable). With a batchSize of 16KB, I can finally get a modest (~10%) speedup for the useConcurrentJIT=true case (I didn't put a lot of effort into tuning).

This probably applies to the other wasm tiers too, but I haven't verified that. I can submit a PR with the batchSize changes, but the batch size would have to be tuned for each user of Wasm::Worklist (and probably for each architecture too).
Comment 1 Angelos Oikonomopoulos 2023-02-27 03:02:48 PST
Another reason I'm reluctant to submit the batchSize patch is that, fundamentally, while batching would help (lots of these wasm functions are only a few bytes long), I don't think we want to do batch scheduling of the compilation jobs. As core counts get higher, I guess some kind of work-stealing worklist would make sense?
Comment 2 Radar WebKit Bug Importer 2023-03-06 02:59:14 PST
<rdar://problem/106280756>
Comment 3 Mark Lam 2023-03-06 08:22:54 PST
> Running the wasm parts of JS2 (wasm-cli.js) with useConcurrentJIT=true on an 80-core ARM64 box results in a performance slowdown. Specifically, with --useBBQJIT=false --useOMGJIT=false, the total score for wasm-cli.js hovers around 10 with useConcurrentJIT=true and between 13-14 for useConcurrentJIT=false.

1. What does it mean to run on an 80-core ARM64 box?
   1.1 How many automatic threads are we actually spinning up in response to this?
   1.2 How many of these threads are actually active during the slowdown?
   1.3 What are these threads working on at that time?
2. What is the nature of the workload?
   2.1 Is the slowness due to more contention now?
   2.2 Is the slowness due to memory bandwidth saturation?

Instead of jumping to some solution prematurely, I think you should profile the workload to see what's changed in the high core count profile vs the low core count profile.  Since not everyone else has access to this 80-core ARM64 box, you'll have to do this profiling to determine what's changed in the performance profile.
Comment 4 Yusuke Suzuki 2023-03-06 23:52:22 PST
This is actually interesting, let me check a bit.
Comment 5 Yusuke Suzuki 2023-03-06 23:53:24 PST
Ah, so this only affects machines with large CPU core counts (more than ~15)