Bug 253358

Summary: Optimize HTML parser entity names table by omitting semicolons
Product: WebKit Reporter: Ahmad Saleem <ahmad.saleem792>
Component: DOM    Assignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: cdumez, darin, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 250640    
Bug Blocks:    

Description Ahmad Saleem 2023-03-03 16:42:44 PST
Hi Team,

While merging a Blink commit in bug 250640 (PR below):

https://github.com/WebKit/WebKit/pull/10888

Darin suggested a future optimization that a comment in Blink's code also mentions: storing the semicolon as a bit in each entry instead of as a character in the array:

    # Reuse substrings from earlier entries. This saves 1-2000
    # characters, but it's O(n^2) and not very smart. The optimal
    # solution has to solve the "Shortest Common Superstring" problem
    # and that is NP-Complete or worse.
    #
    # This would be even more efficient if we didn't store the
    # semi-colon in the array but as a bit in the entry.
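The idea in the quoted comment can be sketched in Python. This is a minimal illustration under stated assumptions, not Blink's or WebKit's actual generator: `Entry`, `build_table`, `pool_size_with_semicolons`, and `entity_name` are hypothetical names, the greedy reuse processes longer names first (an assumed ordering, not necessarily what the original script does), and a boolean field stands in for the bit a C++ table would pack into an existing field.

```python
from dataclasses import dataclass

# Hypothetical sketch; not the Blink or WebKit entity table generator.


@dataclass(frozen=True)
class Entry:
    offset: int          # start of the name in the shared character pool
    length: int          # name length, not counting any trailing ';'
    has_semicolon: bool  # stands in for the "semicolon bit" in the entry


def build_table(names):
    """Pack names into one pool with semicolons stripped and kept as a flag."""
    stems = {name.removesuffix(";") for name in names}
    pool = ""
    offsets = {}
    # Greedy, O(n^2) substring reuse: longer stems first, so shorter ones
    # are more likely to be found inside them. Not an optimal superstring.
    for stem in sorted(stems, key=lambda s: (-len(s), s)):
        index = pool.find(stem)
        if index == -1:
            index = len(pool)
            pool += stem
        offsets[stem] = index
    table = {}
    for name in names:
        stem = name.removesuffix(";")
        table[name] = Entry(offsets[stem], len(stem), name.endswith(";"))
    return pool, table


def pool_size_with_semicolons(names):
    """Same greedy reuse, but with each ';' stored in the pool, for comparison."""
    pool = ""
    for name in sorted(set(names), key=lambda s: (-len(s), s)):
        if name not in pool:
            pool += name
    return len(pool)


def entity_name(pool, entry):
    """Reconstruct the full entity name from the pool and the entry's flag."""
    stem = pool[entry.offset:entry.offset + entry.length]
    return stem + (";" if entry.has_semicolon else "")


names = ["amp;", "amp", "not;", "not", "notin;", "lt;", "lt"]
pool, table = build_table(names)
print(pool, len(pool), pool_size_with_semicolons(names))
```

On these seven real HTML entity names, the pool shrinks from 17 characters to 10, largely because `not` becomes a substring of `notin` once the trailing semicolons are gone.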

__________

I am creating this bug to track exploring this optimization in the future.

If someone can offer guidance, I am happy to look into it; if someone else wants to pick it up and do it quickly, that is fine too.

_________

Darin's comment, for safekeeping:

'''This is a very good point. It would be useful to continue with this optimization and remove the semicolons from the array. I am almost certain this could be done with no performance impact.'''
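One way to see why no performance impact is expected: with the flag, matching an entry replaces comparing a stored ';' with checking a bit and comparing the next input character. The Python sketch below is a minimal illustration assuming a hypothetical `Entry` layout and a hand-built `POOL` string (neither is WebKit's real data structure), and it ignores the longest-prefix search a real HTML tokenizer performs.

```python
from typing import NamedTuple

# Hypothetical sketch; not WebKit's entity search code.


class Entry(NamedTuple):
    offset: int          # start of the name in POOL
    length: int          # name length without the ';'
    has_semicolon: bool  # stands in for the "semicolon bit"


# Hypothetical semicolon-free pool containing "notin", "amp", and "lt".
POOL = "notinamplt"


def match_length(entry, text, pos):
    """Return how many input characters the entry consumes at pos, or 0."""
    end = pos + entry.length
    if text[pos:end] != POOL[entry.offset:entry.offset + entry.length]:
        return 0
    if entry.has_semicolon:
        # One extra comparison replaces the ';' that used to live in the pool.
        return entry.length + 1 if text[end:end + 1] == ";" else 0
    return entry.length


print(match_length(Entry(0, 5, True), "&notin;", 1))
```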

___________

Just wanted to raise this bug.

Thanks!
Comment 1 Radar WebKit Bug Importer 2023-03-05 15:44:19 PST
<rdar://problem/106264459>
Comment 2 Darin Adler 2023-03-05 16:02:51 PST
Pull request: https://github.com/WebKit/WebKit/pull/11089
Comment 3 EWS 2023-03-15 22:41:28 PDT
Committed 261734@main (3483dcf98d88): <https://commits.webkit.org/261734@main>

Reviewed commits have been landed. Closing PR #11089 and removing active labels.