[GHC] #16190: Speed up handling of large String literals

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- The idea would to try the following optimization for literal strings by adding a pass in the simplifier that does: 1. detect large string literals (> 2KB maybe? we would need to find a threshold) 2. append large strings into a temporary binary file, remember their offset and size 3. include the binary file into the compilation chain using ".incbin" in an assembler file (see the details here: https://hsyl20.fr/home/posts/2019-01-15-fast-file-embedding-with- ghc.html); add a fresh global module-specific symbol for the file, say "largestrings" 4. replace `Lit (Literal (LitString LARGE_BYTESTRING))` expressions with `unpackNBytes# (plusAddr (CLabel "largestrings") offset) size` -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Do you really need a separate binary file? That seems a bit complicated... Can't you just have literal data in the assembly file? There are a bunch of other tickets about literal strings: * #16014 * #15210 * #15113 * #13390 Would any of these get fixed? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20):
Do you really need a separate binary file? That seems a bit complicated... Can't you just have literal data in the assembly file?
Part of the optimization is to avoid ending up with binary data in textual form in a source file (e.g. assembly file). It's not complicated, GHC already has all the machinery to do it (addTempFile, addForeignFilePath, etc.). It should be worth trying in my opinion.
Would any of these get fixed?
As far as I know, no. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): I have experimented with this in the following branch: https://gitlab.haskell.org/hsyl20/ghc/tree/hsyl20-T16190 It adds a GHC command-line flag to choose the threshold (in bytes) above which the strings are dumped in a binary file (e.g. `./inplace/bin/ghc- stage2 -fbinary-string-threshold=200`). The current implementation performs the transformation in the tidy pass. I'm not sure if it is the best place to do it. The included `bench_literal.sh` script generates an Haskell source file which prints a string literal and then benches the time required to compile it. Below are the results for different string lengths. With this test, the optimization starts to be really noticeable for string literals
500K.
|| Size || No optimization || Threshold 100 || Gain || || 128 || 0.730 || 0.770 || -5% || || 3K || 0.731 || 0.829 || -13% || || 30K || 0.736 || 0.805 || -9% || || 100K || 0.804 || 0.856 || -6% || || 500K || 1.038 || 0.912 || +12% || || 1M || 1.342 || 1.101 || +18% || || 2M || 1.926 || 1.446 || +25% || || 3M || 2.539 || 1.805 || +29% || || 30M || 20.182 || 10.244 || +49% || I couldn't test with TH generated string literals as it crashes the compiler because some names become garbage with the optimization enabled. We should probably only enable the optimization if we generate an actual linked object file. It isn't done yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): What does "no optimisation" and "threshold 100" mean? I though if "threshold 100" means that if a literal is > 100 bytes then it is put in the extra file, then every single line will do that, so the threshold is irrelevant. I wonder why it every slows down? Just the extra file handling? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

What does "no optimisation" and "threshold 100" mean? I though if "threshold 100" means that if a literal is > 100 bytes then it is put in
#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): the extra file, then every single line will do that, so the threshold is irrelevant. Indeed with threshold=100 the optimization applies in every tested case to the string literal in the file. I could have written "optimization enable" but then we would have wondered what the threshold was (in particular if the optimization was triggered for small strings).
I wonder why it every slows down? Just the extra file handling?
I guess it's extra file handling (generation and linking) + the traversal of the module bindings. With the small strings, this test seems to be consistently worse with the optimization enabled (i.e. it's not just measure noise). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): New approach: instead of messing with Core, we only add the optimization into the NCG. For large strings, instead of printing `.asciz "..."` we print `.incbin "tmpfileXXX.dat"` and we dump the contents of the string into "tmpfileXXX.dat". This relies on the patch for #16198 so that we benefit from ByteStrings for string literals down to the NCG. I have updated the branch: https://gitlab.haskell.org/hsyl20/ghc/tree/hsyl20-T16190 This new approach works very well even with TH generated string literals. The following table shows the time to compile a source that uses `file- embed` with different file sizes. I have set the threshold to 500k for the test and as the default value in the patch (hence the optimization is not triggered for the 4 first file sizes). || Size || 8.6.3 || HEAD + #16198 || HEAD + #16198 + Threshold 500K || Gain || || 128 || 2.650 || 2.331 || 2.346 || (-0.5%) || || 3K || 2.651 || 2.289 || 2.290 || (0%) || || 30K || 2.590 || 2.353 || 2.307 || (+2%) || || 100K || 2.717 || 2.379 || 2.389 || (0%) || || 500K || 3.621 || 2.814 || 2.331 || +17% || || 1M || 4.694 || 3.526 || 2.654 || +24% || || 2M || 6.784 || 4.668 || 2.650 || +21% || || 3M || 8.851 || 5.616 || 3.073 || +45% || || 30M || 63.181 || 34.318 || 8.517 || +75% || -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Sounds great! Before you are done, could you write a Note that gives a birds-eye view of all the moving parts involved in string literals, and point to it from appropriate places? Thanks for working so hard on this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hvr): Has any thought been given if and how this will interact with https://github.com/ghc-proposals/ghc-proposals/pull/135? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): @simonpj I will add a note for this one before submitting a MR. @hvr Yes. With the new approach there is no interaction at all as we don't perform any substitution in Core anymore. Whenever the NCG has to embed a big chunk of bytes, it can use the ".incbin" technique. If we add literals that use `ByteArray`, we can easily dump them into a file too, whatever their actual contents is: big `ByteArray`s are already pinned by GHC so we just have to `write` them by using their address and size. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | https://gitlab.haskell.org/ghc/ghc/merge_requests/143 -------------------------------------+------------------------------------- Changes (by hsyl20): * status: new => patch * differential: => https://gitlab.haskell.org/ghc/ghc/merge_requests/143 Comment: MR: https://gitlab.haskell.org/ghc/ghc/merge_requests/143 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals -------------------------------------+------------------------------------- Reporter: hsyl20 | Owner: (none) Type: task | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 8.6.3 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | https://gitlab.haskell.org/ghc/ghc/merge_requests/143 -------------------------------------+------------------------------------- Changes (by hsyl20): * status: patch => closed * resolution: => fixed Comment: Merged in master with 1d9a1d9fb8fe0a1fea2c44c4246f102ff3e1f3a3 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/16190#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#16190: Speed up handling of large String literals
-------------------------------------+-------------------------------------
Reporter: hsyl20 | Owner: (none)
Type: task | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 8.6.3
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: | https://gitlab.haskell.org/ghc/ghc/merge_requests/143
-------------------------------------+-------------------------------------
Comment (by Marge Bot

#16190: Speed up handling of large String literals
-------------------------------------+-------------------------------------
Reporter: hsyl20 | Owner: (none)
Type: task | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 8.6.3
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: | https://gitlab.haskell.org/ghc/ghc/merge_requests/143
-------------------------------------+-------------------------------------
Comment (by Marge Bot
participants (1)
-
GHC