Hello,
I am sure folks are aware of this, but since it has not been mentioned: another example of useful fusion is `to . from` from `GHC.Generics`, where we'd like the representation types to never exist at runtime. I think the problem is conceptually similar, but it is interesting in that it has a different flavor from streams, and there is also a type family involved.
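To make the point concrete, here is a minimal sketch (the `Pair` type and `roundTrip` are hypothetical names of mine): `roundTrip` is semantically the identity, but it builds and immediately consumes the generic representation (`M1`/`K1`/`:*:` values) unless the optimiser fuses `to . from` away.

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main where

import GHC.Generics

-- A hypothetical example type with a derived Generic instance.
data Pair = Pair Int Int deriving (Show, Eq, Generic)

-- 'roundTrip' is the identity, but it goes via 'Rep Pair', so the
-- representation constructors are allocated at runtime unless GHC
-- manages to fuse 'to . from'.
roundTrip :: Pair -> Pair
roundTrip p = to (from p)

main :: IO ()
main = print (roundTrip (Pair 1 2))
```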
Cheers,
Iavor
On Tue, Dec 9, 2025 at 5:27 AM Sebastian Graf wrote:
(Perhaps I should have said "returning `Step`" instead of "involving `Step`".)
On Tue, Dec 9, 2025 at 2:26 PM, Sebastian Graf <sgraf1337@gmail.com> wrote:
I agree that multi-stage programming is the principled green-field solution. It provides the right model in which to think about solutions for GHC. For GHC, that model certainly means "`Step` is just an intermediate data structure", i.e., any computation involving `Step` should live only at compile time ("template polymorphism") and should be inlined/specialised aggressively.
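For readers without the paper to hand, here is roughly what those types look like (a simplified sketch of the definitions from the stream fusion paper; names follow the paper, not any particular library). Note that `mapS` is non-recursive: once `next` is inlined, every `Step` constructor it builds is immediately scrutinised and can be eliminated by case-of-case.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The core types from the stream fusion paper, simplified.
data Step s a = Done
              | Skip s
              | Yield a s

data Stream a = forall s. Stream (s -> Step s a) s

-- Convert between lists and streams.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []     = Done
    next (x:xs) = Yield x xs

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- Non-recursive: with 'next' inlined, case-of-case removes all
-- intermediate 'Step' constructors.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'
```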
This is not too different from the situation with list fusion. Any compiled program that *runs* `map`/`filter`/`foldr`/`build` has a performance bug. You always want to end up with `foldr`'s SAT'd unfolding/explicit `(:)`/`[]` instead of a higher-order function `build` which does the same thing. That is achieved by the intricate multi-phase rewrite rule framework. If all goes well, the final program never contains any lists whatsoever; only a nesting of tight recursive loops. Note that there is no code sharing of `map`/`filter`/`foldr` here; they must all have been inlined and rewritten, and were purely a compile-time thing.
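As a standalone illustration (these are hypothetical self-contained definitions mirroring the shape of GHC.Base's, not GHC's actual ones): the foldr/build rule `foldr c n (build g) = g c n` lets `sumMap` below compile to a single loop over `xs` with no intermediate list, once `mapFB` is inlined and the rule fires.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A list producer abstracted over cons and nil.
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- 'map' written in terms of foldr and build.
mapFB :: (a -> b) -> [a] -> [b]
mapFB f xs = build (\c n -> foldr (\x ys -> c (f x) ys) n xs)

-- After the foldr/build rule fires, this is one loop over 'xs':
-- the list produced by 'mapFB' never exists.
sumMap :: (a -> Int) -> [a] -> Int
sumMap f xs = foldr (+) 0 (mapFB f xs)
```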
We want the same for stream fusion, because it has better fusion support for `zipWith`. Inlining any function involving `Step` is absolutely the right thing to do! Aggressive specialisation as well, but that is really difficult to do using SpecConstr because it blows up so easily. The stream fusion paper https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/stream-fusion.pdf instead suggests applying the static argument transformation in Section 7.2. A draft patch of mine tried to do that on the fly to compute unfoldings for recursive functions with static arguments: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4553. It lacks polishing and my time and interest, but I firmly believe this should help the stream fusion use case, because SAT+inlining is *exactly* specialisation of recursive functions for static arguments. (Side note: András pointed out to me two years ago that there were further complications which I don't recall, and which mean that he settled on the practical compromise of going for foldr/build fusion and enabling zip fusion/parallel for with stream fusion.)
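For reference, a sketch of what the static argument transformation does by hand (function names here are illustrative, not from the patch):

```haskell
-- Before SAT: 'f' is passed unchanged on every recursive call, and
-- recursive functions get no unfolding, so call sites cannot be
-- specialised by inlining.
mapRec :: (a -> b) -> [a] -> [b]
mapRec f []     = []
mapRec f (x:xs) = f x : mapRec f xs

-- After SAT: the recursion goes through a local 'go' that closes
-- over the static argument 'f'. 'mapSAT' itself is non-recursive,
-- so inlining it specialises the loop for the concrete 'f' at each
-- call site.
mapSAT :: (a -> b) -> [a] -> [b]
mapSAT f = go
  where
    go []     = []
    go (x:xs) = f x : go xs
```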
---
what happens if you have sharing: `let ys :: Stream Int = h x in f ys + g ys`
This is a flaw in the way that GHC says "lists are streams". IMO a stream (such as in streamly) should happily duplicate all the work that goes into computing the elements of `ys`, since it is just code. By contrast, data structures such as lists do not have that luxury. `let ys :: [Int] = ...; in f ys ++ g ys` should not try to fuse `ys`. Here's a repo that implements just foldr/build fusion without rewriting lists to/from it: https://github.com/sgraf812/foldr-build. It doesn't have the "fusion duplicates work" flaw because there is no data that should have shared the work in the first place.
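To illustrate the distinction with plain lists (a hypothetical example of mine, using `Debug.Trace` to make evaluation visible): with a shared list, each element of `ys` is computed once even though `ys` is consumed twice; if `ys` were instead treated as code and fused into both consumers, `expensive` would run twice per element.

```haskell
module Main where

import Debug.Trace (trace)

-- Prints a line each time an element is actually computed.
expensive :: Int -> Int
expensive n = trace ("computing " ++ show n) (n * n)

main :: IO ()
main = do
  -- 'ys' is a data structure, so its elements are shared between
  -- the two consumers: each "computing i" line appears only once.
  let ys = map expensive [1, 2, 3]
  print (sum ys + product ys)
```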
---
*TL;DR:* inlining and specialising `Step` sounds like exactly what is needed. Specialisation of `concatMap` is the hard part for stream fusion, though, and that means we'll end up with `Step` in the program nonetheless.
By the way, Lean's standard `Iterator` framework https://github.com/leanprover/lean4/blob/6e711bf067a0d9dc6623996b4bd3757ac6c... (inspired by Rust's https://doc.rust-lang.org/std/iter/trait.Iterator.html) will implement "stream fusion". (That's what external iteration is.) I'm not saying that it's simple or that the optimiser is equipped to handle it, though :)
Cheers, Sebastian
On Tue, Dec 9, 2025 at 1:45 PM, Simon Peyton Jones <simon.peytonjones@gmail.com> wrote:
I think a solution based on multi-stage programming is the most natural way to tackle the stream fusion problem in a robust manner.
How close are we to being able to do that in Haskell?
Simon
On Tue, 9 Dec 2025 at 09:56, Matthew Pickering <matthewtpickering@gmail.com> wrote:
I think a solution based on multi-stage programming is the most natural way to tackle the stream fusion problem in a robust manner.
For example: https://andraskovacs.github.io/pdfs/2ltt_icfp24.pdf
By allowing a programmer to express which parts of the program should exist only at code generation time, you guarantee that fusion occurs. This seems much more aligned with how Haskell programmers think about other things.
Cheers,
Matt
On Tue, Dec 9, 2025 at 12:03 AM Harendra Kumar <harendra.kumar@gmail.com> wrote:
On Mon, 8 Dec 2025 at 23:35, Jaro Reinders wrote:
I suggested we could integrate this into GHC by giving an inlining discount to any function which contains a function that is also mentioned in a rewrite rule.
There are two distinct fusion mechanisms under consideration here: foldr/build fusion and stream fusion. Rewrite rules allow us to replace a combination of functions with an equivalent function, forming the basis of foldr/build fusion by fusing or eliminating intermediate functions. Stream fusion, on the other hand, eliminates intermediate constructors; the key optimizations for achieving this are case-of-case and worker-wrapper. Both types of fusion require inlining, but they are clearly different at the implementation level, even though the same high-level ideas may apply in both cases.
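For readers unfamiliar with case-of-case, here is a hand-worked sketch of the transformation on a toy example (hypothetical functions of mine, using `Maybe` in place of a stream's `Step`):

```haskell
-- Before: the outer case scrutinises the result of an inner case,
-- so a Just/Nothing value must be built and then taken apart.
before :: Int -> Int
before x =
  case (case x > 0 of
          True  -> Just x
          False -> Nothing) of
    Just y  -> y + 1
    Nothing -> 0

-- After case-of-case (plus case-of-known-constructor): the outer
-- alternatives are pushed into the inner branches and the
-- intermediate constructor never exists.
after :: Int -> Int
after x =
  case x > 0 of
    True  -> x + 1
    False -> 0
```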
The fusion-plugin deals with stream fusion only -- inlining and SpecConstr are used to expose constructors, which are then eliminated by the case-of-case (for linear pipelines) or worker-wrapper (for recursive loops) transformations. fusion-plugin has been used extensively in streamly, and we have been able to fuse almost all cases that we wanted to. I would recommend that you try out fusion-plugin on the vector package; it can help fuse the pesky cases, and you can also gain some insight into how much code bloat it actually causes in practice. Vector is primarily based on stream fusion, and even though it uses rewrite rules, those are not for fusion as such, I believe.
However, I also noted that I think this will have a large impact on code size and compilation time in larger code bases. Simon agreed, but noted that Harendra's plugin is much more selective than what I suggested.
Typically we have a pipeline of functions connected by fusible constructors. To fuse the entire pipeline completely, we need to inline all the functions involved in it. Thus, instead of having multiple small functions, we end up creating one large combined function. However, when we apply the case-of-case transformation to the combined function, its size often shrinks dramatically. In fact, the total size of the end product is usually smaller than the combined size of the original components in the pipeline. That said, GHC requires more memory and CPU to simplify it, especially if the loop becomes too large.
In our experience, aggressive inlining for fusing linear segments of pipelines, even pretty big ones, has almost never been a problem. If a segment becomes too large we can always break the pipeline with a NOINLINE, trading runtime performance for compile-time resources, but we have rarely needed to do that. The code bloat comes more often from the SpecConstr optimization when we are fusing recursive loops. In some cases we have used -fspec-constr-recursive=16, which I admit is a very large value; compile times can sometimes increase dramatically because of it. We can always choose a lower value, which may leave some cases unfused. There may be scope for improvement in GHC to make this optimization smarter and more selective.
We can also put hard thresholds or heuristics in place for the cases where the resulting monolithic functions become too large. One possible way to decide whether to inline could be to check whether the size of the combined/optimized function is smaller than, or roughly equivalent to, the sum of the individual parts. If inlining does not lead to the elimination of constructors in subsequent passes, then the inlining is not useful. But we have seen that it is almost always possible to eliminate all or most of the constructors.
Translating this to the foldr/build fusion system in GHC today, we would force inlining of any function that calls `foldr` or `build` (or `augment`). Essentially, we would want to inline any binding that refers to a function that partakes in the rewrite rules for fusion. I'll admit we probably don't have to generalise this to all rewrite rules, but even with this restriction it seems like a rather aggressive optimisation strategy to me.
I do not have much experience with foldr/build. But conceptually it is quite similar to what we do in stream fusion, and similar ideas should apply here as well. In this case we inline to expose the foldr/build primitives and then use rewrite rules to combine them. If the inlining does not result in further elimination or firing of rewrite rules, then it is not useful. You could try automatic inlining the same way we do in fusion-plugin and see whether code bloat or compilation times become a real issue, and in which particular cases; then add heuristics/thresholds to avoid that. Is there a driving use case for extending this to foldr/build?
-harendra

_______________________________________________
ghc-devs mailing list -- ghc-devs@haskell.org
To unsubscribe send an email to ghc-devs-leave@haskell.org