I agree that multi-stage programming is the principled green-field solution. It provides the right model in which to think about solutions for GHC.
For GHC, that model certainly means "`Step` is just an intermediate data structure", i.e., any computation involving `Step` should live only at compile time ("template polymorphism") and should be inlined and specialised aggressively.
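For concreteness, here is a minimal sketch of the `Step`/`Stream` representation from the stream fusion paper, together with a producer and a consumer; the helper names `streamFromTo` and `sumS` are mine, for illustration only:

```haskell
{-# LANGUAGE ExistentialQuantification, BangPatterns #-}

-- The stream fusion representation: `Step` values are meant to exist
-- only transiently and be eliminated by inlining plus case-of-case.
data Step s a = Done | Skip s | Yield a s

data Stream a = forall s. Stream (s -> Step s a) s

-- A producer: enumerate [lo..hi] as a stream.
streamFromTo :: Int -> Int -> Stream Int
streamFromTo lo hi = Stream next lo
  where
    next i | i > hi    = Done
           | otherwise = Yield i (i + 1)

-- A consumer: a tight, tail-recursive loop over the step function.
-- Once the producer is inlined here, no `Step` constructor survives.
sumS :: Stream Int -> Int
sumS (Stream next s0) = go 0 s0
  where
    go !acc s = case next s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield a s' -> go (acc + a) s'
```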
This is not too different from the situation with list fusion. Any compiled program that *runs* `map`/`filter`/`foldr`/`build` has a performance bug. You always want to end up with `foldr`'s SAT'd unfolding with explicit `(:)`/`[]` instead of a call to the higher-order function `build` that does the same thing. That is achieved by the intricate multi-phase rewrite-rule framework.
If all goes well, the final program never contains any lists whatsoever; only a nesting of tight recursive loops. Note that there is no code sharing of `map`/`filter`/`foldr` here; they must all have been inlined and rewritten and were purely a compile-time thing.
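As a small illustration (my definitions, modelled on the ones in base), here is `map` written in `build`/`foldr` form, and the single rewrite rule that drives list fusion, traced by hand in the comments:

```haskell
import GHC.Exts (build)

-- map written to fuse: produce via build, consume via foldr.
-- (base does essentially this, via a mapFB helper and RULES.)
mapFB :: (a -> b) -> [a] -> [b]
mapFB f xs = build (\cons nil -> foldr (\x ys -> cons (f x) ys) nil xs)

sumFB :: [Int] -> Int
sumFB = foldr (+) 0

-- The foldr/build rule:   foldr k z (build g)  ==>  g k z
-- So:  sumFB (mapFB f xs)
--    = foldr (+) 0 (build (\c n -> foldr (\x ys -> c (f x) ys) n xs))
--    = foldr (\x ys -> f x + ys) 0 xs
-- i.e. a single loop over xs, with no intermediate list allocated.
```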
We want the same for stream fusion because it has better fusion support for `zipWith`. Inlining any function involving `Step` is absolutely the right thing to do! Aggressive specialisation as well, but that is really difficult to achieve with SpecConstr because it blows up so easily. The stream fusion paper instead suggests applying the static argument transformation (SAT) in section 7.2. A draft patch of mine tried to do that on the fly, computing unfoldings for recursive functions with static arguments:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4553. It lacks polish, and I lack the time and interest to finish it, but I firmly believe it would help the stream fusion use case, because SAT plus inlining is *exactly* specialisation of recursive functions for their static arguments.
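To illustrate what SAT buys (a hand-written sketch, not the patch's actual output): `f` is passed unchanged in every recursive call of `map`, so the transformation moves the recursion into a local loop that closes over `f`. The wrapper is then non-recursive, gets an unfolding, and inlining it at a call site yields a copy of the loop specialised to the concrete `f`:

```haskell
-- Before SAT: `f` is a static argument, threaded through every call.
-- GHC will not inline a recursive function, so no specialisation.
mapRec :: (a -> b) -> [a] -> [b]
mapRec _ []     = []
mapRec f (x:xs) = f x : mapRec f xs

-- After SAT: the recursion no longer mentions `f`. The non-recursive
-- wrapper can be inlined, specialising `go` at each call site.
mapSAT :: (a -> b) -> [a] -> [b]
mapSAT f = go
  where
    go []     = []
    go (x:xs) = f x : go xs
```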
(Side note: András pointed out to me two years ago that there were further complications, which I don't recall, and which meant he settled on the practical compromise of using foldr/build fusion by default and enabling `zip` fusion/parallel streams with stream fusion.)
---
> what happens if you have sharing
> let ys :: Stream Int = h x
> in f ys + g ys
This is a flaw in the way that GHC says "lists are streams". IMO a stream (such as in streamly) should happily duplicate all the work that goes into computing the elements of `ys`, since it is just code. By contrast, data structures such as lists do not have that luxury: `let ys :: [Int] = ... in f ys ++ g ys` should not try to fuse `ys`.
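A contrived sketch of the distinction (all names hypothetical): with a materialised list, both consumers share the per-element work; if `ys` were instead a stream fused into both consumers, that work would re-run once per consumer, because a stream is only a recipe for producing elements:

```haskell
-- Stand-in for the per-element work done by `h x` above.
expensive :: Int -> Int
expensive n = n * n  -- imagine this is actually costly

shared :: Int -> Int
shared x =
  let ys = map expensive [1 .. x]  -- a list: computed once, shared
  in sum ys + length ys
-- If ys were a Stream and both `sum` and `length` fused with it,
-- `expensive` would run 2*x times instead of x times.
```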
Here's a repo that implements just foldr/build fusion without rewriting lists to/from it:
https://github.com/sgraf812/foldr-build. It doesn't have the "fusion duplicates work" flaw because there is no data that should have shared the work in the first place.
---
TL;DR: inlining and specialising `Step` sounds like exactly what is needed. Specialisation of `concatMap` is the hard part for stream fusion, though, and that means we'll end up with `Step` in the program nonetheless.
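To show why `concatMap` is the hard part, here is a sketch (my own, following the paper's definitions): the inner `Stream` returned by `f` has an existentially quantified state type, so the combined step function has no choice but to carry the inner stream itself, a function, in its loop state. That higher-order state is exactly what SpecConstr and friends struggle to specialise away:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

concatMapS :: (a -> Stream b) -> Stream a -> Stream b
concatMapS f (Stream next0 s0) = Stream next (s0, Nothing)
  where
    -- The state embeds the *inner stream itself*: its state type is
    -- existential, so there is no first-order representation for it.
    next (s, Nothing) = case next0 s of
      Done        -> Done
      Skip s'     -> Skip (s', Nothing)
      Yield a s'  -> Skip (s', Just (f a))  -- inner stream enters the state
    next (s, Just (Stream nexti si)) = case nexti si of
      Done        -> Skip (s, Nothing)
      Skip si'    -> Skip (s, Just (Stream nexti si'))
      Yield b si' -> Yield b (s, Just (Stream nexti si'))

-- Small helpers to run it.
replicateS :: Int -> a -> Stream a
replicateS n a = Stream step n
  where
    step k | k <= 0    = Done
           | otherwise = Yield a (k - 1)

toListS :: Stream a -> [a]
toListS (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield a s' -> a : go s'
```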
Btw., Lean's standard `Iterator` framework (inspired by Rust's) will implement "stream fusion" (that's what external iteration is). I'm not saying that it's simple or that the optimiser is equipped to handle it, though :)
Cheers,
Sebastian