Personally, I disagree with the whole premise. Lists are simple and elegant; you *should* use them most of the time. I'm not saying you should use lists as the data structure for an in-memory database or whatever, but that's the point—most applications only have a handful of data structures that "matter", and lists are great everywhere else.

I actually went through and replaced [] with Vector in all of the types we parse from JSON at work, some of which get relatively large. It made the code uglier and didn't meaningfully affect the performance. I undid that change, even though it's exactly the sort of thing this article recommends. In this day and age, simple things *scale*. That's enough most of the time; if you can get away with it you *should*.

The real advantage of lists comes from the intersection of two points: lists work well as iterators in Haskell, and even when misused as data structures they're *not that bad* most of the time. This means that a good 80% of the time, the advantage of using a type that's compatible with the rest of my code, and with APIs that use lists "correctly" as iterators, easily outweighs any small performance penalty. A list has to get pretty large (or my usage pattern pretty convoluted) before another type is worth the complexity.
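To make the "lists as iterators" point concrete, here is a minimal sketch (the names `squares`, `sumSquaresBelow` and `firstEvenSquares` are made up for illustration). The list is never a container here: it is an infinite, lazily produced stream, and each consumer pulls only as many elements as it needs, using ordinary list functions.

```haskell
import Data.List (foldl')

-- An infinite stream of squares; nothing is computed until a consumer demands it.
squares :: [Integer]
squares = map (^ 2) [1 ..]

-- takeWhile stops pulling elements once the bound is reached,
-- so only a finite prefix of the infinite list is ever evaluated.
sumSquaresBelow :: Integer -> Integer
sumSquaresBelow bound = foldl' (+) 0 (takeWhile (< bound) squares)

-- The same stream composes with other standard list functions.
firstEvenSquares :: Int -> [Integer]
firstEvenSquares n = take n (filter even squares)

main :: IO ()
main = do
  print (sumSquaresBelow 100)  -- 1+4+9+16+25+36+49+64+81 = 285
  print (firstEvenSquares 3)   -- [4,16,36]
```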

On Mon, Mar 13, 2017 at 3:43 AM, Johannes Waldmann <johannes.waldmann@htwk-leipzig.de> wrote:

Hi Olaf -

> I'd be interested whether there is a way to check
> which of my lists in the source code the compiler managed to "deforest" away.
> Which intermediate files should I look at? What are the tools to inspect?

Good question, and I don't know an easy answer.

The general advice is to run ghc with "-ddump-simpl",
but I find the output quite challenging to scan.

Here is a simple case where it works:

$ cat Fuse.hs

main = print $ sum $ map (^2) [1 .. 1000 :: Int]

$ ghc -fforce-recomp -ddump-simpl -O0 Fuse.hs

no optimisation - shows that "main" calls "map" etc.

$ ghc -fforce-recomp -ddump-simpl -O2 Fuse.hs

fusion works - nice non-allocating inner loop
(lists gone, and Int replaced by Int#)

Rec {
-- RHS size: {terms: 18, types: 3, coercions: 0}
Main.$wgo [InlPrag=[0], Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><S,U>]
Main.$wgo =
  \ (w_s5kV :: GHC.Prim.Int#) (ww_s5kZ :: GHC.Prim.Int#) ->
    case w_s5kV of wild_Xn {
      __DEFAULT ->
        Main.$wgo
          (GHC.Prim.+# wild_Xn 1#)
          (GHC.Prim.+# ww_s5kZ (GHC.Prim.*# wild_Xn wild_Xn));
      1000# -> GHC.Prim.+# ww_s5kZ 1000000#
    }
end Rec }
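For reading a dump like this, it can help to write the worker back as ordinary Haskell. The sketch below is a hand translation, not GHC output: `go` stands in for `Main.$wgo`, its first argument is the current element (`w_s5kV`) and its second the running sum (`ww_s5kZ`). The starting values 1 and 0 are an assumption, since the call from `main` is not shown above. Bang patterns keep both arguments strict, roughly as the unboxed `Int#` arguments are.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hand-written equivalent of the fused loop shown in the Core dump.
sumSquaresLoop :: Int
sumSquaresLoop = go 1 0  -- assumed initial call: element 1, sum 0
  where
    go :: Int -> Int -> Int
    go !i !acc
      | i == 1000 = acc + 1000000               -- the 1000# branch: add 1000*1000 and stop
      | otherwise = go (i + 1) (acc + i * i)    -- the __DEFAULT branch

main :: IO ()
main = print sumSquaresLoop  -- same result as print $ sum $ map (^2) [1 .. 1000 :: Int]
```

No list cell is allocated anywhere in this loop, which is what "lists gone" means in practice.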

but the challenge is to find the path from "main" to that,
wading through several other functions that may or may not be related.
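One thing that cuts down some of the noise is GHC's dump-suppression flags. A minimal sketch, setting them in the source file through an OPTIONS_GHC pragma and still compiling with "ghc -fforce-recomp -O2" as above: -dsuppress-all hides type applications, coercions and IdInfo annotations (everything suppressible except uniques), -dsuppress-uniques turns names like w_s5kV into plain w, and -ddump-to-file writes the Core to a file instead of stdout (the exact file name depends on the GHC version). It does not solve the "find the path from main" problem, but it makes each function much shorter.

```haskell
{-# OPTIONS_GHC -ddump-simpl -dsuppress-all -dsuppress-uniques -ddump-to-file #-}
-- Same program as Fuse.hs; only the dump flags have moved into the source.
-- Optimisation (-O2) is still passed on the command line.

main :: IO ()
main = print $ sum $ map (^ 2) [1 .. 1000 :: Int]
```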

I can imagine a source annotation like "in the code compiled
from this function f, that constructor C should never be called",
but this is certainly not easy. Do we really mean "never", or do we mean
"only a bounded number of times" (that is, not in the inner loop)?
Perhaps there is no code for f itself, because it gets inlined.
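For comparison, it is worth dumping a program where such an annotation should fail. In the sketch below (the name `meanOfSquares` is made up for illustration), the intermediate list xs has two consumers, sum and length. Fusing both would mean duplicating the producer, which GHC normally avoids, so one would expect xs to be materialised and the list constructor (:) to be called once per element, inside the loop. This expectation has not been checked here against a particular GHC version; compiling with -O2 -ddump-simpl and comparing with the Fuse.hs output is the way to confirm it.

```haskell
-- xs is shared by two consumers, so foldr/build fusion cannot
-- eliminate it for both; expect a real list to be built.
meanOfSquares :: Int -> Double
meanOfSquares n = fromIntegral (sum xs) / fromIntegral (length xs)
  where
    xs = map (^ 2) [1 .. n] :: [Int]

main :: IO ()
main = print (meanOfSquares 1000)  -- 333833500 / 1000
```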

But yes, *some* automated analysis (and human-readable print-out)
of the code after simplification would be nice.

This could be done as a compiler plug-in?
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/extending_ghc.html#compiler-plugins

- J.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.