Re: [Haskell-cafe] Performance best practices

1 Aug 2019

      In high performance Haskell, you will often find yourself using sequential
structures besides lists. For example, an unboxed vector implementation is
over 100x faster than any of the proposed list implementations.

Code: https://gist.github.com/wyager/45946f9f1531351468e4366b7ba168fa

Benchmark result:
https://gist.github.com/wyager/96e7876a4b170d83dca971dd152e475e

GHC is very powerful, and can often do a surprisingly good job of
optimizing away list allocations and such. However, in sharing-heavy
applications like this (and if random indexing is helpful), Vectors can be
much more efficient.

On Thu, Aug 1, 2019 at 5:25 PM Jaro Reinders 
wrote:
...
I just realized seqTails and seqInits are unnecessary, seqList is enough.
On 01-08-2019 11:04, Jaro Reinders wrote:
...
Replying to myself, you can actually write an evaluation function that
forces all values in the result of tails and inits in linear time:
-- force the result of tails in linear time
    seqTails (x:xs) = x `deepseq` seqList xs
    seqTails [] = ()
-- force all the values in a list to whnf in linear time
    -- https://wiki.haskell.org/Weak_head_normal_form
    seqList (x:xs) = x `seq` seqList xs
    seqList [] = ()
-- force the result of inits in linear time
    seqInits xs = last xs `deepseq` seqList xs
Try it in ghci with :sprint
(
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghci.html#gh...
):
> let x = tails [1..3::Int]
    > :sprint x
    x = _
    > seqTails x
    ()
    > :sprint x
    x = [[1,2,3],[2,3],[3],[]]
> let y = inits [1..3::Int]
    > :sprint y
    y = _
    > seqInits y
    ()
    > :sprint y
    y = [[],[1],[1,2],[1,2,3]]
Using criterion you can see that it is actually linear time:
main = defaultMain
      [ bgroup "inits"
        [ bench "1000" $ whnf (seqInits . inits) [1..1000 :: Int]
        , bench "10000" $ whnf (seqInits . inits) [1..10000 :: Int]
        , bench "100000" $ whnf (seqInits . inits) [1..100000 :: Int]
        , bench "1000000" $ whnf (seqInits . inits) [1..1000000 :: Int]
        ]
      , bgroup "tails"
        [ bench "1000" $ whnf (seqTails . tails) [1..1000 :: Int]
        , bench "10000" $ whnf (seqTails . tails) [1..10000 :: Int]
        , bench "100000" $ whnf (seqTails . tails) [1..100000 :: Int]
        , bench "1000000" $ whnf (seqTails . tails) [1..1000000 :: Int]
        ]
      ]
benchmarking inits/1000
    time                 204.2 μs   (203.2 μs .. 205.4 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 203.4 μs   (202.8 μs .. 204.1 μs)
    std dev              2.163 μs   (1.755 μs .. 2.664 μs)
benchmarking inits/10000
    time                 3.127 ms   (3.107 ms .. 3.148 ms)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 3.105 ms   (3.088 ms .. 3.118 ms)
    std dev              45.73 μs   (32.97 μs .. 69.14 μs)
benchmarking inits/100000
    time                 41.05 ms   (39.11 ms .. 42.87 ms)
                         0.993 R²   (0.988 R² .. 0.998 R²)
    mean                 41.52 ms   (40.62 ms .. 42.46 ms)
    std dev              1.912 ms   (1.330 ms .. 2.930 ms)
    variance introduced by outliers: 12% (moderately inflated)
benchmarking inits/1000000
    time                 423.0 ms   (318.2 ms .. 535.5 ms)
                         0.991 R²   (0.969 R² .. 1.000 R²)
    mean                 459.1 ms   (428.8 ms .. 505.2 ms)
    std dev              44.05 ms   (10.06 ms .. 58.49 ms)
    variance introduced by outliers: 22% (moderately inflated)
benchmarking tails/1000
    time                 8.811 μs   (8.768 μs .. 8.873 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 8.874 μs   (8.819 μs .. 8.963 μs)
    std dev              225.7 ns   (168.4 ns .. 325.2 ns)
    variance introduced by outliers: 28% (moderately inflated)
benchmarking tails/10000
    time                 87.21 μs   (86.85 μs .. 87.79 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 87.42 μs   (87.01 μs .. 87.88 μs)
    std dev              1.481 μs   (1.132 μs .. 1.953 μs)
    variance introduced by outliers: 11% (moderately inflated)
benchmarking tails/100000
    time                 886.9 μs   (882.9 μs .. 890.9 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 881.5 μs   (878.1 μs .. 885.7 μs)
    std dev              12.40 μs   (9.598 μs .. 18.97 μs)
benchmarking tails/1000000
    time                 9.796 ms   (9.757 ms .. 9.840 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 9.817 ms   (9.791 ms .. 9.845 ms)
    std dev              78.47 μs   (60.63 μs .. 99.31 μs)
On 01-08-2019 10:25, Jaro Reinders wrote:
...
If you fully evaluate the list produced by tails, then you're still
spending O(n^2) time, because that is just the size of the produced
list. But constructing the list and the memory taken by the list is
O(n), because most of the lists are shared
(https://wiki.haskell.org/Sharing).
On 01-08-2019 04:45, Todd Wilson wrote:
...
It seems that, asymptotically, tails is O(n) while inits is O(n^2) in
both time and space (when fully evaluated)

Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: [Haskell-cafe] Performance best practices

William Yager