How can I optimize pattern matching on views?

1 Aug 2007

      Background: I participated in this year's ICFP programming
contesthttp://www.icfpcontest.organd our team did quite well, coming
in 37th.  Our simulator (in somewhat
naive C++ with a good algorithm) took about 45 seconds to run the original
problem, and afterwards one of my coworkers took the same algorithm and
optimized it to run in about 6-10 seconds.

The rest of the email will have minor spoilers, so skip it if you want to
work on the problem yourself.

I'm using that problem as a good testcase for learning to write
high-performance Haskell.  I'm using the same data structure and basic
algorithm as the C++ version, but I'm having problems getting the
performance where I think it should be.

The relevant parts of the code:
...
import qualified Data.ByteString.Base as B
import qualified Data.ByteString.Char8 as BC
type DNABlock = B.ByteString
type DNA = [DNABlock]
I represent the DNA string as a simplified
ropehttp://en.wikipedia.org/wiki/Rope_(computer_science)that
supports fast reading from the front & prepending.

To access this in a reasonable fashion, I used a view to let me treat the
data as a string.  Here's a sample bit of code using that view:
...
matchConsts :: Int -> DNA -> (Int, DNA)
matchConsts len dna | len `seq` dna `seq` False = undefined -- force
strictness
matchConsts len dna = case dnaView dna of
   ('F':_) -> matchConsts (len+1) (dropDna 1 dna)
   ('C':_) -> matchConsts (len+1) (dropDna 1 dna)
   ('P':_) -> matchConsts (len+1) (dropDna 1 dna)
   ('I':'C':_) -> matchConsts (len+2) (dropDna 2 dna)
   _ -> (len, dna)
...
dropDna :: Int -> DNA -> DNA
dropDna n dna | n `seq` dna `seq` False = undefined -- force strictness
dropDna _ []   = []
dropDna 0 dna  = d
dropDna n (d:ds)
  | n >= BC.length d = dropDna (n - BC.length d) ds
  | otherwise        = B.unsafeDrop n d : ds
{-# INLINE dropDna #-}
Profiling showed that almost all of my time & allocations were spent in my
view function:
...
dnaView :: DNA -> String
-- dnaView d = foldr (++) [] (map BC.unpack d)
-- This was ridiculously slow for some reason that I don't entirely
understand
dnaView [] = []
dnaView (d:ds)
  | BC.null d = dnaView ds
  | otherwise = (w2c $ B.unsafeHead d) : dnaView (B.unsafeTail d : ds)
{-# INLINE dnaView #-}
The question I have for you, the haskell-cafe-reading-audience, is how can I
get GHC to do smart code-gen for this?  I want "case dnaView dna of ..." to
not allocate and instead fuse the list generation with the pattern-match.

  -- ryan

Ryan Ingram

tags

participants (1)