
Am Donnerstag, 28. August 2008 01:46 schrieb Rafael Gustavo da Cunha Pereira Pinto:
First of all, sorry asking too many questions!
That's what this list is for :)
I am using the ICFP 2007 (http://save-endo.cs.uu.nl/) problem as a case for learning Haskell.
I started with the prototype, listed below
=================Main.hs======================= module Main(main) where
import qualified Data.ByteString.Lazy.Char8 as L import qualified Data.ByteString.Char8 as S
import Data.ByteString.Search.BoyerMoore as BM import Debug.Trace
data DNA=DNA {prefix :: L.ByteString, suffix :: L.ByteString}
main::IO () main=do d<-L.readFile "endo.dna" let dna=DNA (L.pack "IIIIIIIIIIICCICCICCICFFIIIIIIIIIIIIIIIICFP") d let pat=S.pack "ICFP" print $ take 1 $ matchSL pat $ L.append (prefix dna) (suffix dna) =================EOF Main.hs===================
My question is:
Knowing that the searched pattern is in DNA prefix, will "suffix dna" ever be called here? I mean, will the "append" suspension ever be fully evaluated?
No. That's one nice thing about lazy ByteStrings. You only use as many chunks
as you really need.
L.append is lazy enough to survive even L.append (prefix dna) undefined.
Unfortunately our BM searching algorithm isn't, an optimisation for the case
of a lazy ByteString consisting of a single chunk requires that the second
argument of L.append is reduced to WHNF (KMP doesn't), so it needs to read
one chunk from the file. Hence in this case (suffix dna) is called, but only
one chunk of it is used.
You can verify this by running your programme with the option "+RTS -sstderr",
which will print out some time and allocation stats. I did:
dafis@linux:~/Documents/haskell/move> ghc -O2 --make endo
[1 of 1] Compiling Main ( endo.hs, endo.o )
Linking endo ...
dafis@linux:~/Documents/haskell/move> ./endo +RTS -sstderr
./endo +RTS -sstderr
[38]
89,388 bytes allocated in the heap
520 bytes copied during GC (scavenged)
0 bytes copied during GC (not scavenged)
57,344 bytes maximum residency (1 sample(s))
I am thinking on using a data type like
data DNA=DNA {prefix::L.ByteString, worked::L.ByteString, suffix::ByteString}
The invariant is that worked should be L.empty whenever prefix is L.empty.
Thanks in advance
Cheers, Daniel