
On Thursday, 17 September 2009 21:07:28, Cristiano Paris wrote:
On Tue, Sep 15, 2009 at 11:31 PM, Daniel Fischer wrote:
... Yeah, you do *not* want the whole file to be read here, except above for testing purposes.
That's not true. Sometimes I want to, sometimes don't.
The "for the case of sorting by metadata" was tacitly assumed :)
But I want to use the same code for reading files and exploit laziness to avoid reading the body.
Still, ByteStrings are probably the better choice (if you want the body and that can be large).
That's not a problem by now.
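The ByteString suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the record BitBS and its fields indexBS and bodyBS are hypothetical names standing in for the Bit type, and the file name is made up. It assumes the first line of the file holds the index and the rest is the body.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical counterpart of the thread's Bit type with a lazy ByteString body.
data BitBS = BitBS { indexBS :: Int, bodyBS :: L.ByteString }

-- L.readFile reads the file lazily, chunk by chunk, so only the chunks
-- that are actually demanded get read.
readBitBS :: FilePath -> IO BitBS
readBitBS fn = do
    txt <- L.readFile fn
    let (l, rest) = L.break (== '\n') txt   -- first line, then the remainder
    return (BitBS (read (L.unpack l)) (L.drop 1 rest))   -- drop the newline

main :: IO ()
main = do
    writeFile "bit-bs-demo.txt" "3\nCCGGGCGCGGTGGCTCACGC\n"   -- made-up sample file
    b <- readBitBS "bit-bs-demo.txt"
    print (indexBS b)
    putStrLn (L.unpack (L.take 20 (bodyBS b)))
```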
To avoid reading the body without unsafePerformIO:
readBit fn = Control.Exception.bracket (openFile fn ReadMode) hClose
    (\h -> do
        l <- hGetLine h
        let i = read l
        bdy <- hGetContents h
        return $ Bit i bdy)

Same problem with the "withFile"-version: nothing gets printed if I try to print out the body: that's why I used seq.
Ah, yes. The file is closed too soon.
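To illustrate the point: hGetContents returns a lazy String, so if nothing demands the body before bracket or withFile closes the handle, the body comes out empty. A minimal sketch of the workaround hinted at with seq follows; the name readBitStrict, the Bit record definition, and the file name are assumptions for illustration. Note that it reads the whole body, which is exactly what one wants to avoid when sorting by metadata only.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Assumed shape of the thread's Bit type (index on the first line, then the body).
data Bit = Bit { index :: Int, body :: String }

-- Strict variant: the body is forced while the handle is still open.
readBitStrict :: FilePath -> IO Bit
readBitStrict fn = withFile fn ReadMode $ \h -> do
    l   <- hGetLine h
    bdy <- hGetContents h
    _   <- evaluate (length bdy)   -- walks the whole lazy string before withFile closes h
    return (Bit (read l) bdy)

main :: IO ()
main = do
    writeFile "bit-strict-demo.txt" "2\nCCGGGCGCGG\n"   -- made-up sample file
    b <- readBitStrict "bit-strict-demo.txt"
    print (index b)
    putStr (body b)
```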
I'm starting to think that the only way to do this without using unsafePerformIO is to make the body an IO action: under Haskell's assumptions, anything else simply isn't possible to write, because Haskell enforces safety above all.
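For what it's worth, the "body as an IO action" idea might look something like the sketch below. The names BitIO, indexIO, bodyIO and loadBody are hypothetical, as is the file name; the file is simply reopened when the body is requested, so no lazy I/O is involved at all.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical variant of Bit: the body is an action to run on demand.
data BitIO = BitIO { indexIO :: Int, bodyIO :: IO String }

-- Reads only the first line; the body is not touched until bodyIO is run.
readBitIO :: FilePath -> IO BitIO
readBitIO fn = do
    i <- withFile fn ReadMode $ \h -> do
        l <- hGetLine h
        return $! read l            -- parse the index before the handle closes
    return (BitIO i (loadBody fn))

-- Reopens the file, skips the index line and reads the body strictly.
loadBody :: FilePath -> IO String
loadBody fn = withFile fn ReadMode $ \h -> do
    _   <- hGetLine h
    bdy <- hGetContents h
    _   <- evaluate (length bdy)    -- force the body while h is still open
    return bdy

main :: IO ()
main = do
    writeFile "bit-io-demo.txt" "3\nAAAATTAGCC\n"   -- made-up sample file
    b <- readBitIO "bit-io-demo.txt"
    print (indexIO b)               -- metadata only, body not read yet
    bodyIO b >>= putStr             -- body read here, on request
```

The price is that every consumer of the body has to live in IO, and the file is opened twice.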
Well, what about

readBit fn = do
    txt <- readFile fn
    let (l,_:bdy) = span (/= '\n') txt
    return $ Bit (read l) bdy

?

With

main = do
    args <- getArgs
    let n = case args of
              (a:_) -> read a
              _     -> 1000
    bl <- mapM readBit ["file1.txt","file2.txt"]
    mapM_ (putStrLn . show . index) $ sortBy (comparing index) bl
    mapM_ (putStrLn . take 20 . drop n . body) bl

./cparis3 30 +RTS -sstderr
2
3
CCGGGCGCGGTGGCTCACGC
CCGGGCGCGGTGGCTCACGC
         408,320 bytes allocated in the heap
           1,220 bytes copied during GC
          34,440 bytes maximum residency (1 sample(s))
          31,096 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

./cparis3 20000 +RTS -sstderr
2
3
AAAATTAGCCGGGCGTGGTG
AAAATTAGCCGGGCGTGGTG
       1,069,168 bytes allocated in the heap
         105,700 bytes copied during GC
         137,356 bytes maximum residency (1 sample(s))
          27,344 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

./cparis3 2000000 +RTS -sstderr
2
3
CCTGGCCAACATGGTGAAAC
CCTGGCCAACATGGTGAAAC
      80,939,296 bytes allocated in the heap
       8,925,240 bytes copied during GC
         137,056 bytes maximum residency (2 sample(s))
          45,528 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  %GC time      38.5%  (27.0% elapsed)
  Alloc rate    1,264,577,704 bytes per MUT second
  Productivity  61.5% of total user, 38.8% of total elapsed

./cparis3 20000000 +RTS -sstderr
2
3
CAGAGCGAGACTCCGTCTCA
CAGAGCGAGACTCCGTCTCA
     806,034,756 bytes allocated in the heap
      76,775,944 bytes copied during GC
         136,876 bytes maximum residency (2 sample(s))
          43,324 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  1536 collections,     0 parallel,  0.35s,  0.35s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.53s  (  0.67s elapsed)
  GC    time    0.35s  (  0.36s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.88s  (  1.02s elapsed)

  %GC time      40.0%  (34.9% elapsed)
  Alloc rate    1,526,482,681 bytes per MUT second
  Productivity  60.0% of total user, 51.7% of total elapsed

Seems to work as desired.
Cristiano