No Derived Read for Unboxed Arrays

newer
GHC concurrency runtime "breaks"...

older
Re: [Haskell-cafe] A function for...

Matthew Bromberg

25 Jan 2007 25 Jan '07

9:05 p.m.

I was trying to convert some code from ordinary boxed array to unboxed arrays for performance reasons. Presumably the conversion, according to the GHC documentation should only involve appropriate type signatures (e.g. using UArray instead of Array) and importing Data.Array.Unboxed. However my code ultimately failed because I load a large array saved as a text file using the derived Read, Show mechanism. For some reason there is no Read function for unboxed arrays. Here is a little GHCI session showing the problem: Prelude> :m Data.Array.Unboxed Prelude Data.Array.Unboxed> let e = listArray (0,3) [0 .. 3] :: UArray Int Int Prelude Data.Array.Unboxed> let se = show e Prelude Data.Array.Unboxed> se "array (0,3) [(0,0),(1,1),(2,2),(3,3)]" Prelude Data.Array.Unboxed> let e2 = read se :: UArray Int Int <interactive>:1:9: No instance for (Read (UArray Int Int)) arising from use of `read' at <interactive>:1:9-15 Possible fix: add an instance declaration for (Read (UArray Int Int)) In the expression: read se In the expression: read se :: UArray Int Int In the definition of `e2': e2 = read se :: UArray Int Int However if I try to read it in as boxed arrays, no problem: Prelude Data.Array.Unboxed> let e2 = read se :: Array Int Int Prelude Data.Array.Unboxed> e2 array (0,3) [(0,0),(1,1),(2,2),(3,3)] Is there a reason why this doesn't work? Is it a design choice or does it merely reflect the fact that no one has gotten around to writing a read function for unboxed arrays. I imagine it would be trivial to fix this however.

Show replies by date

Neil Mitchell

26 Jan 26 Jan

12:10 a.m.

Hi,

...

I was trying to convert some code from ordinary boxed array to unboxed arrays for performance reasons.

...

However my code ultimately failed because I load a large array saved as a text file using the derived Read, Show mechanism.

I found that Read was maybe 30 times slower than the slowest binary serialisation method I could possibly think of. If performance matters to you, and the array is more than a few elements long, switching away from Read/Show should be the first step - before going for unboxed arrays. (But of course, having Read/Show defined for UArray may well be useful, and sounds a good idea) Thanks Neil

dons＠cse.unsw.edu.au

12:12 a.m.

ndmitchell:

...

Hi,

...
I was trying to convert some code from ordinary boxed array to unboxed arrays for performance reasons.

...
However my code ultimately failed because I load a large array saved as a text file using the derived Read, Show mechanism.

I found that Read was maybe 30 times slower than the slowest binary serialisation method I could possibly think of. If performance matters to you, and the array is more than a few elements long, switching away from Read/Show should be the first step - before going for unboxed arrays.

(But of course, having Read/Show defined for UArray may well be useful, and sounds a good idea)

There's also an instance Binary for UArray. That might be useful? -- Don

Neil Mitchell

12:15 a.m.

HI Don

...

...
(But of course, having Read/Show defined for UArray may well be useful, and sounds a good idea)

There's also an instance Binary for UArray. That might be useful?

Is there an instance Binary in a released library? Thanks Neil

SevenThunders

12:55 a.m.

Neil Mitchell wrote:

...

I found that Read was maybe 30 times slower than the slowest binary serialisation method I could possibly think of. If performance matters to you, and the array is more than a few elements long, switching away from Read/Show should be the first step - before going for unboxed arrays.

(But of course, having Read/Show defined for UArray may well be useful, and sounds a good idea)

Thanks

Neil _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

No doubt any kind of binary serialization would be a lot faster. In my case, however, I just wanted it to work out of the box. I need to read in about 5-10 arrays of only 1000 entries or so, saved in files. I suspect even the ascii parser could do that within a few seconds. I thought maybe a simple hack of the existing Read instantiation for Array would work, but I must be either too lazy or stupid to get it right the first time. My problem is I'm not really familiar with how these parsing routines work. instance (Ix a, Read a, Read b) => Read (UArray a b) where readsPrec p = readParen (p > 9) (\r -> [(array b as, u) | ("array",s) <- lex r, (b,t) <- reads s, (as,u) <- reads t ]) This fails to compile with the error: Could not deduce (IArray UArray b) from the context (Read (UArray a b), Ix a, Read a, Read b) arising from use of `array' at TunePerf.hs:23:20-29 Possible fix: add (IArray UArray b) to the class or instance method `readsPrec' or add an instance declaration for (IArray UArray b) In the expression: array b as In the expression: (array b as, u) In the expression: [(array b as, u) | ("array", s) <- lex r, (b, t) <- reads s, (as, u) <- reads t] Is there a simple hack to get this going? I might just end up reading it into a boxed array and then do the conversions. -- View this message in context: http://www.nabble.com/No-Derived-Read-for-Unboxed-Arrays-tf3119003.html#a864... Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

dons＠cse.unsw.edu.au

1:24 a.m.

New subject: Pure serialisation and compression [Was: No Derived Read for Unboxed Arrays]

mattcbro:

...

No doubt any kind of binary serialization would be a lot faster. In my case, however, I just wanted it to work out of the box. I need to read in about 5-10 arrays of only 1000 entries or so, saved in files. I suspect even the ascii parser could do that within a few seconds.

Faster, and trivial to write! Here's a complete example: This little demo shows how to serialise (read/show) unboxed arrays efficiently now using Data.Binary, available from Hackage, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2 It uses lazy bytestrings to wrap fast serialisation, in the style of newBinary, in a pure interface. For example, to serialise a bunch of unboxed arrays to and from disk: Import Binary,

...

import Data.Binary import Data.Array.Unboxed import Codec.Compression.Zlib import qualified Data.ByteString.Lazy as B

...

type A = UArray Int Int

Create some unboxed arrays

...

a1, a2, a3, a4, a5 :: A a1 = listArray (0,999) [0..] a2 = listArray (0,999) [1..] a3 = listArray (0,999) [2..] a4 = listArray (0,999) [3..] a5 = listArray (0,999) [4..]

Collect our arrays, and the files we'll write them to

...

arrs :: [(FilePath,A)] arrs = [("a1",a1) ,("a1",a1) ,("a2",a2) ,("a3",a3) ,("a4",a4) ,("a5",a5)]

An action to write some unboxed arrays to disk

...

writeArrays :: [(FilePath,A)] -> IO () writeArrays as = mapM_ (uncurry encodeFile) as

An action to read them back from disk:

...

readArrays :: [FilePath] -> IO [(FilePath,A)] readArrays fs = zip fs `fmap` mapM decodeFile fs

Write, then read, and check that the operation was the identity

...

main' = do writeArrays arrs arrs' <- readArrays (map fst arrs) print . all (uncurry (==)) $ zip arrs arrs'

Running this: $ ghc -O A.hs --make [1 of 1] Compiling Main ( A.hs, A.o ) Linking A ... $ time ./A True ./A 0.01s user 0.02s system 107% cpu 0.022 total Using zlib (also available from hackage), we can do some more fun things. For example, compressing the arrays before we write them to disk:

...

writeGzipArrays :: [(FilePath,A)] -> IO () writeGzipArrays as = mapM_ (uncurry gzEncodeFile) as where gzEncodeFile f a = B.writeFile f . compress . encode $ a

And read them back in

...

readGzipArrays :: [FilePath] -> IO [(FilePath,A)] readGzipArrays fs = zip fs `fmap` mapM gzDecodeFile fs where gzDecodeFile f = return . decode . decompress =<< B.readFile f

Our main function then simply becomes:

...

main = do writeGzipArrays arrs arrs' <- readGzipArrays (map fst arrs) print . all (uncurry (==)) $ zip arrs arrs'

Runs a bit slower: $ time ./A True ./A 0.03s user 0.01s system 104% cpu 0.037 total But the resulting files on disk are, instead of word sized chunks: $ wc -c a1 8024 a1 Are 8 times smaller: $ wc -c a1 1869 a1 Yay for pure compression and serialisation! Note that you can use this code *right now* , in Hugs or GHc. Cheers, Don

SevenThunders

3:08 a.m.

New subject: Pure serialisation and compression [Was: No Derived Read for Unboxed Arrays]

Donald Bruce Stewart wrote:

...

mattcbro:

Faster, and trivial to write! Here's a complete example:

... _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thanks for the example. However for this application, I like my arrays in an ascii readable format. In fact my arrays are themselves packed into a data record and the show and read mechanism is handy for capturing a snapshot of this data record, enabling me to inspect the values by opening the text files. Moreover my data record files are quite portable across many platforms. For processing large amounts of data, however, the binary approach would be the way to go. -- View this message in context: http://www.nabble.com/No-Derived-Read-for-Unboxed-Arrays-tf3119003.html#a864... Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

6732

Age (days ago)

6733

Last active (days ago)

List overview

Download

6 comments

4 participants

participants (4)

dons＠cse.unsw.edu.au
Matthew Bromberg
Neil Mitchell
SevenThunders