
On Tue, 12 Feb 2013 15:57:37 -0700
Nicolas Bock
Here is haskell version that is faster than python, almost as fast as c++. You need to install bytestring-lexing package for readDouble.
I was hoping Branimir could comment on how the improvements were allocated. how much is due to text.regex.pcre (which looks to be a wrapper to libpcre) ? how much can be attributed to using data.bytestring ? you have to admit, it's amazing how well a byte-compiled, _dynamically typed_ interpreter can do against an actualy native code compiler. Can't regex be done effectively in haskell ? Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ? Brian
import Text.Regex.PCRE import Data.Maybe import Data.Array.IO import Data.Array.Unboxed import qualified Data.ByteString.Char8 as B import Data.ByteString.Lex.Double (readDouble)
strataBounds :: UArray Int Double strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ]
newStrataCounts :: IO(IOUArray Int Int) newStrataCounts = newArray (bounds strataBounds) 0
main = do l <- B.getContents let a = B.lines l strataCounts <- newStrataCounts n <- calculate strataCounts a 0 let printStrataCounts :: IO () printStrataCounts = do let s = round $ sqrt (fromIntegral n::Double) :: Int printf "read %d matrix elements (%dx%d = %d)\n" n s s n printStrataCounts' 0 0 printStrataCounts' :: Int -> Int -> IO () printStrataCounts' i total | i < (snd $ bounds strataBounds) = do count <- readArray strataCounts i let p :: Double p = (100.0*(fromIntegral count) :: Double)/(fromIntegral n :: Double) printf "[%1.2e, %1.2e) = %i (%1.2f%%) %i\n" (strataBounds ! i) (strataBounds ! (i+1)) count p (total + count) printStrataCounts' (i+1) (total+count) | otherwise = return () printStrataCounts
calculate :: IOUArray Int Int -> [B.ByteString] -> Int -> IO Int calculate _ [] n = return n calculate counts (l:ls) n = do let a = case getAllTextSubmatches $ l =~ B.pack "matrix.*= ([0-9eE.+-]+)$" :: [B.ByteString] of [_,v] -> Just (readDouble v) :: Maybe (Maybe (Double,B.ByteString)) _ -> Nothing b = (fst.fromJust.fromJust) a loop :: Int -> IO() loop i | i < (snd $ bounds strataBounds) = if (b >= (strataBounds ! i)) && (b < (strataBounds ! (i+1))) then do c <- readArray counts i writeArray counts i (c+1) else loop (i+1) | otherwise = return () if isNothing a then calculate counts ls n else do loop 0 calculate counts ls (n+1)
------------------------------ From: nicolasbock@gmail.com Date: Fri, 8 Feb 2013 12:26:09 -0700 To: haskell-cafe@haskell.org Subject: [Haskell-cafe] performance question
Hi list,
I wrote a script that reads matrix elements from standard input, parses the input using a regular expression, and then bins the matrix elements by magnitude. I wrote the same script in python (just to be sure :) ) and find that the python version vastly outperforms the Haskell script.
To be concrete:
$ time ./createMatrixDump.py -N 128 | ./printMatrixDecay real 0m2.655s user 0m2.677s sys 0m0.095s
$ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py - real 0m0.445s user 0m0.615s sys 0m0.032s
The Haskell script was compiled with "ghc --make printMatrixDecay.hs".
Could you have a look at the script and give me some pointers as to where I could improve it, both in terms of performance and also generally, as I am very new to Haskell.
Thanks already,
nick
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe