Re: [Haskell-cafe] strict version of Haskell - does it exist?

30 Jan 2012

      Replying to all replies at once:
...
Malcolm Wallace
 At work, we have a strict version of Haskell
:-) which proves that it is worth thinking about.
...
Ertugrul
If you want to save the time to learn how to write efficient Haskell
programs, you may want to have a look into the Disciple language. 
Yes - I will. It's been on my TODO list for at least 12 months :(
Not sure whether there are parser combinator libraries yet.
@ Herbert Valerio Riedel (suggesting aeson)
I gave it yet another try - see below.
It still fails.

@ Felipe Almeida Lessa (suggesting conduits and attoparsec)
I mentioned that I already tried it. Counting lines alone was a lot slower than
counting lines and parsing JSON in PHP.

@ Chris Wong
...
flag is that strictness isn't just something you turn on and off willy-nilly
You're right. But those features are not required for all tasks :)
E.g. have a look at Data.Map. Would a strict implementation look that different?
This is what I came up with now, trying strict bytestrings and aeson.
Note the LB.fromChunks usage below to turn a strict into a lazy bytestring.
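
To make the strict-to-lazy conversion concrete, here is a minimal sketch using only the bytestring package. The helper name toLazy and the sample line are made up for illustration; they are not part of the program further down.

```haskell
import qualified Data.ByteString       as BS
import qualified Data.ByteString.Char8 as BC8
import qualified Data.ByteString.Lazy  as LB

-- Hypothetical helper: wrap a single strict chunk as a lazy ByteString.
-- A lazy ByteString is a list of strict chunks, so this builds a
-- one-chunk lazy value around the given strict string.
toLazy :: BS.ByteString -> LB.ByteString
toLazy line = LB.fromChunks [line]

main :: IO ()
main = do
  let strictLine = BC8.pack "{\"k\":\"XXX\"}"   -- made-up sample input
      lazyLine   = toLazy strictLine
  -- The lazy value consists of exactly the one chunk we passed in.
  print (LB.toChunks lazyLine == [strictLine])
  -- Going back to a strict ByteString concatenates all chunks.
  print (BS.concat (LB.toChunks lazyLine) == strictLine)
```

This is the shape decode' wants: it takes a lazy ByteString, so each strict line has to be wrapped like this before parsing.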

Result: the PHP script doing the same runs in 33 secs (using the
AFindCheckoutsAndContacts branch). The Haskell program below - I stopped it
after 8 min. (At least it no longer causes my system to swap.)

You're right: I could start profiling. I could learn how to optimize it.
But why? The conclusion is that if I ever use yesod - and if I ever want to
analyse logs - I should call PHP from yesod as an external process!? :-(

Even if I had a core i7 (8 threads in parallel) I still would not be sure
whether Haskell would be the right choice. I admit that I assume that all data
fits into memory, so piecewise evaluation doesn't matter too much.

Thanks for all your help - it proves once again that the Haskell
community is the most helpful I know about. Yet I think that, with me as the
programmer, Haskell is the wrong choice for this task.

Thanks Marc Weber

My new attempt - now I should start profiling. Looks like I haven't
built all libs with profiling support ..

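  -- OverloadedStrings: needed for the string literals passed to (.:)
  -- ScopedTypeVariables: needed for the type annotation in the do-binding
  {-# LANGUAGE OverloadedStrings, ScopedTypeVariables #-}

  import Data.Aeson.Types
  import Data.Aeson
  import Data.List
  import Control.Applicative
  import Debug.Trace
  import qualified Data.Map as M
  -- import Action   -- left out here: Action is defined below
  import Data.Aeson.Parser as AP

  import qualified Data.ByteString.Lazy as LB

  import qualified Data.ByteString as BS
  -- note: despite the name, LBC8 is the strict Char8 module
  import qualified Data.ByteString.Char8 as LBC8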

  data Action = ACountLine | AFindCheckoutsAndContacts

  -- lines look like this:
  -- {"id":"4ee535f01550c","r":"","ua":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)","t":1323644400,"k":"XXX","v":"YY"}

  data Item = Item {
    id :: BS.ByteString,
    ua :: BS.ByteString,
    t  :: Int,
    k  :: BS.ByteString,
    v  :: BS.ByteString
  }

  instance FromJSON Item where
      parseJSON (Object v) = Item <$>
                             v .: "id" <*>
                             v .: "ua" <*>
                             v .: "t" <*>
                             v .: "k" <*>
                             v .: "v"
      parseJSON _ = empty

  run :: Action -> [FilePath] -> IO ()

  run AFindCheckoutsAndContacts files = do
    -- find all ids querying the server more than 100 times.
    -- do so by building a map counting queries

    (lines :: [BS.ByteString]) <- fmap (concat . (map LBC8.lines) ) $ mapM BS.readFile files
    let processLine :: (M.Map BS.ByteString Int) -> BS.ByteString -> (M.Map BS.ByteString Int)
        processLine st line = case decode' (LB.fromChunks [line]) of
                                  Nothing -> st -- traceShow ("bad line " ++ (LBC8.unpack line)) st
                                  Just (Item id ua t k v) -> M.insertWith (+) k 1 st
    let count_by_id = foldl' processLine (M.empty) lines
    mapM_ print $ M.toList $ M.filter (> 100) count_by_id
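
One strictness detail in the fold above: foldl' only forces the map itself to weak head normal form, while insertWith from the lazy Data.Map stores each updated counter as an unevaluated (+) thunk. Whether that contributes to the running time was not measured here. As a sketch of a strict variant of just the counting step, assuming a containers version that ships Data.Map.Strict (0.5 or later) and using a made-up helper name countKeys with made-up sample keys:

```haskell
import Data.List (foldl')
import qualified Data.ByteString.Char8 as BC8
import qualified Data.Map.Strict as M

-- Hypothetical helper: count how often each key occurs.
-- Data.Map.Strict evaluates every value to WHNF when it is inserted,
-- so each counter is a plain Int rather than a growing chain of (+).
countKeys :: [BC8.ByteString] -> M.Map BC8.ByteString Int
countKeys = foldl' (\m key -> M.insertWith (+) key 1 m) M.empty

main :: IO ()
main = do
  -- made-up sample keys standing in for the "k" field of each log line
  let keys = map BC8.pack ["XXX", "YYY", "XXX"]
  -- keep only keys seen more than once (the real program uses > 100)
  mapM_ print (M.toList (M.filter (> 1) (countKeys keys)))
```

The only difference from the lazy version is the import; the fold itself looks the same.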