Re: [Haskell-cafe] strict version of Haskell - does it exist?

jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
jsonLines = C.sequenceSink () $ do
    val <- CA.sinkParser json'
    CB.dropWhile isSpace_w8
    return $ C.Emit () [val]
Adding a \state -> (the way Felipe Lessa told me) makes it work, and it now runs in about 20 seconds, even though some conduit overhead is likely involved. Omitting my custom data type and operating on Aeson's Value with bytestrings reduces the running time to 16 seconds. PHP/C++ still wins: less than 12 seconds. Now I can imagine again that even a desktop multi-core system could beat a single-threaded C application. Thanks for your help. Maybe I can set up profiling again to understand why it still takes a little more time.
Marc Weber
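For reference, the fixed version looks roughly like this. It is only a sketch: it reuses the qualified names from the snippet above (C, CA, CB, B) without showing the imports, and assumes conduit 0.2's sequenceSink, which threads a state value through the sink.

jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
jsonLines = C.sequenceSink () $ \state -> do
    -- parse one JSON value, skip the whitespace separating records,
    -- and emit the parsed value, threading the (unit) state through
    val <- CA.sinkParser json'
    CB.dropWhile isSpace_w8
    return $ C.Emit state [val]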

On Tue, Jan 31, 2012 at 1:36 PM, Marc Weber
Adding a \state -> (the way Felipe Lessa told me) makes it work, and it now runs in about 20 seconds, even though some conduit overhead is likely involved.
Just out of curiosity: did you use conduit 0.1 or 0.2? Cheers! =) -- Felipe.

Excerpts from Felipe Almeida Lessa's message of Tue Jan 31 16:49:52 +0100 2012:
Just out of curiosity: did you use conduit 0.1 or 0.2?
I updated to 0.2 today because I was looking for a Monad instance for SequenceSink, but didn't find it because I was trying to use it the wrong way (the missing \state ->; see my last mail).
I also tried json' vs json (the strict and non-strict versions); it didn't seem to make a big difference.
Marc Weber

Hi Everyone,
I had a similar experience with a similar type of problem. The
application was analyzing web pages that our web crawler had collected, or rather, not the pages themselves but metadata about when each page was collected.
The basic query was:
SELECT
    Domain, Date, COUNT(*)
FROM
    Pages
GROUP BY
    Domain, Date
The webpage data was split across tens of thousands of compressed binary files. I used enumerator to load these files and select the appropriate columns. This step was performed in parallel using parMap and worked fine once I figured out how to add the appropriate !s.
The second step was the group by. I built some tools on top of monad-par with the normal higher-level operators like map, groupBy, filter, etc. The typical pattern I followed was the map-reduce style used in monad-par. I was hoping to someday share this work, although I have since abandoned it.
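For comparison, the group-by step on its own can be written as one strict left fold over the selected columns. This is only a sketch, not the monad-par pipeline described above; the Domain and Date synonyms and countByDomainDate are invented for illustration, and Data.Map.Strict plus foldl' stand in for explicit strictness annotations.

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import Data.List (foldl')
import qualified Data.Map.Strict as M

type Domain = B.ByteString
type Date   = B.ByteString

-- SELECT Domain, Date, COUNT(*) FROM Pages GROUP BY Domain, Date
-- as a single strict fold: foldl' forces the accumulator and the
-- strict map forces each count, so no thunks pile up.
countByDomainDate :: [(Domain, Date)] -> M.Map (Domain, Date) Int
countByDomainDate = foldl' step M.empty
  where
    step !acc key = M.insertWith (+) key 1 acc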
It took me a couple of weeks to get the strictness mostly right. I say mostly because it still randomly blows up: if I feed in a single 40 KB file, maybe one time in ten it consumes all the memory on the machine in a few seconds. There is obviously a laziness bug in there somewhere, but after working on it for a few days and failing to come up with a solid repro case, I eventually built all the web page analysis tools in Scala, in large part because I did not see a way forward and needed to tie off that work and move on.
My observations:
Combining laziness and parallelism made it very difficult to reason about what was going on. Test cases became non-deterministic, not in terms of their output in the success case but in whether they ran at all.
The tooling around laziness does not give enough information for debugging complex problems. Because of this, when people ask "Is Haskell good for parallel development?" I tell them the answer is complicated. Haskell has excellent primitives for parallel development, like STM, which I love, but it lacks a fully built-out PLINQ-like toolkit for flexible parallel data processing.
The other thing is that deepseq is very important. IMHO this needs to be a first-class language feature, with all major libraries shipping deepseq instances. There seems to have been some movement on this front, but you can't do serious parallel development without it.
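A tiny example of what deepseq buys here (the PageStat type and analyse function are made up for illustration; only the deepseq and parallel packages are assumed): with parMap and a plain rseq strategy each spark only reaches weak head normal form, so the real work can leak back onto one core, whereas rdeepseq, which requires an NFData instance, forces the whole value inside the spark.

import Control.DeepSeq (NFData (..))
import Control.Parallel.Strategies (parMap, rdeepseq)

data PageStat = PageStat { psDomain :: String, psCount :: Int }

instance NFData PageStat where
    rnf (PageStat d c) = rnf d `seq` rnf c

-- rdeepseq forces each PageStat completely inside its own spark; with
-- plain rseq the spark would stop at the constructor and the sum in
-- psCount would be evaluated later, sequentially, by the consumer.
analyse :: [(String, [Int])] -> [PageStat]
analyse = parMap rdeepseq (\(d, hits) -> PageStat d (sum hits))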
One thing that might help would be a vim plugin that showed the strictness of operations and data. I am going to take another crack at a PLINQ-like library with GHC 7.4.1 in the next couple of months, using the debug symbols that Peter has been working on.
Conclusion:
Haskell was the wrong platform for doing webpage analysis anyhow, not because anything is wrong with the language but simply because it does not have the tooling that the JVM does. I moved all my work into Hadoop to take advantage of multi-machine parallelism and higher-level tools like Hive. There might be a future in building Haskell code that could be translated into a Hive query.
With better tools I think Haskell can become the go-to language for developing highly parallel software. We just need tools that help developers better understand the laziness of their software. There also seems to be a documentation gap around developing data analysis and data transformation pipelines in Haskell.
Sorry for the length. I hope my experience is useful to someone.
Steve

On Tue, Jan 31, 2012 at 9:19 PM, Steve Severance
The other thing is that deepseq is very important. IMHO this needs to be a first-class language feature, with all major libraries shipping deepseq instances. There seems to have been some movement on this front, but you can't do serious parallel development without it.
I completely agree on the first part, but deepseq is not a panacea either.
It's a big hammer and overuse can sometimes cause wasteful O(n) no-op
traversals of already-forced data structures. I also definitely wouldn't go
so far as to say that you can't do serious parallel development without it!
The only real solution to problems like these is a thorough understanding
of Haskell's evaluation order, and how and why call-by-need is different
than call-by-value. This is both a pedagogical problem and genuinely hard
-- even Haskell experts like the guys at GHC HQ sometimes spend a lot of
time chasing down space leaks. Haskell makes a trade-off here; reasoning
about denotational semantics is much easier than in most other languages
because of purity, but non-strict evaluation makes reasoning about
operational semantics a little bit harder.
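The textbook illustration of that operational gap, for concreteness: the two definitions below denote the same number, but under call-by-need (and compiled without optimisation, where GHC's strictness analyser does not step in) the first builds a long chain of addition thunks and can exhaust the heap, while the second forces its accumulator at every step and runs in constant space.

import Data.List (foldl')

leakySum, strictSum :: [Int] -> Int
leakySum  = foldl  (+) 0   -- builds a chain of (+) thunks under call-by-need
strictSum = foldl' (+) 0   -- forces the accumulator at every step

main :: IO ()
main = print (strictSum [1 .. 10000000])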
In domains where you care a lot about operational semantics (like parallel
and concurrent programming, where it's absolutely critical), programmers
necessarily require a lot more experience and knowledge in order to be
effective in Haskell.
G
--
Gregory Collins

On Tue, Jan 31, 2012 at 1:22 PM, Gregory Collins
I completely agree on the first part, but deepseq is not a panacea either. It's a big hammer and overuse can sometimes cause wasteful O(n) no-op traversals of already-forced data structures. I also definitely wouldn't go so far as to say that you can't do serious parallel development without it!
I agree. The only time I ever use deepseq is in Criterion benchmarks, as it's a convenient way to make sure that the input data is evaluated before the benchmark starts. If you want a data structure to be fully evaluated, evaluate it as it's created, not after the fact.
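A small sketch of that advice (the function names are invented; only containers' Data.Map.Lazy/Strict and deepseq's force are assumed): forcing after the fact pays an extra O(n) traversal over a map full of (+) thunks, whereas a strict map evaluates each count at insert time, leaving nothing for deepseq to do.

import Control.DeepSeq (force)
import Data.List (foldl')
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- After the fact: the lazy map accumulates (+) thunks in its values,
-- then force walks the whole structure once more to evaluate them.
countsAfterTheFact :: [(String, Int)] -> ML.Map String Int
countsAfterTheFact =
    force . foldl' (\m (k, v) -> ML.insertWith (+) k v m) ML.empty

-- As it's created: the strict map evaluates each value on insert,
-- so there is nothing left to deepseq when the fold finishes.
countsAsCreated :: [(String, Int)] -> MS.Map String Int
countsAsCreated = foldl' (\m (k, v) -> MS.insertWith (+) k v m) MS.empty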
The only real solution to problems like these is a thorough understanding of Haskell's evaluation order, and how and why call-by-need is different than call-by-value. This is both a pedagogical problem and genuinely hard -- even Haskell experts like the guys at GHC HQ sometimes spend a lot of time chasing down space leaks. Haskell makes a trade-off here; reasoning about denotational semantics is much easier than in most other languages because of purity, but non-strict evaluation makes reasoning about operational semantics a little bit harder.
+1. We can do a much better job of teaching how to reason about performance. A few rules of thumb get you a long way. I'm (slowly) working on improving the state of affairs here. -- Johan

http://www.vex.net/~trebla/haskell/lazy.xhtml It is half done.

On Tue, Jan 31, 2012 at 12:19 PM, Steve Severance
The webpage data was split across tens of thousands of compressed binary files. I used enumerator to load these files and select the appropriate columns. This step was performed in parallel using parMap and worked fine once I figured out how to add the appropriate !s.
Even though they are advertised as parallel programming tools, parMap and friends work in parallel over *sequential*-access data structures (i.e. linked lists), which limits how much speedup they can deliver. We want flat, strict, unpacked data structures to get good performance out of parallel algorithms. DPH, repa, and even vector show the way. -- Johan
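For instance (a sketch with invented names, using plain vector and the parallel package rather than DPH or repa): sparking per cons cell gives each spark a tiny amount of work behind a pointer, while splitting a flat unboxed vector into a few chunks (e.g. one per core) gives each spark a contiguous, cache-friendly slice to reduce.

import Control.Parallel.Strategies (parMap, rdeepseq, rseq)
import qualified Data.Vector.Unboxed as U

-- One spark per cons cell: tiny work items reached by pointer chasing.
sumSquaresList :: [Double] -> Double
sumSquaresList xs = sum (parMap rdeepseq (\x -> x * x) xs)

-- A handful of sparks, each reducing a contiguous unboxed slice.
sumSquaresVec :: Int -> U.Vector Double -> Double
sumSquaresVec chunks v =
    sum (parMap rseq (U.sum . U.map (\x -> x * x)) (slices v))
  where
    n = max 1 ((U.length v + chunks - 1) `div` max 1 chunks)
    slices u
        | U.null u  = []
        | otherwise = let (a, b) = U.splitAt n u in a : slices b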

Even though they are advertised as parallel programming tools, parMap and friends work in parallel over *sequential*-access data structures (i.e. linked lists), which limits how much speedup they can deliver. We want flat, strict, unpacked data structures to get good performance out of parallel algorithms. DPH, repa, and even vector show the way.
You would think that tree data structures would be good here as well. For example, monad-par includes a definition of an append-based "AList" (like the one Guy Steele argues for). But alas, that turns out to be much harder to get working well. For most algorithms, Vectors end up better. -Ryan
participants (7)
- Albert Y. C. Lai
- Felipe Almeida Lessa
- Gregory Collins
- Johan Tibell
- Marc Weber
- Ryan Newton
- Steve Severance