Re: [Haskell-cafe] Haskell-Cafe Digest, Vol 180, Issue 32

I got 4.7s for a similar amount of data in 2013.
However, I was pretty sure that a fully inlined implementation could
potentially go 5x faster.
http://hackage.haskell.org/package/hPDB
Please check the xeno XML parser benchmarks for another example.
https://hackage.haskell.org/package/xeno
On Fri, 31 Aug 2018 at 14:41,
Send Haskell-Cafe mailing list submissions to haskell-cafe@haskell.org
To subscribe or unsubscribe via the World Wide Web, visit http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe or, via email, send a message with subject or body 'help' to haskell-cafe-request@haskell.org
You can reach the person managing the list at haskell-cafe-owner@haskell.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Haskell-Cafe digest..."

Today's Topics:

1. Re: HDBC packages looking for maintainer (Tobias Dammers)
2. Re: Alternative instance for non-backtracking parsers (Olaf Klinke)
3. Re: Alternative instance for non-backtracking parsers (Bardur Arantsson)
---------- Forwarded message ----------
From: Tobias Dammers
To: haskell-cafe@haskell.org
Cc:
Bcc:
Date: Thu, 30 Aug 2018 15:24:04 +0200
Subject: Re: [Haskell-cafe] HDBC packages looking for maintainer

Hi, I'd be interested. I've used HDBC on a few projects, and my yeshql library was originally built with HDBC as the only backend. It would be a terrible shame to see this bitrot.
Cheers,
Tobias (tdammers on github etc.)
On Mon, Aug 13, 2018 at 12:07:38PM +0200, Erik Hesselink wrote:
Hi all,
I've been the maintainer for some of the HDBC packages for a while now. Sadly, I've mostly neglected them due to lack of time and usage. While the packages mostly work, there are occasional pull requests and updates for new compiler versions.
Because of this I'm looking for someone who wants to take over HDBC and related packages [1]. If you use HDBC and would like to take over maintainership, please let me know and we can get things set up.
Regards,
Erik
-- Tobias Dammers - tdammers@gmail.com
---------- Forwarded message ----------
From: Olaf Klinke
To: PY
Cc: haskell-cafe
Bcc:
Date: Thu, 30 Aug 2018 20:21:07 +0200
Subject: Re: [Haskell-cafe] Alternative instance for non-backtracking parsers

Hello, Olaf. I have some distrust of elegant solutions (one of them are C.P. libs).
I have a program that parses several CSV files, one of them 50MB in size, and writes its result as HTML. When I started optimizing, the execution time was 144 seconds. Profiling (thanks to Jasper Van der Jeugt for writing profiteur!) revealed that most of the time was spent parsing and postprocessing the 50MB CSV file. Changing the data structure of the postprocessing stage cut the execution time down to 32 seconds, but the majority of that is still spent on parsing.

Then I realized that (StateT String Maybe) is a parser which conveniently has all the class instances one needs; most notably, its Alternative instance makes it a backtracking parser. After defining a few combinators I was able to swap out my megaparsec parser for the new parser, which cut execution time in half. Now most of the parsing time is dedicated to transforming text to numbers and dates. I doubt that parsing time can be reduced much further [*].

The new parser's code is identical to the old parser's; only the combinators now come from another module. That is the elegant thing about monadic parser libraries. I will now use the fast parser by default, and if it returns Nothing, the program will suggest a command line flag that switches to the original megaparsec parser, which tells the user exactly where the parse failed and why. I am not sure whether there is another family of parsers whose interfaces are so similar that switching from one package to another is as effortless as with monadic parsers.
Cheers Olaf
[*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?
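For readers who want to see the idea concretely, here is a minimal sketch of a (StateT String Maybe) parser with a handful of combinators. It is not the code from the program described above; the names satisfy, char, eof, sepBy1, int, field and runParser are chosen for illustration only, and the field parser assumes unquoted CSV fields with no escaping. It needs only base and transformers.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad.Trans.State (StateT (..), evalStateT)
import Data.Char (isDigit)

-- The state is the remaining input; Nothing signals a failed parse.
-- The Alternative instance retries the second branch on the original
-- input, which is what makes the parser backtracking.
type Parser = StateT String Maybe

-- Consume one character that satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = StateT $ \s -> case s of
  c : rest | p c -> Just (c, rest)
  _ -> Nothing

-- Consume exactly the given character.
char :: Char -> Parser Char
char = satisfy . (==)

-- Succeed only when no input is left.
eof :: Parser ()
eof = StateT $ \s -> if null s then Just ((), s) else Nothing

-- One or more items separated by a delimiter.
sepBy1 :: Parser a -> Parser b -> Parser [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)

-- An unsigned decimal integer.
int :: Parser Int
int = read <$> some (satisfy isDigit)

-- An unquoted field: everything up to the next comma or newline.
field :: Parser String
field = many (satisfy (`notElem` ",\n"))

-- Run a parser and require that it consumes the whole input.
runParser :: Parser a -> String -> Maybe a
runParser p = evalStateT (p <* eof)
```

For example, runParser (sepBy1 int (char ',')) "1,22,333" yields Just [1,22,333]. A failure is only ever reported as Nothing, with no position or reason, which is why falling back to a parser with real error messages is attractive.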
---------- Forwarded message ----------
From: Bardur Arantsson
To: haskell-cafe@haskell.org
Cc:
Bcc:
Date: Thu, 30 Aug 2018 21:43:55 +0200
Subject: Re: [Haskell-cafe] Alternative instance for non-backtracking parsers

On 30/08/2018 20.21, Olaf Klinke wrote:
Hello, Olaf. I have some distrust of elegant solutions (one of them are C.P. libs).
[*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?
Not an expert, but for something as (relatively!) standard as CSV, I'd probably go for a specialized solution like 'cassava', which seems like it does quite well according to https://github.com/haskell-perf/csv
Based purely on the lines/second numbers on that page and the number you've given, I'd guesstimate that your parsing could potentially be as fast as (3.185ms / 1000 lines) * 130000 lines = 414.05ms = 0.4 s.
(Of course that still doesn't account for extracting the Int, Double, etc., but there are also specialized solutions for that which should be pretty hard to beat, see e.g. bytestring-lexing.)
It's also probably a bit less elegant than a generic parsec-like thing, but that's to be expected for a more special-case solution.
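The back-of-the-envelope estimate above is a plain linear scaling, which can be written down as a small sketch. The function name estimateMs is made up for illustration, and the assumption that cost scales linearly with line count is just that, an assumption; real numbers depend on line length and field types.

```haskell
-- Estimated parse time in milliseconds, assuming the cost measured
-- per 1000 lines scales linearly with the number of lines.
estimateMs :: Double -> Int -> Double
estimateMs msPerThousandLines lineCount =
  msPerThousandLines / 1000 * fromIntegral lineCount
```

With the figures quoted above, estimateMs 3.185 130000 comes out at roughly 414 ms.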
Regards,
participants (2)
- Ben Franksen
- Michal J Gajda