
Andrew Coppin wrote:
Ketil Malde wrote:
Ketil Malde
writes: data EvidenceCode = IAC | IUG | IFR | NAC | NR | ... deriving Show
Could it be that this derived read instance is somehow very inefficient?
To answer my own question: this is exactly it, ghc derives less than optimal code in this case. Rather than reiterate the case here, I did a quick writeup at http://blog.malde.org/, and would be quite happy about any feedback or suggestions.
I think you'll find the code that GHC derives for a Read instance handles extra whitespace and all kinds of other whatnot that you don't actually need in this specific case. I suspect this is what's up - but I don't have any hard proof for that. ;-)
I wrote three programs: One does data Tag = Orange | Lemon | Lime | Apple | Banana | Pear | Peach deriving Read The other two use get_tag :: String -> Tag get_tag "Orange" = Orange get_tag "Lemon" = Lemon get_tag "Lime" = Lime get_tag "Apple" = Apple get_tag "Banana" = Banana get_tag "Pear" = Pear get_tag "Peach" = Peach get_tag _ = error "not a tag" and get_tag :: String -> Tag get_tag xs = case xs of [] -> bad (x:xs1) -> case x of 'A' -> case xs1 of "pple" -> Apple _ -> bad 'B' -> case xs1 of "anana" -> Banana _ -> bad 'L' -> case xs1 of "emon" -> Lemon "ime" -> Lime _ -> bad 'O' -> case xs1 of "range" -> Orange _ -> bad 'P' -> case xs1 of ('e':'r':xs2) -> case xs2 of "r" -> Pear "ch" -> Peach _ -> bad _ -> bad _ -> bad bad = error "not a tag" I wrote a driver program that reads a file of 1,000,000 tag values. Using the first version (GHC-derived Read instance) it took about 32 seconds to process. Using the second version (just a bunch of strings to match, no cleaverness at all) took approximately 1 second. The 3rd version was so fast I didn't have time to see the window open before it closed again. Note that all of this was using plain ordinary Strings, not ByteString or anything fancy like that. Note also that the actual documentation for the Prelude even says that Read is very slow. [Although it says it's slow for reading large structures, not large numbers of trivial structures.] None of this is really all that surprising; in the general case, a Read instance might have to skip over spaces or parse deeply nested brackets or any number of other things. If you know you don't need to handle all those cases, write your own parser. It shouldn't be hard to come up with something faster. [Altough obviously it's kinda tedious.]