
On 9/10/10 5:18 PM, Bryan O'Sullivan wrote:
On Fri, Sep 10, 2010 at 2:03 PM, wren ng thornton <wren@community.haskell.org> wrote:
Yes, that was my point. I can see uses for (Text->...), ((Text->Bool)->...), and ((Char->Bool)->...) but the middle one ---which seems to be the closest analogue to String and ByteString--- is missing. The first one is posited as a replacement for the middle one, but it is insufficient since it cannot perform disjunctive searches.
I don't think anyone posited it as a replacement for the middle one?
You did, kinda :) More specifically, you proposed (Text->...) as the analogue for ((Char->Bool)->...) and ((Char8->Bool)->...).
We could replace Char->Bool with Text->Bool, but it would be slower (and yes, that matters to me). I don't intend to add it myself, but you're welcome to put together a patch and a set of QuickCheck tests.
Why do we not just have the middle ((Text->Bool)->...) option?
Because you can't do a Boyer-Moore search off it.
I'm fine with the performance argument, I'm just pointing out why I see the API as inconsistent with the String/ByteString APIs. Since the break function for String/ByteString is rather entrenched as being a method for breaking via a single character, a function that uses Boyer--Moore to break on a string (not just strings required by the mismatch between a "character" and a Char) doesn't seem like the analogous function. I think it's much closer to breakSubstring than it is to break. Whether break/breakSubstring or breakBy/break is the better set of names, that's a different bike shed.

As for ((Text->Bool)->...) vs ((Char->Bool)->...), I pointed it out because you've mentioned the discrepancy between Char and "characters". In practice, I'd expect that the majority of characters that people wish to break on are indeed Chars, so performance wins out in API design. However, there's no mention of the discrepancy in the documentation, which I think is an oversight.

-- 
Live well,
~wren
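
For concreteness, here is a rough sketch of the break/breakSubstring split on the String and ByteString side, which is the convention the argument above leans on. It uses only Data.List and Data.ByteString.Char8; the helper names (splitOnSeparator, splitOnSeparatorB, splitOnArrow, combiningLength) are made up for illustration and are not part of any library. It says nothing about how the Text functions are or should be implemented.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import qualified Data.List as L

-- Predicate-based break on String: splits before the first Char that
-- satisfies the predicate. A predicate can express a disjunction
-- ("comma or semicolon"), which a single search string cannot.
-- (Hypothetical helper name.)
splitOnSeparator :: String -> (String, String)
splitOnSeparator = L.break (\c -> c == ',' || c == ';')

-- The same predicate-based break, on ByteString.Char8.
-- (Hypothetical helper name.)
splitOnSeparatorB :: B.ByteString -> (B.ByteString, B.ByteString)
splitOnSeparatorB = B.break (\c -> c == ',' || c == ';')

-- Needle-based break on ByteString: splits before the first occurrence
-- of a whole substring. This is the function the ByteString API calls
-- breakSubstring rather than break. (Hypothetical helper name.)
splitOnArrow :: B.ByteString -> (B.ByteString, B.ByteString)
splitOnArrow = B.breakSubstring (B.pack "->")

-- One user-visible "character" can be several Chars: 'e' followed by
-- U+0301 (combining acute accent) is two Chars, so a Char -> Bool
-- predicate never sees it as a single unit. (Hypothetical helper name.)
combiningLength :: Int
combiningLength = length "e\x0301"

main :: IO ()
main = do
  print (splitOnSeparator "key;value,rest")
  print (splitOnSeparatorB (B.pack "key;value,rest"))
  print (splitOnArrow (B.pack "Text->Bool"))
  print combiningLength
```

The first two calls split at the ';' because the predicate accepts either separator; the third splits at the whole "->" needle, which no Char predicate could express. The last value is the Char vs "character" mismatch in miniature.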