Re: [Haskell-cafe] unicode text libraries

tittoassini:
2009/9/28 Don Stewart:
titto:
Hi,
I am looking for a Unicode string library. I found these on Hackage:
http://hackage.haskell.org/package/compact-string
http://hackage.haskell.org/package/text
They both look solid and functionally complete so ... I don't know which one to use :-)
As I am sure I am not the first one facing this choice, may I ask which one you prefer and why?
Data.Text
Thanks, but... why?
Sorry, was on the way out the door. Data.Text has growing use, is well designed, and builds on the pedigree of bytestring and the vector* series of fusion libraries. I trust that code.
-- Don

On Mon, Sep 28, 2009 at 6:15 PM, Don Stewart wrote:
Sorry, was on the way out the door. Data.Text has growing use, is well designed, and builds on the pedigree of bytestring and the vector* series of fusion libraries. I trust that code.
I agree with Don. Also, I don't think that a Unicode type should mention what encoding it uses as it's an implementation detail.
-- Johan

On Mon, 2009-09-28 at 18:32 +0200, Johan Tibell wrote:
Also, I don't think that a Unicode type should mention what encoding it uses as it's an implementation detail.
I would put it more strongly. The encoding should not be in the API because it makes it harder to compose functionality. While there may be some circumstances where, for performance reasons, you want precise control over the internal encoding, that control is not appropriate in component interfaces. I know we've made this worse recently, but a proliferation of different string types makes it harder to reuse code. An exposed encoding parameter makes that even worse.
Duncan
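To make the composition argument concrete, here is a minimal sketch that is not from the thread. It assumes the `text` and `bytestring` packages; the names `normalizeName`, `greet`, `welcome`, `parseUtf8`, `renderUtf8` and `renderUtf16` are hypothetical and chosen only for illustration. The two components talk to each other purely in terms of `Text`, and an encoding is named only where bytes enter or leave the program.

```haskell
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf16LE, encodeUtf8)

-- Hypothetical component A: trims whitespace and lower-cases a name.
-- Its interface mentions Text only, never an encoding.
normalizeName :: Text -> Text
normalizeName = T.toLower . T.strip

-- Hypothetical component B: builds a greeting from a name.
greet :: Text -> Text
greet name = T.append (T.pack "hello, ") name

-- The two components compose directly because they share one string type.
welcome :: Text -> Text
welcome = greet . normalizeName

-- Boundary, input side: bytes become Text here; invalid UTF-8 yields Nothing.
parseUtf8 :: B.ByteString -> Maybe Text
parseUtf8 = either (const Nothing) Just . decodeUtf8'

-- Boundary, output side: the encoding is picked when bytes are produced,
-- so the same Text value can be rendered in more than one encoding.
renderUtf8, renderUtf16 :: Text -> B.ByteString
renderUtf8 = encodeUtf8
renderUtf16 = encodeUtf16LE
```

If the encoding were a type parameter of the string type instead, `normalizeName` and `greet` would either have to agree on it or be generalized over it, which is the reuse cost described above.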

Johan Tibell writes:
I agree with Don. Also, I don't think that a Unicode type should mention what encoding it uses as it's an implementation detail.
Right. I see from the documentation that it uses Word16s (and presumably the utf-16 encoding). Out of curiosity, why was this particular encoding chosen, as opposed to utf-8 or utf-32/ucs-4? Any benchmarks or other information?
-k
--
If I haven't seen further, it is by standing in the footprints of giants

On Tue, 2009-09-29 at 10:48 +0200, Ketil Malde wrote:
Right. I see from the documentation that it uses Word16s (and presumably the utf-16 encoding). Out of curiosity, why was this particular encoding chosen, as opposed to utf-8 or utf-32/ucs-4? Any benchmarks or other information?
Yes, the choice was based on benchmarks. All three (UTF-8, UTF-16, UTF-32) were implemented and benchmarked. You can read about the details in Tom Harper's MSc thesis.
Duncan
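As a small illustration of one trade-off between the three encodings, the sketch below compares how many bytes the same text occupies in each. It is not taken from the thesis or from the benchmarks mentioned above and says nothing about speed. It assumes the `text` and `bytestring` packages; `encodedSizes` and `samples` are hypothetical names.

```haskell
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf16LE, encodeUtf32LE, encodeUtf8)

-- Number of bytes the same text occupies as (UTF-8, UTF-16, UTF-32).
encodedSizes :: Text -> (Int, Int, Int)
encodedSizes t =
  ( B.length (encodeUtf8 t)
  , B.length (encodeUtf16LE t)
  , B.length (encodeUtf32LE t)
  )

-- Sample inputs, written with numeric escapes so the source stays ASCII:
-- ASCII, Cyrillic, CJK, and one code point outside the Basic Multilingual Plane.
samples :: [Text]
samples =
  map T.pack
    [ "hello"
    , "\1087\1088\1080\1074\1077\1090"
    , "\26085\26412\35486"
    , "\119070"
    ]
```

Mapping `encodedSizes` over `samples` gives `(5,10,20)`, `(12,12,24)`, `(9,6,12)` and `(4,4,4)`: UTF-8 is smallest for ASCII, UTF-16 is smallest for the CJK sample, and UTF-32 is never smaller than the other two.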

On Mon, Sep 28, 2009 at 9:15 AM, Don Stewart wrote:
Sorry, was on the way out the door. Data.Text has growing use, is well designed, and builds on the pedigree of bytestring and the vector* series of fusion libraries. I trust that code.
-- Don
I just have a question out of curiosity - why was the decision made to have Data.Text, uvector, and ByteString all separate data structures, rather than defining the string types in terms of uvector?
Alex

On Mon, Sep 28, 2009 at 3:00 PM, Alexander Dunlap <alexander.dunlap@gmail.com> wrote:
I just have a question out of curiosity - why was the decision made to have Data.Text, uvector, and ByteString all separate data structures, rather than defining the string types in terms of uvector?
bytestring predates the other two libraries by several years. The underlying stream types for uvector and text are almost the same, so they could in principle be merged. There's a fair amount of duplication there, but uvector is in some ways more complicated and in others much less thorough than text. Merging them would be a lot of work!
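For readers unfamiliar with the "stream type" mentioned here, the following is a simplified sketch of the usual stream-fusion representation, written against plain lists rather than against either library. It is an assumption-laden illustration and not the actual definition in uvector or text, which carry extra details such as size hints and strictness annotations. All names (`Step`, `Stream`, `streamList`, `unstreamList`, `mapS`, `filterS`) are local to this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: finished, advance without output, or emit an element.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function plus a hidden state type and a start state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream; the state is the remaining list.
streamList :: [a] -> Stream a
streamList xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- Run a stream back into a list.
unstreamList :: Stream a -> [a]
unstreamList (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- map only rewrites the step function; no intermediate structure is built.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- filter uses Skip to drop elements while staying non-recursive.
filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done -> Done
      Skip s' -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'
```

A container library built this way defines its operations as `unstream . transform . stream` and relies on rewrite rules to cancel adjacent `stream`/`unstream` pairs. Because the representation is independent of the container, two libraries can end up with nearly identical stream code, which is the duplication described above.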

Hi Bryan and others,
On Mon, Sep 28, 2009 at 5:29 PM, Bryan O'Sullivan wrote:
bytestring predates the other two libraries by several years. The underlying stream types for uvector and text are almost the same, so they could in principle be merged. There's a fair amount of duplication there, but uvector is in some ways more complicated and in others much less thorough than text. Merging them would be a lot of work!
If I may free-ride on this thread: how should one go about deriving a Data.Binary instance for text? It looks like doing it efficiently would require using some parts of the internal module that are not exposed, am I correct? I've been using "encodeUtf8", but that doesn't feel right. I don't know what to do, hopefully I'm missing something simple.
I agree with Duncan, the text API is beautifully designed.
Thanks!
Paulo
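The "encodeUtf8" approach mentioned in the question can be sketched as follows. This is one possible way to do it, not a recommendation from the thread, and it goes through an intermediate ByteString rather than touching any internal module. It assumes the `binary`, `bytestring` and `text` packages; `putText`, `getText`, `roundTrip` and `sample` are hypothetical names. Plain functions are used instead of a `Binary Text` instance because, as far as I know, later releases of text ship their own instance, and a second one would clash with it.

```haskell
import Data.Binary (Get, Put, get, put)
import Data.Binary.Get (runGet)
import Data.Binary.Put (runPut)
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- Serialize as a length-prefixed UTF-8 ByteString, reusing the existing
-- Binary instance for strict ByteString.
putText :: Text -> Put
putText = put . encodeUtf8

-- Deserialize, failing the parser instead of throwing on invalid UTF-8.
getText :: Get Text
getText = do
  bytes <- get :: Get B.ByteString
  case decodeUtf8' bytes of
    Left err -> fail ("invalid UTF-8 in serialized text: " ++ show err)
    Right t  -> pure t

-- Encode and immediately decode again; this should be the identity on Text.
roundTrip :: Text -> Text
roundTrip = runGet getText . runPut putText

-- Sample value containing a non-ASCII character (written as an escape).
sample :: Text
sample = T.pack "na\239ve text"
```

The wire format here is an explicit choice made at the serialization boundary, so it stays the same regardless of how the library represents text internally.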
participants (7)
- Alexander Dunlap
- Bryan O'Sullivan
- Don Stewart
- Duncan Coutts
- Johan Tibell
- Ketil Malde
- Paulo Tanimoto