
On 20 October 2005 09:48, Donald Bruce Stewart wrote:
bulatz:
Hello John,
Thursday, October 20, 2005, 4:12:36 AM, you wrote:
FastString seems to be a misnomer for this library.
what it provides is a fast _byte array_ with a lot of useful operations, but it does not provide strings since it does not enforce character encodings in the type system, which would be vital for a real FastString library. In any case, just had to get that off my chest :)
maybe it can be named ByteArray? and then FastString.Latin1, FastString.UTF8... will use its utilities
also, all UTF8 utilities may be placed outside of the FastString.UTF8 module, because they can be used for something else
btw, FastString is also a bit misnamed. it's really a CompactString. ghc's FastString was named so, imho, just because it provided hashes for fast string comparison
Well, it's a PackedString really, isn't it? I changed it to FastString after too many people grumbled about colliding with the existing PackedString.
I think when this is ready it should replace Data.PackedString. I don't necessarily mean put it into fptools/libraries/base - we could just remove the existing Data.PackedString from there and your separate package can provide Data.PackedString. That is, unless we decide to use it in GHC in some way, in which case we'll have to pull (a copy of) it into fptools/libraries.
Cheers, Simon

Hello Simon,
Thursday, October 20, 2005, 1:45:28 PM, you wrote:
SM> I think when this is ready it should replace Data.PackedString.
SM> I don't necessarily mean put it into fptools/libraries/base - we could
SM> just remove the existing Data.PackedString from there and your separate
SM> package can provide Data.PackedString. That is, unless we decide to use
SM> it in GHC in some way, in which case we'll have to pull (a copy of) it
SM> into fptools/libraries.
can't this break hugs/nhc/hbc compatibility?
--
Best regards, Bulat mailto:bulatz@HotPOP.com

On Thu, Oct 20, 2005 at 10:45:28AM +0100, Simon Marlow wrote:
I think when this is ready it should replace Data.PackedString.
I don't necessarily mean put it into fptools/libraries/base - we could just remove the existing Data.PackedString from there and your separate package can provide Data.PackedString. That is, unless we decide to use it in GHC in some way, in which case we'll have to pull (a copy of) it into fptools/libraries.
We should make 'PackedString' the UTF8 wrapper though and provide Data.ByteArray as a separate library. If it has string in the name, one should be able to replace strings with it everywhere and expect the right thing to happen as enforced by the type system. That, and C's conflation of characters and bytes and Haskell 98's lack of clearing up the issue, has been a huge pet peeve of mine.
A nice representation is actually pure UTF8 with the number of characters as an Int# as well as the number of bytes as an Int#. It not only makes length fast but lets you very easily test if the string is just ASCII (the two numbers are the same) and use optimized routines, much better than a flag. The constructors should also verify everything is properly encoded UTF8 so the conversion-to-String functions can be very fast without worrying about error checking.
John
--
John Meacham - ⑆repetae.net⑆john⑈
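The representation John describes could look roughly like the sketch below. It is not code from the library under discussion: the names Utf8String, fromUtf8, charCount, byteCount and isAscii are hypothetical, and Data.ByteString stands in for the byte-array layer. Because ByteString already caches its byte length, only the character count is stored explicitly. The smart constructor validates and counts in a single pass, so later conversions can skip error checking.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))

-- Hypothetical type: validated UTF-8 bytes plus a cached character count.
data Utf8String = Utf8String
  { utf8Bytes     :: !B.ByteString  -- payload, known to be well-formed UTF-8
  , utf8CharCount :: !Int           -- number of code points in the payload
  }

-- O(1): the underlying ByteString stores its own length.
byteCount :: Utf8String -> Int
byteCount = B.length . utf8Bytes

-- O(1): cached at construction time.
charCount :: Utf8String -> Int
charCount = utf8CharCount

-- Pure ASCII exactly when every character occupies one byte.
isAscii :: Utf8String -> Bool
isAscii s = charCount s == byteCount s

-- Smart constructor: rejects malformed input (bad lead bytes, missing
-- continuation bytes, overlong forms, surrogates, values above U+10FFFF)
-- and counts characters while validating.
fromUtf8 :: B.ByteString -> Maybe Utf8String
fromUtf8 bs = Utf8String bs <$> go 0 0
  where
    n = B.length bs
    isCont j = j < n && B.index bs j .&. 0xC0 == 0x80

    go :: Int -> Int -> Maybe Int
    go i count
      | i >= n                                     = Just count
      | b0 < 0x80                                  = next 1
      | b0 >= 0xC2 && b0 <= 0xDF, conts 1          = next 2
      | b0 >= 0xE0 && b0 <= 0xEF, conts 2, okThree = next 3
      | b0 >= 0xF0 && b0 <= 0xF4, conts 3, okFour  = next 4
      | otherwise                                  = Nothing
      where
        b0 = B.index bs i
        b1 = B.index bs (i + 1)  -- only forced after conts has succeeded
        conts k = all isCont [i + 1 .. i + k]
        -- E0 must not encode an overlong form; ED must not encode a surrogate.
        okThree = not (b0 == 0xE0 && b1 < 0xA0) && not (b0 == 0xED && b1 > 0x9F)
        -- F0 must not be overlong; F4 must stay at or below U+10FFFF.
        okFour  = not (b0 == 0xF0 && b1 < 0x90) && not (b0 == 0xF4 && b1 > 0x8F)
        next len = go (i + len) $! count + 1
```

With this layout, `isAscii` is a comparison of two integers, which is the "much better than a flag" property from the message above.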

john:
On Thu, Oct 20, 2005 at 10:45:28AM +0100, Simon Marlow wrote:
I think when this is ready it should replace Data.PackedString.
I don't necessarily mean put it into fptools/libraries/base - we could just remove the existing Data.PackedString from there and your separate package can provide Data.PackedString. That is, unless we decide to use it in GHC in some way, in which case we'll have to pull (a copy of) it into fptools/libraries.
We should make 'PackedString' the UTF8 wrapper though and provide Data.ByteArray as a separate library. If it has string in the name, one should be able to replace strings with it everywhere and expect the right thing to happen as enforced by the type system. That, and C's conflation of characters and bytes and Haskell 98's lack of clearing up the issue, has been a huge pet peeve of mine.
Ok, so a rough structure of the final lib would be:
  Data            Data            System
   |               |               |
  ByteArray (?)   PackedString    Posix
                   |      |        |
                  UTF8  Latin1 ... MMap (providing mmapFile :: FilePath -> ByteArray)
-- Don

Hello Donald,
Friday, October 21, 2005, 5:52:32 AM, you wrote:
DBS> Ok, so a rough structure of the final lib would be:
DBS>   Data            Data            System
DBS>    |               |               |
DBS>   ByteArray (?)   PackedString    Posix
DBS>                    |      |        |
DBS>                   UTF8  Latin1 ... MMap (providing mmapFile :: FilePath -> ByteArray)
i propose to make separate library with UTF8 packing/unpacking routines. for example, i can someday create library for 0-ended utf8-encoded strings, which will require this UTF8 lib. also, someday we will need UTF16 packing/unpacking library to deal with windows wide-string apis
also, how about implementing general "Vector a" type instead of ByteArray? it will be helpful in making 2/4-byte PackedStrings, and moreover - it will be interesting for other applications
    data Vector a = Vector !Int !(ForeignPtr a)
    null = ...
    reverse = ...
    data PS_UTF8 = PS_UTF8 (Vector Word8) !Int  -- as John Meacham suggested
    type PS_Latin1 = Vector Word8               -- all functions are already implemented!
    type PS_UCS2 = Vector Word16                -- ditto
--
Best regards, Bulat mailto:bulatz@HotPOP.com
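Bulat's "Vector a" idea can be fleshed out into a small runnable sketch. Only the data declaration and the three type synonyms' shapes come from the message; the function names (vFromList, vToList, vLength, vNull, vReverse) and their implementations are assumptions made for illustration. The pure interface relies on the buffer never being mutated after construction, and vReverse goes through a list for brevity rather than speed.

```haskell
import Data.Word (Word8, Word16)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Storable (Storable)
import System.IO.Unsafe (unsafePerformIO)

-- Element count plus a pointer to the packed elements, as in the message.
data Vector a = Vector !Int !(ForeignPtr a)

-- Hypothetical helper: copy a list into a freshly allocated buffer.
vFromList :: Storable a => [a] -> Vector a
vFromList xs = unsafePerformIO $ do
  let n = length xs
  fp <- mallocForeignPtrArray n
  withForeignPtr fp $ \p -> pokeArray p xs
  return (Vector n fp)

-- Hypothetical helper: read all elements back out as a list.
vToList :: Storable a => Vector a -> [a]
vToList (Vector n fp) = unsafePerformIO $ withForeignPtr fp $ \p -> peekArray n p

vLength :: Vector a -> Int
vLength (Vector n _) = n

vNull :: Vector a -> Bool
vNull v = vLength v == 0

-- Written once for every element type; simple rather than fast.
vReverse :: Storable a => Vector a -> Vector a
vReverse = vFromList . reverse . vToList

-- Fixed-width encodings reuse the generic operations directly.
type PS_Latin1 = Vector Word8
type PS_UCS2   = Vector Word16
```

The point of the proposal shows up in the last two lines: a one-byte and a two-byte packed string share every operation defined on Vector, whereas a UTF-8 string would still need its own wrapper because its characters have variable width.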

John Meacham wrote:
We should make 'PackedString' the UTF8 wrapper though and provide Data.ByteArray as a separate library.
What exactly would a UTF8 wrapper do? Would it have a different interface than a list (cons/nil/null/head/tail) and how would that be implemented? If not, why isn't a list good enough?
Udo.
--
Did you know that if you took all the economists in the world and lined them up end to end, they'd still point in the wrong direction?

On 2005-10-21, Udo Stenzel
John Meacham wrote:
We should make 'PackedString' the UTF8 wrapper though and provide Data.ByteArray as a separate library.
What exactly would a UTF8 wrapper do? Would it have a different interface than a list (cons/nil/null/head/tail) and how would that be implemented? If not, why isn't a list good enough?
One could grab by-character rather than by-byte.
--
Aaron Denney -><-
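To make "by-character rather than by-byte" concrete, here is a minimal sketch of a list-like uncons over UTF-8 bytes. The names unconsChar and unpackChars are hypothetical, Data.ByteString stands in for the byte array, and the input is assumed to be already validated UTF-8 (malformed or truncated input would yield a wrong character rather than an error report).

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.List (unfoldr)

-- Split off the first character, consuming one to four bytes.
-- Assumes the input is well-formed UTF-8.
unconsChar :: B.ByteString -> Maybe (Char, B.ByteString)
unconsChar bs = case B.uncons bs of
  Nothing -> Nothing
  Just (b0, _)
    | b0 < 0x80 -> takeChar 1 (fromIntegral b0)
    | b0 < 0xE0 -> takeChar 2 (fromIntegral b0 .&. 0x1F)
    | b0 < 0xF0 -> takeChar 3 (fromIntegral b0 .&. 0x0F)
    | otherwise -> takeChar 4 (fromIntegral b0 .&. 0x07)
  where
    -- len is the sequence length; lead holds the payload bits of the first byte.
    takeChar :: Int -> Int -> Maybe (Char, B.ByteString)
    takeChar len lead =
      let (hd, rest) = B.splitAt len bs
          -- Each continuation byte contributes its low six bits.
          addCont acc b = (acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F)
          cp = B.foldl' addCont lead (B.drop 1 hd)
      in Just (chr cp, rest)

-- The list view falls out of repeated uncons.
unpackChars :: B.ByteString -> String
unpackChars = unfoldr unconsChar
```

This keeps the cons/null/head/tail style of interface Udo asks about, while the storage stays one to four bytes per character instead of a heap-allocated list cell per character.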
participants (6)
- Aaron Denney
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- John Meacham
- Simon Marlow
- Udo Stenzel