ANNOUNCE: vector-bytestring-0.0.0.0

All your ByteString are belong to us...
Hello,
I'm pleased to announce the beta release of vector-bytestring. This library provides the type ByteString which is defined as a type synonym for a storable Vector of Word8s (from the vector package):
type ByteString = Data.Vector.Storable.Vector Word8
It exports the same API as the bytestring package except that the module names are prefixed with Data.Vector.Storable.ByteString instead of Data.ByteString.
The very ambitious goal of this package is that it will eventually replace our beloved bytestring package. By basing this package on vector, we can benefit from all the optimizations (like stream-fusion!) in that library. We will also have just a single library to test, debug and optimize.
TEST-SUITE
I ported the bytestring test-suite to vector-bytestring. You can run it using:
$ cabal configure --enable-tests; cabal build; cabal test
All 54800 tests pass! Only one property doesn't hold:
prop_show :: ByteString -> Bool
prop_show x = show x == show (unpack x)
This is because I don't provide a custom Show instance for ByteStrings but use the one from Vector, which shows a vector like "fromList [1,2,3]" instead of "\SOH\STX\ETX" as bytestring does. Hopefully this is not a problem in practice.
BENCHMARKS
I added a criterion-based benchmark suite to vector-bytestring. It consists of over 600 benchmarks that cover almost every function in the library. Also included are benchmarks of the fusion capabilities of the library. Run it using:
$ cabal configure -fbenchmark; cabal build
$ dist/build/bench/bench --help
Unfortunately, bytestring still outperforms us in lots of benchmarks. I believe the primary cause of this is that most functions are implemented using stream-fusion. This is highly efficient if you use a composition of these functions because they will all fuse into one single efficient loop. However, if your program uses only a single function, the stream-based implementation is often less efficient than an implementation that works directly on a mutable vector (like most functions in bytestring).
So what we want is to use stream-fusion where possible but use mutable vectors when our program doesn't fuse. Fortunately, Roman Leshchinskiy (author of vector) has an idea how to do this:
http://trac.haskell.org/vector/ticket/60
Because we don't beat bytestring in all cases yet you should consider this a beta release and not use it in production code.
INSTALLING
$ cabal install vector-bytestring
API DOCS
http://hackage.haskell.org/package/vector-bytestring-0.0.0.0
DEVELOPING
$ git clone https://github.com/basvandijk/vector-bytestring
Regards,
Bas
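The fusion argument in the announcement can be made concrete with a small sketch. It uses plain lists and base only, not vector's actual stream machinery, and the names `pipeline` and `fusedLoop` are made up for illustration. `pipeline` is written as a composition of three traversals; `fusedLoop` is the single loop that a fusion framework aims to turn such a composition into.

```haskell
import Data.List (foldl')
import Data.Word (Word8)

-- Composition of three separate traversals: filter, map, then sum.
pipeline :: [Word8] -> Int
pipeline = sum . map ((* 2) . fromIntegral) . filter even

-- The same computation as one hand-written loop with a strict accumulator
-- and no intermediate lists; this is the shape fusion aims to produce.
fusedLoop :: [Word8] -> Int
fusedLoop = foldl' step 0
  where
    step acc w
      | even w    = acc + 2 * fromIntegral w
      | otherwise = acc
```

Both functions compute the same result; the difference the announcement describes is only in how much intermediate structure is built along the way.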

On 12.10.2011 16:02, Bas van Dijk wrote:
All 54800 tests pass! Only one property doesn't hold:
prop_show :: ByteString -> Bool
prop_show x = show x == show (unpack x)
This is because I don't provide a custom Show instance for ByteStrings but use the one from Vector, which shows a vector like "fromList [1,2,3]" instead of "\SOH\STX\ETX" as bytestring does. Hopefully this is not a problem in practice.
All derived Show instances for data types using your ByteString will be different, too. Would it not be simple to use a newtype for ByteString (rather than a synonym)?
Cheers
Christian
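Christian's point about derived instances can be sketched with base alone. This is only an illustration under stated assumptions: a plain list stands in for the storable vector (so the synonym case prints "[1,2,3]" where vector would print "fromList [1,2,3]"), and the names `SynBytes`, `NewBytes`, `PacketSyn` and `PacketNew` are hypothetical.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Stand-in for the synonym approach: the Show instance of the underlying
-- container is what callers see.
type SynBytes = [Word8]

-- Stand-in for the newtype approach: the wrapper carries its own instance.
newtype NewBytes = NewBytes [Word8]

instance Show NewBytes where
  -- Render each byte as the Char with the same code, as bytestring does.
  showsPrec d (NewBytes ws) = showsPrec d (map (chr . fromIntegral) ws)

-- Derived instances pick up whichever instance the field type has.
data PacketSyn = PacketSyn SynBytes deriving Show
data PacketNew = PacketNew NewBytes deriving Show
```

With these definitions, `show (PacketSyn [1,2,3])` gives `PacketSyn [1,2,3]`, while `show (PacketNew (NewBytes [1,2,3]))` gives `PacketNew "\SOH\STX\ETX"`.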

On 14 October 2011 12:58, Christian Maeder wrote:
Would it not be simple to use a newtype for ByteString (rather than a synonym)?
My "vision" for the future of bytestring and vector-bytestring is that they will be replaced by vector directly. This way users don't have to think about choosing between bytestring and vector and can go to vector directly to work with Word8 vectors or interface with foreign libraries.
This would mean moving some (bytestring only) functions from vector-bytestring to vector like mapAccumL/R, create, createAndTrim, etc. and generalizing them from Word8s to any Storable.
So in my vision there's no ByteString type anymore, just Vectors. The vector-bytestring package is meant to make the transition smoother.
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)
Bas

On 14 October 2011 22:37, Bas van Dijk wrote:
On 14 October 2011 12:58, Christian Maeder wrote:
Would it not be simple to use a newtype for ByteString (rather than a synonym)?
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)
I suppose you could add a newtype wrapper, but it _would_ require duplicating the API to do so.
Though I would argue that unless you're actually trying to use Show/Read for serialisation, does it really matter what the Show/Read instances for ByteString are?
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Fri, Oct 14, 2011 at 13:45, Ivan Lazar Miljenovic wrote:
Though I would argue that unless you're actually trying to use Show/Read for serialisation, does it really matter what the Show/Read instances for ByteString are?
Convenient debugging and REPL interaction certainly matter!
--Max

Max Rabkin wrote:
Though I would argue that unless you're actually trying to use Show/Read for serialisation, does it really matter what the Show/Read instances for ByteString are?
Convenient debugging and REPL interaction certainly matter!
On the other hand, having a separate Show instance for Vector Word8 would require either writing all Show instances explicitly or keeping two separate packages. I would prefer to have the two packages merged into one. But since I, too, find a readable Show instance for ByteString useful, I would go with the first variant of providing a few default instances instead of a generic Show a => Show (Vector a) instance.
That way you can write nicer instances for some other element types, too. For example, I can imagine what a much nicer Vector Bool instance would look like:
fromBoolString "1.1..111..1"
Greets,
Ertugrul
--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/
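The "explicit instance per element type" variant Ertugrul describes might look roughly like the sketch below. It is an illustration only: `Vec` is a hypothetical list-backed stand-in for a vector type, and reading '1' as True and '.' as False in the `fromBoolString` rendering is an assumption about the intended notation.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a vector type, backed by a list.
newtype Vec a = Vec [a]

-- One explicit instance per element type, instead of a single generic
-- `Show a => Show (Vec a)`; these instance heads cannot overlap.
instance Show (Vec Int) where
  showsPrec d (Vec xs) = showParen (d > 10) $ showString "fromList " . shows xs

instance Show (Vec Word8) where
  showsPrec d (Vec ws) = showsPrec d (map (chr . fromIntegral) ws)

instance Show (Vec Bool) where
  showsPrec d (Vec bs) =
    showParen (d > 10) $ showString "fromBoolString " . shows (map bit bs)
    where
      bit b = if b then '1' else '.'
```

The trade-off is the one named in the message: an element type without a hand-written instance has no Show instance for its vectors at all.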

On 14 October 2011 13:37, Bas van Dijk wrote:
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)
Ok, I have proposed and implemented this for vector:
http://trac.haskell.org/vector/ticket/64
Bas

On 14/10/2011, at 12:37, Bas van Dijk wrote:
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)
Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.
Roman

On 15/10/2011, at 12:26, Roman Leshchinskiy wrote:
On 14/10/2011, at 12:37, Bas van Dijk wrote:
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)
Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.
I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.
Roman
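For context, one known way to special-case an element type without overlapping instances is the trick base itself uses for String: a class method on the element type decides how the whole container is shown (Show's showList). The sketch below applies that idea to a hypothetical list-backed `Vec`; the class `VecElem` and the method `showsVec` are invented names, and nothing here is claimed to be what ticket 64 implements.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a vector type, backed by a list.
newtype Vec a = Vec [a]

-- Element class with a method that decides how a whole vector is shown,
-- mirroring how Show's showList lets String differ from other lists.
class Show a => VecElem a where
  showsVec :: Int -> [a] -> ShowS
  showsVec d xs = showParen (d > 10) $ showString "fromList " . shows xs

-- Keeps the default "fromList [...]" rendering.
instance VecElem Int

-- Bytes are rendered as a string literal.
instance VecElem Word8 where
  showsVec d ws = showsPrec d (map (chr . fromIntegral) ws)

-- A single instance head, so no overlap is involved.
instance VecElem a => Show (Vec a) where
  showsPrec d (Vec xs) = showsVec d xs
```

The cost of this design is that the instance context is `VecElem a` rather than `Show a`, so every element type needs a (possibly empty) `VecElem` instance.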

On 15 October 2011 13:34, Roman Leshchinskiy
I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.
Roman
I agree that you shouldn't use ByteStrings or Vectors of Word8s for
Unicode strings. However I can imagine that for quick sessions in ghci
it can be quite handy if they are shown as strings. For example,
currently we have:
import Network.HTTP.Enumerator
simpleHttp "http://code.haskell.org/~basvandijk/"
Chunk "<html>\n<head><title>Bas van Dijk</title></head>\n<body>\n<h1>Bas van Dijk</h1>\n\n<p>Email: <a href=\"mailto://v.dijk.bas@gmail.com\">v.dijk.bas@gmail.com</a></p>\n\n<p>Nick on IRC: <tt>basvandijk</tt></p>\n\n<a href=\"http://www.haskellers.com/user/basvandijk/\">\n  <img src=\"http://www.haskellers.com/static/badge.png\" \n       alt=\"I'm a Haskeller\"\n       border=\"0\">\n</a>\n\n<p>See my <a href=\"https://github.com/basvandijk\">GitHub</a> page for a list of projects I work on.</p>\n\n</body>\n</html>\n" Empty
If ByteStrings were not shown as strings this would look like:
Chunk ( fromList
[60,104,116,109,108,62,10,60,104,101,97,100,62,60,116,105,116,108,101,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,116,105,116,108,101,62,60,47,104,101,97,100,62,10,60,98,111,100,121,62,10,60,104,49,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,104,49,62,10,10,60,112,62,69,109,97,105,108,58,32,60,97,32,104,114,101,102,61,34,109,97,105,108,116,111,58,47,47,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,34,62,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,60,47,97,62,60,47,112,62,10,10,60,112,62,78,105,99,107,32,111,110,32,73,82,67,58,32,60,116,116,62,98,97,115,118,97,110,100,105,106,107,60,47,116,116,62,60,47,112,62,10,10,60,97,32,104,114,101,102,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,117,115,101,114,47,98,97,115,118,97,110,100,105,106,107,47,34,62,10,32,32,60,105,109,103,32,115,114,99,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,115,116,97,116,105,99,47,98,97,100,103,101,46,112,110,103,34,32,10,32,32,32,32,32,32,32,97,108,116,61,34,73,39,109,32,97,32,72,97,115,107,101,108,108,101,114,34,10,32,32,32,32,32,32,32,98,111,114,100,101,114,61,34,48,34,62,10,60,47,97,62,10,10,60,112,62,83,101,101,32,109,121,32,60,97,32,104,114,101,102,61,34,104,116,116,112,115,58,47,47,103,105,116,104,117,98,46,99,111,109,47,98,97,115,118,97,110,100,105,106,107,34,62,71,105,116,72,117,98,60,47,97,62,32,112,97,103,101,32,102,111,114,32,97,32,108,105,115,116,32,111,102,32,112,114,111,106,101,99,116,115,32,73,32,119,111,114,107,32,111,110,46,60,47,112,62,10,10,60,47,98,111,100,121,62,10,60,47,104,116,109,108,62,10])
Empty
Personally, I don't work in ghci that often so I don't care that much
if we have or don't have specialized Show instances for Vectors of
Word8s.
So what do other people think about this?
Bas

Hi,
On Saturday, 15.10.2011, at 16:15 +0200, Bas van Dijk wrote:
So what do other people think about this?
Having a human-readable Show instance for ByteStrings is definitely a great plus when debugging code.
Greetings,
Joachim
--
Joachim "nomeata" Breitner
mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C
xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

Joachim Breitner wrote:
So what do other people think about this?
having a human-readable Show instance for ByteStrings is definitely a great plus when debugging code.
I agree and would even go as far as saying that it's generally useful, even if the data is not guaranteed to be text.
Greets,
Ertugrul
--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/

On 16 October 2011 01:15, Bas van Dijk
So what do other people think about this?
Actually, for my current use case of ByteStrings (binary encoding of graphs using existing encoding schemes), I would prefer this [Word8]-based Show instance as it would help with my debugging, since the output looks along the lines of: Chunk (fromList [3,2,3,0,3,1,3,0,2,2,1,0]).
I am the first to admit that my use case is probably different from others though.
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On 15 October 2011 23:18, Ivan Lazar Miljenovic
Actually, for my current use case of Bytestrings (binary encoding of graphs using existing encoding schemes), I would prefer this [Word8]-based Show instance as it would help with my debugging, since the output looks along the lines of: Chunk (fromList [3,2,3,0,3,1,3,0,2,2,1,0]). I am the first to admit that my use case is probably different from others though.
And I often work with mixed text/binary data (e.g. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output).
It roundtrips well, so why not? :-)
Conrad.
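A hexdump-style rendering of the kind Conrad describes could be sketched as follows. This is a rough illustration using base only: the layout is loosely modelled on xxd and is not byte-for-byte compatible with it, the function names are hypothetical, the last row is not padded, and the Read direction is not attempted.

```haskell
import Data.Char (chr, isAscii, isPrint)
import Data.Word (Word8)
import Numeric (showHex)

-- Two-digit lowercase hex for one byte.
hexByte :: Word8 -> String
hexByte w = (if w < 16 then "0" else "") ++ showHex w ""

-- Printable ASCII is shown as itself, everything else as '.'.
charCell :: Word8 -> Char
charCell w
  | isAscii c && isPrint c = c
  | otherwise              = '.'
  where
    c = chr (fromIntegral w)

-- One dump line: hex column, two spaces, character column.
dumpLine :: [Word8] -> String
dumpLine ws = unwords (map hexByte ws) ++ "  " ++ map charCell ws

-- Split the input into rows of 16 bytes and render each row.
hexDump :: [Word8] -> String
hexDump = unlines . map dumpLine . chunksOf16
  where
    chunksOf16 [] = []
    chunksOf16 xs = let (row, rest) = splitAt 16 xs in row : chunksOf16 rest
```

For example, `dumpLine [72,105,0]` yields `48 69 00  Hi.`.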

On Mon, Oct 17, 2011 at 10:30 PM, Conrad Parker
And I often work with mixed text/binary data (eg. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)
Interesting idea. I quite like it!
--
Felipe.

On 10/18/2011 01:30 AM, Conrad Parker wrote:
And I often work with mixed text/binary data (eg. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)
(slightly off topic ...)
I often do mixed text/binary too, and I now use the following package:
http://hackage.haskell.org/package/bytedump
The problem with a Show instance is that there's no way to configure some aspects of it :-)
--
Vincent

Conrad Parker wrote:
And I often work with mixed text/binary data (eg. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)
So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci.
Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in your .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?
Roman

On 19 October 2011 22:09, Roman Leshchinskiy
So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in you .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?
Would this hypothetical ghci feature also work for cases where you have a ByteString as part of another type that derives Show and Read?
I also wonder whether it would suffice to have a ByteString -> String function available rather than requiring Show per se for the case of a ByteString on its lonesome.
--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On Oct 19, 2011, at 7:32 AM, Ivan Lazar Miljenovic wrote:
Would this hypothetical ghci feature also work for cases where you have a ByteString as part of another type that derives Show and Read?
I also wonder whether it would suffice to have a ByteString -> String function available rather than requiring Show per-se for the case of a ByteString on its lonesome.
Note that other programming languages have had to solve this exact problem and they usually end up with multiple functions: one for debugging, one for serialization, one for displaying how the object was constructed. For example, in Python, look at:
http://docs.python.org/reference/datamodel.html#object.__repr__
http://docs.python.org/reference/datamodel.html#object.__str__
Cheers,
M

On 19 October 2011 15:59, AM
Note that other programming languages have had to solve this exact problem and they usually end up with multiple functions- one for debugging, one for serialization, one for displaying how the object was constructed.
As per Haskell with Show (representation) and Data.Binary (serialization), of course.

On Wed, Oct 19, 2011 at 9:49 AM, Stephen Tetley
On 19 October 2011 15:59, AM
wrote: Note that other programming languages have had to solve this exact problem and they usually end up with multiple functions- one for debugging, one for serialization, one for displaying how the object was constructed.
As per Haskell with Show (representation) and Data.Binary (serialization), of course.
That's different. Python's str() is meant to be human readable and not necessarily parseable; I have a Haskell equivalent in a Pretty class. repr() is like Haskell's show: it's meant to be human readable but also parseable to get the original value back.
The problem is that you can write deriving for Show but not for Pretty, so as soon as you want to pretty a record you're back to writing stuff by hand, even if it's just to fix up one field (say it's a function or a huge table that you want to abbreviate to 'HugeTable: 73246 entries'). Perhaps the new generic deriving stuff could fix that. I'd like to see 'deriving (Show)' implemented in Haskell so I could write my own variations.
Data.Binary would be what Python calls pickle.
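The split between a derivable Show and a hand-written Pretty can be sketched like this. The class, the `Config` record and its field names are hypothetical and only illustrate the pattern described in the message; the poster's actual Pretty class is not shown in the thread.

```haskell
-- Hypothetical Pretty class: human-readable, not required to be parseable.
class Pretty a where
  pretty :: a -> String

data Config = Config
  { name  :: String
  , table :: [(Int, Int)]   -- potentially huge
  } deriving Show           -- Show stays derived and complete

-- Written by hand, because there is no `deriving Pretty`: every field has
-- to be spelled out just to abbreviate the one large field.
instance Pretty Config where
  pretty c =
    "Config { name = " ++ name c
      ++ ", table = HugeTable: " ++ show (length (table c)) ++ " entries }"
```

Here `show` still prints the whole table, while `pretty` abbreviates it to an entry count.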

Ivan Lazar Miljenovic wrote:
Would this hypothetical ghci feature also work for cases where you have a ByteString as part of another type that derives Show and Read?
Yes. The idea would be to evaluate the expression, then build the Show instance for the type of the result taking the ghci overrides into account and then use that to display the result. I have to admit that I have no idea how difficult it would be to do this but surely it can't be that hard.
Roman

On Wed, Oct 19, 2011 at 1:09 PM, Roman Leshchinskiy
So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in you .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?
I actually think it's more than just GHCi. A lot of the time when debugging some code, I'll litter it with "print"s to see what's going on. For me, I'd rather change the actual Show instance.
It might make sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.
Michael

Michael Snoyman writes:
It might make sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.
Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255?
(ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.)
-k
--
If I haven't seen further, it is by standing in the footprints of giants

On Wed, Oct 19, 2011 at 9:29 PM, Ketil Malde
Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255?
(ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.)
Perhaps the correct semantic approach would be to have:
newtype Char8 = Char8 Word8
But I think that will break far too many applications to try to get it implemented. In an ideal world, I agree with both points: displaying a numeric value as a Char doesn't make sense, and there are definitely issues with the Read instance.
However, I still think current behavior is the least of all available evils. Show/Read work properly as a pair and can encode/decode any ByteString, and there's never any presumption that all input to read is valid.
Michael
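To make the Char8 idea and the Show/Read pairing concrete, here is a minimal sketch using base only. It takes the `newtype Char8 = Char8 Word8` from the message and adds hypothetical instances; a list of `Char8` stands in for a byte string, and none of this is bytestring's or vector's actual code. The truncation Ketil asks about shows up in `fromChar`.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

newtype Char8 = Char8 Word8 deriving Eq

toChar :: Char8 -> Char
toChar (Char8 w) = chr (fromIntegral w)

-- Truncating conversion: code points above 255 wrap around modulo 256.
fromChar :: Char -> Char8
fromChar = Char8 . fromIntegral . ord

instance Show Char8 where
  showsPrec d = showsPrec d . toChar
  showList    = showList . map toChar   -- a [Char8] prints like a String

instance Read Char8 where
  readsPrec d s = [(fromChar c, rest) | (c, rest) <- readsPrec d s]
  readList s    = [(map fromChar cs, rest) | (cs, rest) <- readList s]
```

With these instances every list of bytes survives `read . show`, while reading a literal that contains a character above 255 silently wraps it into the byte range.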

On 20.10.2011 21:43, Michael Snoyman wrote:
Perhaps the correct semantic approach would be to have:
newtype Char8 = Char8 Word8
But I think that will break far too many applications to try to get it implemented.
Would a new Word8 type be better to stay compatible?
    newtype Word8 = C8 Data.Word.Word8
C.
In an ideal world, I agree with both points: displaying a numeric value as a Char doesn't make sense, and there are definitely issues with the Read instance. However, I still think current behavior is the least of all available evils. Show/Read work properly as a pair and can encode/decode any ByteString, and there's never any presumption that all input to read is valid.
Michael

On Fri, Oct 21, 2011 at 11:58 AM, Christian Maeder wrote:
On 20.10.2011 21:43, Michael Snoyman wrote:
On Wed, Oct 19, 2011 at 9:29 PM, Ketil Malde wrote:
Michael Snoyman writes:
sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.
Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255?
(ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.)
Perhaps the correct semantic approach would be to have:
newtype Char8 = Char8 Word8
But I think that will break far too many applications to try to get it
would a new Word8 type be better to stay compatible?
newtype Word8 = C8 Data.Word.Word8
I don't think it would really fix much. Any code in the wild right now that refers to Word8 will be referring to Data.Word.Word8. Certainly calling the newtype Word8 will slightly simplify a migration, but (1) it will still require code changes and (2) I'd rather just bite the bullet and make a proper switch. Michael

I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance
- it allows making ByteString really abstract by hiding the newtype constructor
- it is portable and supplies control over all other instances (not just Show)
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
It would require more work just on your side, though.
Cheers Christian
On 15.10.2011 16:15, Bas van Dijk wrote:
On 15 October 2011 13:34, Roman Leshchinskiy wrote:
On 15/10/2011, at 12:26, Roman Leshchinskiy wrote:
On 14/10/2011, at 12:37, Bas van Dijk wrote:
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)
Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.
I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.
Roman
I agree that you shouldn't use ByteStrings or Vectors of Word8s for Unicode strings. However I can imagine that for quick sessions in ghci it can be quite handy if they are shown as strings. For example, currently we have:
import Network.HTTP.Enumerator
simpleHttp "http://code.haskell.org/~basvandijk/"
Chunk "<html>\n<head><title>Bas van Dijk</title></head>\n<body>\n<h1>Bas van Dijk</h1>\n\n<p>Email: <a href=\"mailto://v.dijk.bas@gmail.com\">v.dijk.bas@gmail.com</a></p>\n\n<p>Nick on IRC: <tt>basvandijk</tt></p>\n\n<a href=\"http://www.haskellers.com/user/basvandijk/\">\n  <img src=\"http://www.haskellers.com/static/badge.png\" \n       alt=\"I'm a Haskeller\"\n       border=\"0\">\n</a>\n\n<p>See my <a href=\"https://github.com/basvandijk\">GitHub</a> page for a list of projects I work on.</p>\n\n</body>\n</html>\n" Empty
If ByteStrings were not shown as strings this would look like:
Chunk ( fromList [60,104,116,109,108,62,10,60,104,101,97,100,62,60,116,105,116,108,101,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,116,105,116,108,101,62,60,47,104,101,97,100,62,10,60,98,111,100,121,62,10,60,104,49,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,104,49,62,10,10,60,112,62,69,109,97,105,108,58,32,60,97,32,104,114,101,102,61,34,109,97,105,108,116,111,58,47,47,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,34,62,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,60,47,97,62,60,47,112,62,10,10,60,112,62,78,105,99,107,32,111,110,32,73,82,67,58,32,60,116,116,62,98,97,115,118,97,110,100,105,106,107,60,47,116,116,62,60,47,112,62,10,10,60,97,32,104,114,101,102,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,117,115,101,114,47,98,97,115,118,97,110,100,105,106,107,47,34,62,10,32,32,60,105,109,103,32,115,114,99,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108 ,108,101,114,115,46,99,111,109,47,115,116,97,116,105,99,47,98,97,100,103,101,46,112,110,103,34,32,10,32,32,32,32,32,32,32,97,108,116,61,34,73,39,109,32,97,32,72,97,115,107,101,108,108,101,114,34,10,32,32,32,32,32,32,32,98,111,114,100,101,114,61,34,48,34,62,10,60,47,97,62,10,10,60,112,62,83,101,101,32,109,121,32,60,97,32,104,114,101,102,61,34,104,116,116,112,115,58,47,47,103,105,116,104,117,98,46,99,111,109,47,98,97,115,118,97,110,100,105,106,107,34,62,71,105,116,72,117,98,60,47,97,62,32,112,97,103,101,32,102,111,114,32,97,32,108,105,115,116,32,111,102,32,112,114,111,106,101,99,116,115,32,73,32,119,111,114,107,32,111,110,46,60,47,112,62,10,10,60,47,98,111,100,121,62,10,60,47,104,116,109,108,62,10]) Empty
Personally, I don't work in ghci that often so I don't care that much if we have or don't have specialized Show instances for Vectors of Word8s.
So what do other people think about this?
Bas

And you could just use the GeneralizedNewtypeDeriving extension. Then you could use functions from Data.Vector.Generic on your ByteStrings. Much cleaner IMO than OverlappingInstances.
2011/10/17 Christian Maeder
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance
- it allows making ByteString really abstract by hiding the newtype constructor
- it is portable and supplies control over all other instances (not just Show)
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
It would require more work just on your side, though.
Cheers Christian

Christian Maeder wrote:
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance
- it allows making ByteString really abstract by hiding the newtype constructor
- it is portable and supplies control over all other instances (not just Show)
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
It would require more work just on your side, though.
Also such an implementation wouldn't be big news. You would get stream fusion as news, but I'm specifically excited about the idea that I can use the vector interface. I see no need to restrict the ByteString interface, since it is a pretty low level data structure anyway. You use it to process raw ByteStrings and as such should get maximum flexibility in doing so. Every restriction means that in a certain edge case you can't get high performance, because the author decided that you aren't smart enough to use the underlying interface, something which I always found annoying about some of the Haskell libraries. So please, please, please, if you decide to use a newtype, do /not/ hide the constructor. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

On 17.10.2011 11:10, Ertugrul Soeylemez wrote:
So please, please, please, if you decide to use a newtype, do /not/ hide the constructor.
The better alternative to "not hiding the constructor" is to supply conversion functions that may or may not do more than the constructor and selector and are named accordingly. (This just disallows pattern matching.) Cheers Christian
Greets, Ertugrul

Christian Maeder wrote:
So please, please, please, if you decide to use a newtype, do /not/ hide the constructor.
The better alternative to "not hiding the constructor" is to supply conversion functions that may or may not do more than the constructor and selector and are named accordingly. (This just disallows pattern matching.)
Except annoying library users, what would be the point of that? Please understand that as a middle level developer (abstractions, protocol implementations, frameworks, etc.) I am sometimes annoyed by the idealism of some library interfaces, and I find myself reinventing the wheel very often, because the closed interfaces of some existing libraries just don't support what I need, even though the technical basis would be there, or because of the abstraction forest I'm unable to guarantee or even get good performance. I could totally understand having a black box interface for some higher level stuff, but ByteString is still low/middle level and should support me as a developer on that level. Unifying vector and bytestring sounds like a great step, and I would find it ruined already by wrapping it up in a newtype. Hiding the constructor would make this even worse. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

On 17 October 2011 10:18, Christian Maeder wrote:
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance - allows to make ByteString really abstract when hiding the newtype constructor
But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.
- is portable and supplies control over all other instances (not just Show)
What other instances (besides Show) should have different semantics than those of Vector?
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector. Bas

On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:
On 17 October 2011 10:18, Christian Maeder wrote:
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance - allows to make ByteString really abstract when hiding the newtype constructor
But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.
- is portable and supplies control over all other instances (not just Show)
What other instances (besides Show) should have different semantics than those of Vector?
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal. Also, aren't there a few documented cases where newtypes prevent certain GHC rewrite rules from firing? I don't see any strong argument to avoid what appears to be the simplest and most straight-forward solution to the problem at hand. Michael
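The overlapping-instance idea can be sketched with a stand-in type, using only base. `Vec` is a hypothetical list wrapper, not the real vector type, and the per-instance `OVERLAPPABLE`/`OVERLAPPING` pragmas are the mechanism newer GHC versions offer in place of the module-wide OverlappingInstances extension discussed here.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Char (chr)
import Data.Word (Word8)

-- Stand-in for a vector type: a hypothetical wrapper around a list.
newtype Vec a = Vec [a]

-- General instance, in the style of vector's "fromList [...]" output.
instance {-# OVERLAPPABLE #-} Show a => Show (Vec a) where
  showsPrec p (Vec xs) =
    showParen (p > 10) (showString "fromList " . shows xs)

-- More specific instance for bytes, rendered as a string literal.
instance {-# OVERLAPPING #-} Show (Vec Word8) where
  showsPrec p (Vec ws) = showsPrec p (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  print (Vec [1, 2, 3 :: Int])
  print (Vec [104, 105 :: Word8])
```

Both instances live in one module, which matches the situation described above: the compiler picks the more specific instance whenever the element type is known to be Word8.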

On 17.10.2011 12:19, Michael Snoyman wrote: [...]
Also, aren't there a few documented cases where newtypes prevent certain GHC rewrite rules from firing?
This would be possible to find out with a wrapper module. Cheers Christian
I don't see any strong argument to avoid what appears to be the simplest and most straight-forward solution to the problem at hand.
Michael

Michael Snoyman wrote:
On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal.
So suppose we change the Show and Read instances for Storable vectors of Word8 and Char. What happens with unboxed and boxed vectors of these types? Should these be changed as well? If not, why not?
Roman

On Mon, Oct 17, 2011 at 4:42 PM, Roman Leshchinskiy wrote:
Michael Snoyman wrote:
On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal.
So suppose we change the Show and Read instances for Storable vectors of Word8 and Char. What happens with unboxed and boxed vectors of these types? Should these be changed as well? If not, why not?
I don't have any strong opinion on the matter, but it seems like they may as well be changed also. It seems like all the same "useful for debugging" arguments would apply there as well. Michael

On 17 October 2011 16:44, Michael Snoyman wrote:
On Mon, Oct 17, 2011 at 4:42 PM, Roman Leshchinskiy wrote:
Michael Snoyman wrote:
On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal.
So suppose we change the Show and Read instances for Storable vectors of Word8 and Char. What happens with unboxed and boxed vectors of these types? Should these be changed as well? If not, why not?
I don't have any strong opinion on the matter, but it seems like they may as well be changed also. It seems like all the same "useful for debugging" arguments would apply there as well.
Michael
Yes I think that makes sense. My patch already adds specific Show and Read instances to all vectors of Chars and Word8s: http://trac.haskell.org/vector/ticket/64 Bas

Hi,
On 17.10.2011, at 12:14, Bas van Dijk wrote:
On 17 October 2011 10:18, Christian Maeder wrote:
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
What about lazy bytestrings? I wasn't aware that vector also supports huge logical arrays that are suitable for very large I/O streams. I'd be glad if vector is also suitable for such applications. But if not, then there is still a need for the bytestring package in order to support streaming gigabytes of data in a small, constant-sized heap.
Cheers, Jean

It's a good question. I don't think there is anything in the vector library that can handle chunks of vectors...
If both lazy and strict bytestrings are to be generalized, it would *at last* make a single interface to them possible, thanks to Data.Vector.Generic, instead of two identical interfaces in separate modules, which force you to duplicate any code that handles bytestrings so that it can deal with both flavours.
2011/10/17 Jean-Marie Gaillourdet
Hi,
On 17.10.2011, at 12:14, Bas van Dijk wrote:
On 17 October 2011 10:18, Christian Maeder wrote:
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
What about lazy bytestrings? I wasn't aware that vector also supports huge logical arrays that are suitable for very large I/O streams. I'd be glad if vector is also suitable for such applications. But if not, then there is still a need for the bytestring package in order to support streaming gigabytes of data in a small, constant-sized heap.
Cheers, Jean

On 17 October 2011 20:15, Yves Parès wrote:
It's a good question, I don't think there is something in the vector library that can handle chunks of vectors...
Yes I forgot about lazy bytestrings when writing that. Of course vector-bytestring does provide lazy ByteStrings.
If both lazy and strict bytestrings are to be generalized, it would at last make a single interface to them possible, thanks to Data.Vector.Generic, instead of two identical interfaces in separate modules, which force you to duplicate any code that handles bytestrings so that it can deal with both flavours.
It would be an interesting idea to add a chunking vector adapter to the vector package. I guess it will look something like this:
    data Chunks v a = Empty | Chunk {-# UNPACK #-} !(v a) (Chunks v a)
    foldrChunks :: (v a -> b -> b) -> b -> Chunks v a -> b
    foldrChunks f z = go
      where go Empty        = z
            go (Chunk c cs) = f c (go cs)
    {-# INLINE foldrChunks #-}
    foldlChunks :: (b -> v a -> b) -> b -> Chunks v a -> b
    foldlChunks f z = go z
      where go !a Empty        = a
            go !a (Chunk c cs) = go (f a c) cs
    {-# INLINE foldlChunks #-}
Giving it an instance for Data.Vector.Generic.Base.Vector should be easy, right? Anyone up for the job?
Then I can replace my custom lazy ByteStrings with:
    type ByteString = Chunks Vector Word8
Bas
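The chunked structure can be tried out with base alone. The sketch below is a compilable variant of the definitions above, with plain lists standing in for vectors. `fromChunks`, `lengthChunks` and `toList` are hypothetical helpers added for illustration, and the UNPACK pragma is left out because the chunk field has a polymorphic type here.

```haskell
{-# LANGUAGE BangPatterns #-}

-- A lazy sequence of strict chunks.
data Chunks v a = Empty | Chunk !(v a) (Chunks v a)

-- Lazy right fold over the chunks.
foldrChunks :: (v a -> b -> b) -> b -> Chunks v a -> b
foldrChunks f z = go
  where
    go Empty        = z
    go (Chunk c cs) = f c (go cs)

-- Strict left fold over the chunks; the bang patterns force the
-- accumulator at every step.
foldlChunks :: (b -> v a -> b) -> b -> Chunks v a -> b
foldlChunks f z = go z
  where
    go !a Empty        = a
    go !a (Chunk c cs) = go (f a c) cs

-- Hypothetical helpers, with lists playing the role of vectors.
fromChunks :: [[a]] -> Chunks [] a
fromChunks = foldr Chunk Empty

lengthChunks :: Chunks [] a -> Int
lengthChunks = foldlChunks (\n c -> n + length c) 0

toList :: Chunks [] a -> [a]
toList = foldrChunks (++) []

main :: IO ()
main = do
  let cs = fromChunks ["ab", "cde", "f"]
  print (lengthChunks cs)
  putStrLn (toList cs)
```

The right fold keeps the structure lazy, so chunks can be consumed as they are produced, while the strict left fold suits accumulations such as a total length.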

On 17.10.2011 12:14, Bas van Dijk wrote:
On 17 October 2011 10:18, Christian Maeder wrote:
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance - allows to make ByteString really abstract when hiding the newtype constructor
But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.
Maybe some of the functions that start with "unsafe"? Or why do you use the safe variant (VS.head) in your own implementation? http://hackage.haskell.org/packages/archive/vector/0.9/doc/html/Data-Vector-...
- is portable and supplies control over all other instances (not just Show)
What other instances (besides Show) should have different semantics than those of Vector?
instance Read (and maybe the vector package will evolve further).
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
So your package basically supports an unfortunate mix of bytestring and vector functions? How about proposing a better bytestring interface (if it should not just be that of vector)? Btw. a really abstract bytestring could easily be implemented on top of your package. Cheers Christian
Bas

On 17 October 2011 13:12, Christian Maeder wrote:
On 17.10.2011 12:14, Bas van Dijk wrote:
On 17 October 2011 10:18, Christian Maeder wrote:
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.
- it should have the same performance - allows to make ByteString really abstract when hiding the newtype constructor
But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.
Maybe some of the functions that start with "unsafe"?
But to keep compatible with bytestring's Data.ByteString.Unsafe, I have to export the unsafe functions anyway. I do think we should provide a Data.Vector.Storable.Safe module which only exports the safe interface and mark it Trustworthy using the new Safe Haskell language extensions. Roman: any reason why only storable vectors are missing a Safe module? I could add one this evening, if you like? And should we also export Unsafe modules like how it's done in the base library?
http://hackage.haskell.org/packages/archive/vector/0.9/doc/html/Data-Vector-...
- is portable and supplies control over all other instances (not just Show)
What other instances (besides Show) should have different semantics than those of Vector?
instance Read (and maybe the vector package will evolve further).
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.
So your package basically supports an unfortunate mix of bytestring and vector functions?
No, vector-bytestring exports the same API as bytestring (except for the Show and Read instances which will hopefully be fixed in a new vector release).
How about proposing a better bytestring interface (if it should not just be that of vector)?
I'm all for improving the interface but the goal of vector-bytestring is that it can be used as a drop-in replacement for bytestring without changing too much code.
Regards, Bas

On 17.10.2011 17:26, Bas van Dijk wrote:
On 17 October 2011 13:12, Christian Maeder wrote:
So your package basically supports an unfortunate mix of bytestring and vector functions?
No, vector-bytestring exports the same API as bytestring (except for the Show and Read instances which will hopefully be fixed in a new vector release).
Yes, but Data.Vector.Storable can simply be imported and used in addition. I suppose the (derived) Data instances (from vector and the original bytestrings) break the abstraction. (So you must hope nobody is relying on this instance.)
How about proposing a better bytestring interface (if it should not just be that of vector)?
I'm all for improving the interface but the goal of vector-bytestring is that it can be used as a drop-in replacement for bytestring without changing too much code.
Changing back to another drop-in replacement for bytestring will be difficult if functions from Data.Vector.Storable have been used. Thanks Christian
Regards,
Bas

On 17 October 2011 18:28, Christian Maeder wrote:
On 17.10.2011 17:26, Bas van Dijk wrote:
On 17 October 2011 13:12, Christian Maeder wrote:
So your package basically supports an unfortunate mix of bytestring and vector functions?
No, vector-bytestring exports the same API as bytestring (except for the Show and Read instances which will hopefully be fixed in a new vector release).
Yes, but Data.Vector.Storable can simply be imported and used in addition.
I consider that an advantage.
I suppose, the (derived) Data instances (from vector and the original bytestrings) break the abstraction. (So you must hope nobody is relying on this instance.)
Good point! I will mention that in the documentation of vector-bytestring. Also code using the ByteString constructor PS has to be changed because I obviously can't provide an equivalent. However the documentation of Data.ByteString.Internal (which exports PS) warns "normal" users not to use that module: "A module containing semi-public 'ByteString' internals. This exposes the 'ByteString' representation and low level construction functions. As such all the functions in this module are unsafe. The API is also not stable. Where possible application should instead use the functions from the normal public interface modules, such as "Data.ByteString.Unsafe". Packages that extend the ByteString system at a low level will need to use this module." So I expect not many packages are using the PS constructor directly which means the pain of switching to vectors will be minimal.
How about proposing a better bytestring interface (if it should not just be that of vector)?
I'm all for improving the interface but the goal of vector-bytestring is that it can be used as a drop-in replacement for bytestring without changing too much code.
Changing back to another drop-in replacement for bytestring will be difficult if functions from Data.Vector.Storable have been used.
True, so let's try to make this the final replacement ;-)
Regards,
Bas

Roman Leshchinskiy
Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.
I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.
My suggestion was to remove the generic Show instance and add only specialized instances. This is more work, but will also yield better results. In particular, it allows specialized string representations for other types, too. For example the way values of type Vector Bool are printed is extremely useless. I always find myself writing my own debugging output functions for boolean vectors. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

On 15 October 2011 20:50, Ertugrul Soeylemez
Roman Leshchinskiy
wrote: Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.
I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.
My suggestion was to remove the generic Show instance and add only specialized instances. This is more work, but will also yield better results. In particular, it allows specialized string representations for other types, too.
What exactly is the problem with using OverlappingInstances to define specialized Show and Read instances for Vectors with certain element types (Char, Word8, Bool)? Am I missing something dangerous here? Bas

Bas van Dijk
My suggestion was to remove the generic Show instance and add only specialized instances. This is more work, but will also yield better results. In particular, it allows specialized string representations for other types, too.
What exactly is the problem with using OverlappingInstances to define specialized Show and Read instances for Vectors with certain element types (Char, Word8, Bool)?
Am I missing something dangerous here?
Consider having the following instances:

instance Show a => Show (Vector a)
instance Show (Vector Word8)

How could the compiler determine which instance you want when saying show someVector, where someVector :: Vector Word8? Both instances are valid here, and there is no mechanism to choose one of them. You can only write a generic instance, where you can rule out the specialized instances. I don't think that's possible in this case.

Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

On 15 October 2011 23:17, Ertugrul Soeylemez
Both instances are valid here, and there is no mechanism to choose one of them.
There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific. Bas [1] http://www.haskell.org/ghc/docs/latest/html/users_guide/type-class-extension...
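
To make the instance-resolution argument concrete, here is a minimal sketch. It assumes a hypothetical list-backed Vec newtype standing in for Vector (so it needs only base), and it uses the per-instance OVERLAPPABLE/OVERLAPPING pragmas available since GHC 7.10 rather than the module-wide OverlappingInstances extension discussed here. It is an illustration only, not how vector defines its instances.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for Vector, so the sketch depends only on base.
newtype Vec a = Vec [a]

-- Generic instance: any showable element type, printed "fromList [...]".
instance {-# OVERLAPPABLE #-} Show a => Show (Vec a) where
  showsPrec d (Vec xs) =
    showParen (d > 10) $ showString "fromList " . showsPrec 11 xs

-- Specialized instance: bytes are printed as a string literal.
instance {-# OVERLAPPING #-} Show (Vec Word8) where
  showsPrec d (Vec ws) = showsPrec d (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  print (Vec [1, 2, 3 :: Int])   -- only the generic instance matches
  print (Vec [1, 2, 3 :: Word8]) -- both match; the more specific one wins
```

The pragmas appear only where the instances are defined; a module that merely calls show on a Vec Word8 needs no extension of its own.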

On 15 October 2011 23:56, Bas van Dijk
On 15 October 2011 23:17, Ertugrul Soeylemez
wrote: Both instances are valid here, and there is no mechanism to choose one of them.
There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific.
This has the problem of incoherence in multi-module programs - GHC might choose different instances for the same type depending on compilation order. For a Show instance, this may be acceptable.

On 16 October 2011 08:51, Stephen Tetley
On 15 October 2011 23:56, Bas van Dijk
wrote: On 15 October 2011 23:17, Ertugrul Soeylemez
wrote: Both instances are valid here, and there is no mechanism to choose one of them.
There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific.
This has the problem of incoherence in multi-module programs - GHC might choose different instances for the same type depending on compilation order. For a Show instance, this may be acceptable.
But is this a problem when both instances are exported from the same module and OverlappingInstances is only enabled in that module, as is the case here?

On 16 October 2011 10:06, Bas van Dijk
But is this a problem when both instances are exported from the same module and OverlappingInstances is only enabled in that module, as is the case here?
No - if the only instances defined are in the same module GHC would pick the most specific one. If there was no instance for Ivan's use-case of Vector Word8 in the "official" module, and he chose to define this more specific instance elsewhere, there is the potential for incoherence.

Bas van Dijk
On 15 October 2011 23:17, Ertugrul Soeylemez
wrote: Both instances are valid here, and there is no mechanism to choose one of them.
There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific.
Although I don't have a problem with using language extensions, the vector package, as it is a commonly used tool, shouldn't require me to use an extension just to be able to debug my code. This would be particularly annoying when using GHCi, because you would always have to start it with an extension option. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

On Sun, Oct 16, 2011 at 12:43 PM, Ertugrul Soeylemez
Although I don't have a problem with using language extensions, the vector package, as it is a commonly used tool, shouldn't require me to use an extension just to be able to debug my code. This would be particularly annoying when using GHCi, because you would always have to start it with an extension option.
You don't need it. The OverlappingInstances extension needs to be enabled only where the overlapping instances are defined, *not* where they're used. =) Cheers, -- Felipe.

Felipe Almeida Lessa
Although I don't have a problem with using language extensions, the vector package, as it is a commonly used tool, shouldn't require me to use an extension just to be able to debug my code. This would be particularly annoying when using GHCi, because you would always have to start it with an extension option.
You don't need it. The OverlappingInstances extension needs to be enabled only where the overlapping instances are defined, *not* where they're used. =)
I see. Then I'm totally fine with it. =) Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Am 12.10.2011 16:02, schrieb Bas van Dijk:
API DOCS
http://hackage.haskell.org/package/vector-bytestring-0.0.0.0
you could re-export VS.empty, VS.singleton, etc. directly.

Cheers
Christian

-- | /O(1)/ The empty 'ByteString'
empty :: ByteString
empty = VS.empty
{-# INLINE empty #-}

-- | /O(1)/ Convert a 'Word8' into a 'ByteString'
singleton :: Word8 -> ByteString
singleton = VS.singleton
{-# INLINE [1] singleton #-} -- Inline [1] for intercalate rule

2011/10/18 Christian Maeder
you could re-export VS.empty, VS.singleton, etc. directly.
The vector singleton and the vector-bytestring singleton don't have the same type. vector:
singleton :: a -> Vector a
vector-bytestring:
singleton :: Word8 -> Vector Word8
By choosing the more general type you risk that a previously correct program becomes ambiguous when migrating from bytestring to vector-bytestring. I'm not sure if this will actually occur in practice or whether it holds for all the little functions that you could theoretically re-export directly. Maybe we can create an example program which would fail with the more general type. Proving the opposite (that the more general type is always safe) will be more difficult.

2011/10/18 Roel van Dijk
Maybe we [can] create an example program which would fail with the more general type.
Migrating the function "foo" from bytestring to vector-bytestring would fail with more general types:
import Data.ByteString
foo = print empty

Ok, modules loaded: Test.
With vector:
import Data.Vector.Storable
foo = print empty

Ambiguous type variable `a0' in the constraints:
  (Show a0) arising from a use of `print'
    at /home/roelvandijk/development/test.hs:5:7-11
  (Storable a0) arising from a use of `empty'
    at /home/roelvandijk/development/test.hs:5:13-17
Probable fix: add a type signature that fixes these type variable(s)
In the expression: print empty
In an equation for `foo': foo = print empty
Failed, modules loaded: none.
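
The ambiguity goes away once the element type is fixed. The following is a minimal sketch, assuming the vector package is installed. The local ByteString synonym and the wrappers empty and singleton are written out here for illustration and mirror the monomorphic signatures quoted earlier in the thread; they are not imported from vector-bytestring.

```haskell
module Main where

import qualified Data.Vector.Storable as VS
import Data.Word (Word8)

-- Local synonym for illustration, matching the one vector-bytestring defines.
type ByteString = VS.Vector Word8

-- Monomorphic wrappers: the element type is pinned to Word8.
empty :: ByteString
empty = VS.empty

singleton :: Word8 -> ByteString
singleton = VS.singleton

main :: IO ()
main = do
  print empty                          -- compiles: no type variable left over
  print (VS.empty :: VS.Vector Word8)  -- the "probable fix": an annotation
  print (singleton 65)
```

With the wrappers, code written against the bytestring API keeps compiling unchanged; with a direct re-export of VS.empty, each such use site would need its own annotation.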
participants (18)
- AM
- Bas van Dijk
- Christian Maeder
- Conrad Parker
- Ertugrul Soeylemez
- Evan Laforge
- Felipe Almeida Lessa
- Ivan Lazar Miljenovic
- Jean-Marie Gaillourdet
- Joachim Breitner
- Ketil Malde
- Max Rabkin
- Michael Snoyman
- Roel van Dijk
- Roman Leshchinskiy
- Stephen Tetley
- Vincent Hanquez
- Yves Parès