ANNOUNCE: vector-bytestring-0.0.0.0

Bas van Dijk

12 Oct 2011

2:02 p.m.

All your ByteString are belong to us...

Hello,

I'm pleased to announce the beta release of vector-bytestring. This library provides the type ByteString which is defined as a type synonym for a storable Vector of Word8s (from the vector package):

type ByteString = Data.Vector.Storable.Vector Word8

It exports the same API as the bytestring package except that the module names are prefixed with Data.Vector.Storable.ByteString instead of Data.ByteString.

The very ambitious goal of this package is that it will eventually replace our beloved bytestring package. By basing this package on vector, we can benefit from all the optimizations (like stream-fusion!) in that library. We will also have just a single library to test, debug and optimize.

TEST-SUITE

I ported the bytestring test-suite to vector-bytestring. You can run it using:

$ cabal configure --enable-tests; cabal build; cabal test

All 54800 tests pass! Only one property doesn't hold:

prop_show :: ByteString -> Bool
prop_show x = show x == show (unpack x)

This is because I don't provide a custom Show instance for ByteStrings but use the one from Vector, which shows a vector like "fromList [1,2,3]" instead of as "\SOH\STX\ETX" like bytestring does. Hopefully this is not a problem in practice.

BENCHMARKS

I added a criterion based benchmark-suite to vector-bytestring. It consists of over 600 benchmarks that cover almost every function in the library. Also included are benchmarks which measure the fusion capabilities of the library. Run it using:

$ cabal configure -fbenchmark; cabal build;
$ dist/build/bench/bench --help

Unfortunately, bytestring still outperforms us in lots of benchmarks. I believe the primary cause of this is that most functions in vector-bytestring are implemented using stream-fusion. This is highly efficient if you use a composition of these functions because they will all fuse into one single efficient loop.

However, if your program uses only a single function, the stream based implementation is often less efficient than an implementation that works directly on a mutable vector (like most functions in bytestring). So what we want is to use stream-fusion where possible but use mutable vectors when our program doesn't fuse. Fortunately, Roman Leshchinskiy (author of vector) has an idea how to do this:

http://trac.haskell.org/vector/ticket/60

Because we don't beat bytestring in all cases yet, you should consider this a beta release and not use it in production code.

INSTALLING

$ cabal install vector-bytestring

API DOCS

http://hackage.haskell.org/package/vector-bytestring-0.0.0.0

DEVELOPING

$ git clone https://github.com/basvandijk/vector-bytestring

Regards,

Bas
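The failing prop_show property can be illustrated without either library. What follows is only a sketch using base: a plain list of Word8 stands in for the byte string, and showAsVector and showAsByteString are hypothetical names that merely imitate the two renderings described in the announcement.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Imitates the rendering the announcement attributes to vector's Show instance.
showAsVector :: [Word8] -> String
showAsVector ws = "fromList " ++ show ws

-- Imitates bytestring's rendering: every byte is shown as a Char.
showAsByteString :: [Word8] -> String
showAsByteString ws = show (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  putStrLn (showAsVector [1, 2, 3])      -- fromList [1,2,3]
  putStrLn (showAsByteString [1, 2, 3])  -- "\SOH\STX\ETX"
```

The two strings differ for every input, which is why a property that compares them cannot hold.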


Christian Maeder

14 Oct 2011

10:58 a.m.

On 12.10.2011 16:02, Bas van Dijk wrote:

...

All your ByteString are belong to us...

Hello,

I'm pleased to announce the beta release of vector-bytestring. This library provides the type ByteString which is defined as a type synonym for a storable Vector of Word8s (from the vector package):

type ByteString = Data.Vector.Storable.Vector Word8

It exports the same API as the bytestring package except that the module names are prefixed with: Data.Vector.Storable.ByteString instead of Data.ByteString.

The very ambitious goal of this package is that it will eventually replace our beloved bytestring package. By basing this package on vector, we can benefit from all the optimizations (like stream-fusion!) in that library. We will also have just a single library to test, debug and optimize.

TEST-SUITE

I ported the bytestring test-suite to vector-bytestring. You can run it using:

$ cabal configure --enable-tests; cabal build; cabal test

All 54800 tests pass! Only one property doesn't hold:

prop_show :: ByteString -> Bool
prop_show x = show x == show (unpack x)

This is because I don't provide a custom Show instance for ByteStrings but use the one from Vector which shows a vector like "fromList [1,2,3]" instead as "\SOH\STX\ETX" like bytestring does. Hopefully this is not a problem in practice.

All derived Show instances for data types using your ByteString will be different, too. Would it not be simple to use a newtype for ByteString (rather than a synonym)?

Cheers,
Christian
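A sketch of the newtype suggestion, under loud assumptions: a list of Word8 stands in for the storable vector so that the example needs only base, and Packet is a hypothetical record added to show the effect on derived instances. It is not how either package is implemented.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Assumption: a list stands in for Data.Vector.Storable.Vector Word8.
newtype ByteString = ByteString [Word8]

-- Custom instance: render the bytes as a string literal.
instance Show ByteString where
  showsPrec p (ByteString ws) = showsPrec p (map (chr . fromIntegral) ws)

-- A hypothetical record; its derived Show uses the custom instance above.
data Packet = Packet { payload :: ByteString } deriving Show

main :: IO ()
main = print (Packet (ByteString [104, 105]))  -- Packet {payload = "hi"}
```

With a type synonym instead of the newtype, the derived instance for Packet would pick up whatever Show instance the underlying vector type has.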

Bas van Dijk

11:37 a.m.

On 14 October 2011 12:58, Christian Maeder wrote:

...

Would it not be simple to use a newtype for ByteString (rather than a synonym)?

My "vision" for the future of bytestring and vector-bytestring is that they will be replaced by vector directly. This way users don't have to think about choosing between bytestring and vector and can go to vector directly to work with Word8 vectors or interface with foreign libraries.

This would mean moving some (bytestring only) functions from vector-bytestring to vector, like mapAccumL/R, create, createAndTrim, etc., and generalizing them from Word8s to any Storable.

So in my vision there's no ByteString type anymore, just Vectors. The vector-bytestring package is meant to make the transition smoother.

If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Bas

Ivan Lazar Miljenovic

11:45 a.m.

On 14 October 2011 22:37, Bas van Dijk wrote:

...

On 14 October 2011 12:58, Christian Maeder wrote:

...
Would it not be simple to use a newtype for ByteString (rather than a synonym)?

If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

I suppose you could add a newtype wrapper, but it _would_ require duplicating the API to do so.

Though I would argue that unless you're actually trying to use Show/Read for serialisation, does it really matter what the Show/Read instances for ByteString are?

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

Max Rabkin

11:50 a.m.

On Fri, Oct 14, 2011 at 13:45, Ivan Lazar Miljenovic wrote:

...

Though I would argue that unless you're trying to actually use for Show/Read for serialisation, does it really matter what the Show/Read instances for Bytestring are?

Convenient debugging and REPL interaction certainly matter! --Max

Ertugrul Soeylemez

12:09 p.m.

Max Rabkin wrote:

...

...
Though I would argue that unless you're trying to actually use for Show/Read for serialisation, does it really matter what the Show/Read instances for Bytestring are?

Convenient debugging and REPL interaction certainly matter!

On the other hand, having a separate Show instance for Vector Word8 would require either writing all Show instances explicitly or keeping two separate packages. I would prefer to have the two packages merged into one.

But since I find a readable Show instance for ByteString useful, too, I would go with the first variant of providing a few default instances instead of a generic Show a => Show (Vector a) instance. That way you can write nicer instances for some other element types, too. For example, I can imagine what a much nicer Vector Bool instance would look like:

fromBoolString "1.1..111..1"

Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/
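The Vector Bool idea could look roughly like this. It is a sketch only: BoolVec is a hypothetical newtype over a list (standing in for Vector Bool), and the mapping of '1' to True and any other character to False is an assumption read off the example string.

```haskell
-- Assumption: a list stands in for Vector Bool.
newtype BoolVec = BoolVec [Bool] deriving Eq

-- '1' means True, anything else ('.' in the example) means False.
fromBoolString :: String -> BoolVec
fromBoolString = BoolVec . map (== '1')

-- Show the vector as the expression that rebuilds it.
instance Show BoolVec where
  showsPrec p (BoolVec bs) =
    showParen (p > 10) $
      showString "fromBoolString " . shows (map toChar bs)
    where
      toChar b = if b then '1' else '.'

main :: IO ()
main = print (fromBoolString "1.1..111..1")  -- fromBoolString "1.1..111..1"
```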

Bas van Dijk

1:57 p.m.

On 14 October 2011 13:37, Bas van Dijk wrote:

...

If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Ok I have proposed and implemented this for vector: http://trac.haskell.org/vector/ticket/64 Bas

Roman Leshchinskiy

15 Oct 2011

11:26 a.m.

On 14/10/2011, at 12:37, Bas van Dijk wrote:

...

If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them. Roman

Roman Leshchinskiy

11:34 a.m.

On 15/10/2011, at 12:26, Roman Leshchinskiy wrote:

...

On 14/10/2011, at 12:37, Bas van Dijk wrote:

...
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.

I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea. Roman
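To make the overlap concrete, here is a sketch with a hypothetical Vec newtype over lists in place of vector's type. It uses the per-instance OVERLAPPABLE/OVERLAPPING pragmas of later GHC versions rather than the module-wide OverlappingInstances extension discussed in the thread; without such a pragma or extension, GHC rejects uses of show at Vec Word8 because both instances match.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (chr)
import Data.Word (Word8)

-- Assumption: a list-backed stand-in for vector's Vector type.
newtype Vec a = Vec [a]

-- Generic instance for any showable element type.
instance {-# OVERLAPPABLE #-} Show a => Show (Vec a) where
  showsPrec p (Vec xs) =
    showParen (p > 10) $ showString "fromList " . shows xs

-- Specialised instance for bytes; it overlaps with the generic one.
instance {-# OVERLAPPING #-} Show (Vec Word8) where
  showsPrec p (Vec ws) = showsPrec p (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  print (Vec [1, 2, 3 :: Int])     -- fromList [1,2,3]
  print (Vec [104, 105 :: Word8])  -- "hi"
```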

Bas van Dijk

2:15 p.m.

On 15 October 2011 13:34, Roman Leshchinskiy wrote:

...

On 15/10/2011, at 12:26, Roman Leshchinskiy wrote:

...
On 14/10/2011, at 12:37, Bas van Dijk wrote:

...
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.

I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.

Roman

...

I agree that you shouldn't use ByteStrings or Vectors of Word8s for Unicode strings. However I can imagine that for quick sessions in ghci it can be quite handy if they are shown as strings. For example, currently we have:

import Network.HTTP.Enumerator
simpleHttp "http://code.haskell.org/~basvandijk/"
Chunk "<html>\n<head><title>Bas van Dijk</title></head>\n<body>\n<h1>Bas van Dijk</h1>\n\n<p>Email: <a href=\"mailto://v.dijk.bas@gmail.com\">v.dijk.bas@gmail.com</a></p>\n\n<p>Nick on IRC: <tt>basvandijk</tt></p>\n\n<a href=\"http://www.haskellers.com/user/basvandijk/\">\n  <img src=\"http://www.haskellers.com/static/badge.png\" \n       alt=\"I'm a Haskeller\"\n       border=\"0\">\n</a>\n\n<p>See my <a href=\"https://github.com/basvandijk\">GitHub</a> page for a list of projects I work on.</p>\n\n</body>\n</html>\n" Empty

If ByteStrings were not shown as strings this would look like:

Chunk (fromList [60,104,116,109,108,62,10,60,104,101,97,100,62,60,116,105,116,108,101,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,116,105,116,108,101,62,60,47,104,101,97,100,62,10,60,98,111,100,121,62,10,60,104,49,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,104,49,62,10,10,60,112,62,69,109,97,105,108,58,32,60,97,32,104,114,101,102,61,34,109,97,105,108,116,111,58,47,47,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,34,62,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,60,47,97,62,60,47,112,62,10,10,60,112,62,78,105,99,107,32,111,110,32,73,82,67,58,32,60,116,116,62,98,97,115,118,97,110,100,105,106,107,60,47,116,116,62,60,47,112,62,10,10,60,97,32,104,114,101,102,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,117,115,101,114,47,98,97,115,118,97,110,100,105,106,107,47,34,62,10,32,32,60,105,109,103,32,115,114,99,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,115,116,97,116,105,99,47,98,97,100,103,101,46,112,110,103,34,32,10,32,32,32,32,32,32,32,97,108,116,61,34,73,39,109,32,97,32,72,97,115,107,101,108,108,101,114,34,10,32,32,32,32,32,32,32,98,111,114,100,101,114,61,34,48,34,62,10,60,47,97,62,10,10,60,112,62,83,101,101,32,109,121,32,60,97,32,104,114,101,102,61,34,104,116,116,112,115,58,47,47,103,105,116,104,117,98,46,99,111,109,47,98,97,115,118,97,110,100,105,106,107,34,62,71,105,116,72,117,98,60,47,97,62,32,112,97,103,101,32,102,111,114,32,97,32,108,105,115,116,32,111,102,32,112,114,111,106,101,99,116,115,32,73,32,119,111,114,107,32,111,110,46,60,47,112,62,10,10,60,47,98,111,100,121,62,10,60,47,104,116,109,108,62,10]) Empty

Personally, I don't work in ghci that often so I don't care that much if we have or don't have specialized Show instances for Vectors of Word8s.

So what do other people think about this?

Bas

Joachim Breitner

2:39 p.m.

Hi, on Saturday, 15.10.2011, at 16:15 +0200, Bas van Dijk wrote:

...

So what do other people think about this?

having a human-readable Show instance for ByteStrings is definitely a great plus when debugging code. Greetings, Joachim -- Joachim "nomeata" Breitner mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

Ertugrul Soeylemez

6:47 p.m.

Joachim Breitner wrote:

...

...
So what do other people think about this?

having a human-readable Show instance for ByteStrings is definitely a great plus when debugging code.

I agree and would even go as far as saying that it's generally useful, even if the data is not guaranteed to be text. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Ivan Lazar Miljenovic

3:18 p.m.

On 16 October 2011 01:15, Bas van Dijk wrote:

...

I agree that you shouldn't use ByteStrings or Vectors of Word8s for Unicode strings. However I can imagine that for quick sessions in ghci it can be quite handy if they are shown as strings. For example, currently we have:

...
import Network.HTTP.Enumerator
simpleHttp "http://code.haskell.org/~basvandijk/"
Chunk "<html>\n<head><title>Bas van Dijk</title></head>\n<body>\n<h1>Bas van Dijk</h1>\n\n<p>Email: <a href=\"mailto://v.dijk.bas@gmail.com\">v.dijk.bas@gmail.com</a></p>\n\n<p>Nick on IRC: <tt>basvandijk</tt></p>\n\n<a href=\"http://www.haskellers.com/user/basvandijk/\">\n  <img src=\"http://www.haskellers.com/static/badge.png\" \n       alt=\"I'm a Haskeller\"\n       border=\"0\">\n</a>\n\n<p>See my <a href=\"https://github.com/basvandijk\">GitHub</a> page for a list of projects I work on.</p>\n\n</body>\n</html>\n" Empty

If ByteStrings were not shown as strings this would look like:

Chunk ( fromList [60,104,116,109,108,62,10,60,104,101,97,100,62,60,116,105,116,108,101,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,116,105,116,108,101,62,60,47,104,101,97,100,62,10,60,98,111,100,121,62,10,60,104,49,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,104,49,62,10,10,60,112,62,69,109,97,105,108,58,32,60,97,32,104,114,101,102,61,34,109,97,105,108,116,111,58,47,47,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,34,62,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,60,47,97,62,60,47,112,62,10,10,60,112,62,78,105,99,107,32,111,110,32,73,82,67,58,32,60,116,116,62,98,97,115,118,97,110,100,105,106,107,60,47,116,116,62,60,47,112,62,10,10,60,97,32,104,114,101,102,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,117,115,101,114,47,98,97,115,118,97,110,100,105,106,107,47,34,62,10,32,32,60,105,109,103,32,115,114,99,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,115,116,97,116,105,99,47,98,97,100,103,101,46,112,110,103,34,32,10,32,32,32,32,32,32,32,97,108,116,61,34,73,39,109,32,97,32,72,97,115,107,101,108,108,101,114,34,10,32,32,32,32,32,32,32,98,111,114,100,101,114,61,34,48,34,62,10,60,47,97,62,10,10,60,112,62,83,101,101,32,109,121,32,60,97,32,104,114,101,102,61,34,104,116,116,112,115,58,47,47,103,105,116,104,117,98,46,99,111,109,47,98,97,115,118,97,110,100,105,106,107,34,62,71,105,116,72,117,98,60,47,97,62,32,112,97,103,101,32,102,111,114,32,97,32,108,105,115,116,32,111,102,32,112,114,111,106,101,99,116,115,32,73,32,119,111,114,107,32,111,110,46,60,47,112,62,10,10,60,47,98,111,100,121,62,10,60,47,104,116,109,108,62,10]) Empty

Personally, I don't work in ghci that often so I don't care that much if we have or don't have specialized Show instances for Vectors of Word8s.

So what do other people think about this?

Actually, for my current use case of ByteStrings (binary encoding of graphs using existing encoding schemes), I would prefer this [Word8]-based Show instance as it would help with my debugging, since the output looks along the lines of: Chunk (fromList [3,2,3,0,3,1,3,0,2,2,1,0]).

I am the first to admit that my use case is probably different from others though.

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

Conrad Parker

18 Oct 2011

12:30 a.m.

On 15 October 2011 23:18, Ivan Lazar Miljenovic wrote:

...

Actually, for my current use case of Bytestrings (binary encoding of graphs using existing encoding schemes), I would prefer this [Word8]-based Show instance as it would help with my debugging, since the output looks along the lines of: Chunk (fromList [3,2,3,0,3,1,3,0,2,2,1,0]). I am the first to admit that my use case is probably different from others though.

And I often work with mixed text/binary data (e.g. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)

Conrad.
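A rough sketch of the display half of such a hexdump, using only base. The names hexByte, asciiByte and hexdump are hypothetical, the layout (16 bytes per line, hex on the left, printable ASCII on the right, no offset column) is an assumption loosely modelled on xxd, and the Read direction is not covered.

```haskell
import Data.Char (chr, isAscii, isPrint)
import Data.Word (Word8)
import Numeric (showHex)

-- Two lowercase hex digits per byte.
hexByte :: Word8 -> String
hexByte w = let s = showHex w "" in if length s < 2 then '0' : s else s

-- Printable ASCII is shown as-is, everything else as '.'.
asciiByte :: Word8 -> Char
asciiByte w
  | isAscii c && isPrint c = c
  | otherwise              = '.'
  where
    c = chr (fromIntegral w)

-- Split a list into pieces of at most n elements.
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = let (a, b) = splitAt n xs in a : chunks n b

-- One output line per 16 input bytes: padded hex column, then characters.
hexdump :: [Word8] -> String
hexdump = unlines . map line . chunks 16
  where
    line ws = pad (16 * 3) (unwords (map hexByte ws)) ++ map asciiByte ws
    pad n s = s ++ replicate (n - length s) ' '

main :: IO ()
main = putStr (hexdump [104, 105, 0, 255])
```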

Felipe Almeida Lessa

2:37 a.m.

On Mon, Oct 17, 2011 at 10:30 PM, Conrad Parker wrote:

...

And I often work with mixed text/binary data (eg. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)

Interesting idea. I quite like it! -- Felipe.

Vincent Hanquez

6:33 a.m.

On 10/18/2011 01:30 AM, Conrad Parker wrote:

...

And I often work with mixed text/binary data (eg. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)

(slightly off topic ...)

I often do mixed text/binary too, and I now use the following package:

http://hackage.haskell.org/package/bytedump

The problem with a Show instance is that there's no way to configure some aspects of it :-)

--
Vincent

Roman Leshchinskiy

19 Oct 2011

11:09 a.m.

Conrad Parker wrote:

...

On 15 October 2011 23:18, Ivan Lazar Miljenovic wrote:

...
Actually, for my current use case of Bytestrings (binary encoding of graphs using existing encoding schemes), I would prefer this [Word8]-based Show instance as it would help with my debugging, since the output looks along the lines of: Chunk (fromList [3,2,3,0,3,1,3,0,2,2,1,0]). I am the first to admit that my use case is probably different from others though.

And I often work with mixed text/binary data (eg. text annotations in video streams). I'd want the Show/Read instances to be in the form of a hexdump with char representation alongside (like xxd or od -xc output). It roundtrips well, so why not? :-)

So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in your .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?

Roman
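Later GHC versions offer something weaker than this proposal: the -interactive-print flag replaces the function ghci uses to print results, without overriding any Show instance. A sketch, in which debugPrint and abbreviate are hypothetical helpers and the 60-character limit is arbitrary:

```haskell
-- Hypothetical helpers. Once the defining module is loaded, ghci could be
-- pointed at debugPrint with a line such as
--   :set -interactive-print=debugPrint
-- This only changes how results are displayed; Show instances are untouched.

-- Cut a string off after n characters and mark the truncation.
abbreviate :: Int -> String -> String
abbreviate n s
  | length s > n = take n s ++ "..."
  | otherwise    = s

-- Print a value using its ordinary Show instance, abbreviated.
debugPrint :: Show a => a -> IO ()
debugPrint = putStrLn . abbreviate 60 . show

main :: IO ()
main = debugPrint [1 .. 100 :: Int]
```

Because the helper still goes through show, it cannot change how a byte vector nested inside another value is rendered, which is the harder part of the proposal.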

Ivan Lazar Miljenovic

11:32 a.m.

On 19 October 2011 22:09, Roman Leshchinskiy wrote:

...

So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in you .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?

Would this hypothetical ghci feature also work for cases where you have a ByteString as part of another type that derives Show and Read?

I also wonder whether it would suffice to have a ByteString -> String function available rather than requiring Show per se for the case of a ByteString on its lonesome.

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

AM

2:59 p.m.

On Oct 19, 2011, at 7:32 AM, Ivan Lazar Miljenovic wrote:

...

On 19 October 2011 22:09, Roman Leshchinskiy wrote:

...
So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in you .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?

Would this hypothetical ghci feature also work for cases where you have a ByteString as part of another type that derives Show and Read?

I also wonder whether it would suffice to have a ByteString -> String function available rather than requiring Show per-se for the case of a ByteString on its lonesome.

Note that other programming languages have had to solve this exact problem and they usually end up with multiple functions: one for debugging, one for serialization, one for displaying how the object was constructed.

For example, in Python, look at:

http://docs.python.org/reference/datamodel.html#object.__repr__
http://docs.python.org/reference/datamodel.html#object.__str__

Cheers,
M

Stephen Tetley

4:49 p.m.

On 19 October 2011 15:59, AM wrote:

...

Note that other programming languages have had to solve this exact problem and they usually end up with multiple functions- one for debugging, one for serialization, one for displaying how the object was constructed.

As per Haskell with Show (representation) and Data.Binary (serialization), of course.

Evan Laforge

5:20 p.m.

On Wed, Oct 19, 2011 at 9:49 AM, Stephen Tetley wrote:

...

On 19 October 2011 15:59, AM wrote:

...
Note that other programming languages have had to solve this exact problem and they usually end up with multiple functions- one for debugging, one for serialization, one for displaying how the object was constructed.

As per Haskell with Show (representation) and Data.Binary (serialization), of course.

That's different: Python's str() is meant to be human readable and not necessarily parseable; I have a Haskell equivalent in a Pretty class. repr() is like Haskell's show: it's meant to be human readable but also parseable to get the original value back.

The problem is that you can write deriving for Show but not for Pretty, so as soon as you want to pretty-print a record you're back to writing stuff by hand, even if it's just to fix up one field (say it's a function or a huge table that you want to abbreviate to 'HugeTable: 73246 entries'). Perhaps the new generic deriving stuff could fix that; I'd like to see 'deriving (Show)' implemented in Haskell so I could write my own variations.

Data.Binary would be what Python calls pickle.
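The hand-written burden described here looks roughly as follows. This is a sketch: the Pretty class is a guess at the shape of the class mentioned (the post does not show it), and Config is a hypothetical record with one large field.

```haskell
-- Assumed shape of a Pretty class: human readable, not necessarily parseable.
class Pretty a where
  pretty :: a -> String

-- Hypothetical record; Show can be derived, Pretty cannot.
data Config = Config
  { name  :: String
  , table :: [(Int, String)]  -- potentially huge
  } deriving Show

-- Written by hand just to abbreviate one field.
instance Pretty Config where
  pretty c =
    "Config {name = " ++ show (name c)
      ++ ", table = HugeTable: " ++ show (length (table c)) ++ " entries}"

main :: IO ()
main = do
  let c = Config "demo" (zip [1 .. 3] (repeat "x"))
  print c               -- derived Show prints the whole table
  putStrLn (pretty c)   -- Config {name = "demo", table = HugeTable: 3 entries}
```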

Roman Leshchinskiy

7:22 p.m.

Ivan Lazar Miljenovic wrote:

...

On 19 October 2011 22:09, Roman Leshchinskiy wrote:

...
So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in you .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?

Would this hypothetical ghci feature also work for cases where you have a ByteString as part of another type that derives Show and Read?

Yes. The idea would be to evaluate the expression, then build the Show instance for the type of the result taking the ghci overrides into account and then use that to display the result. I have to admit that I have no idea how difficult it would be to do this but surely it can't be that hard. Roman

Michael Snoyman

12:38 p.m.

On Wed, Oct 19, 2011 at 1:09 PM, Roman Leshchinskiy wrote:

...

So it seems that (1) people have very different requirements and (2) the Show instance only really matters for debugging in ghci. Here is a thought. What if ghci allowed Show instances to be overridden dynamically? So you could put your preferred Show instance for Vector Word8 in you .ghci file and ghci would use that when displaying stuff (but not when actually evaluating things). Would that solve most of the problems without messing with vector's Show instances?

I actually think it's more than just GHCi. A lot of the time when debugging some code, I'll litter it with "print"s to see what's going on. For me, I'd rather change the actual Show instance. It might make sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role. Michael

Ketil Malde

7:29 p.m.

Michael Snoyman writes:

...

sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.

Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255? (ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.) -k -- If I haven't seen further, it is by standing in the footprints of giants
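The truncation in question can be shown with base alone. In this sketch truncateChar is a hypothetical name; it keeps only the low 8 bits of a character's code point, which is the behaviour the Char8 interface of bytestring is documented to have when packing characters.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Keep only the low 8 bits of the code point.
truncateChar :: Char -> Word8
truncateChar = fromIntegral . ord

main :: IO ()
main = do
  print (truncateChar 'A')      -- 65
  print (truncateChar '\955')   -- 187, since 955 = 3 * 256 + 187
```

Any character above 255 therefore collides with some byte value, so reading such a string back cannot be faithful.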

Michael Snoyman

20 Oct 2011

7:43 p.m.

On Wed, Oct 19, 2011 at 9:29 PM, Ketil Malde wrote:

...

Michael Snoyman writes:

...
sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.

Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255?

(ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.)

Perhaps the correct semantic approach would be to have:

newtype Char8 = Char8 Word8

But I think that will break far too many applications to try to get it implemented. In an ideal world, I agree with both points: displaying a numeric value as a Char doesn't make sense, and there are definitely issues with the Read instance. However, I still think current behavior is the least of all available evils. Show/Read work properly as a pair and can encode/decode any ByteString, and there's never any presumption that all input to read is valid.

Michael
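A minimal sketch of what such a Char8 newtype could look like, using only base. The helper names toChar and fromChar and the instance bodies are assumptions made for illustration; nothing like this exists in bytestring or vector as far as this thread says. The point it shows is that Show and Read can form a consistent pair over all 256 byte values, while Read still truncates anything above 255:

```haskell
module Main where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- A byte that is displayed as a character.
newtype Char8 = Char8 Word8
  deriving (Eq)

-- Hypothetical helpers for converting between the two views.
toChar :: Char8 -> Char
toChar (Char8 w) = chr (fromIntegral w)

fromChar :: Char -> Char8
fromChar = Char8 . fromIntegral . ord   -- truncates code points above 255

instance Show Char8 where
  showsPrec d = showsPrec d . toChar
  -- Lists of Char8 are shown like String literals.
  showList cs = showList (map toChar cs)

instance Read Char8 where
  readsPrec d s = [(fromChar c, rest) | (c, rest) <- readsPrec d s]
  -- Lists of Char8 are read from String literals.
  readList s = [(map fromChar cs, rest) | (cs, rest) <- readList s]

main :: IO ()
main = do
  let allBytes = map Char8 [minBound .. maxBound]
  print (map Char8 [72, 105])                -- "Hi"
  print (read (show allBytes) == allBytes)   -- True
```

Because the element type is a distinct newtype, a container of Char8 could pick up string-like display without overlapping the instance for plain Word8.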

Christian Maeder

21 Oct

9:58 a.m.

On 20.10.2011 21:43, Michael Snoyman wrote:

...

On Wed, Oct 19, 2011 at 9:29 PM, Ketil Malde wrote:

...
Michael Snoyman writes:

...
sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.

Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255?

(ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.)

Perhaps the correct semantic approach would be to have:

newtype Char8 = Char8 Word8

But I think that will break far too many applications to try to get it

Would a new Word8 type be better to stay compatible?

newtype Word8 = C8 Data.Word.Word8

C.

...

implemented. In an ideal world, I agree with both points: displaying a numeric value as a Char doesn't make sense, and there are definitely issues with the Read instance. However, I still think current behavior is the least of all available evils. Show/Read work properly as a pair and can encode/decode any ByteString, and there's never any presumption that all input to read is valid.

Michael

Michael Snoyman

10:32 a.m.

On Fri, Oct 21, 2011 at 11:58 AM, Christian Maeder wrote:

...

On 20.10.2011 21:43, Michael Snoyman wrote:

...
On Wed, Oct 19, 2011 at 9:29 PM, Ketil Malde wrote:

...
Michael Snoyman writes:

...
sense to try and pursue something like what you're suggesting, but I think the default Show (Vector Word8) should be the one most useful, most of the time, and I think the general consensus seems to be the current ByteString instance fits that role.

Hm. I think it is slightly weird to display a numeric value (Word8) as a Char. Also, I would prefer a representation making the type explicit (but unlike ByteString, vector seems to add a type annotation.) Would you still support the truncating behavior for 'read' and values above 255?

(ByteString has two interfaces, ByteString and .Char8, but as there can be only one Show instance, I see why it works the way it does.)

Perhaps the correct semantic approach would be to have:

newtype Char8 = Char8 Word8

But I think that will break far too many applications to try to get it

would a new Word8 type be better to stay compatible?

newtype Word8 = C8 Data.Word.Word8

I don't think it would really fix much. Any code in the wild right now that refers to Word8 will be referring to Data.Word.Word8. Certainly calling the newtype Word8 will slightly simplify a migration, but (1) it will still require code changes and (2) I'd rather just bite the bullet and make a proper switch. Michael

Christian Maeder

17 Oct

8:18 a.m.

I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor
- is portable and supplies control over all other instances (not just Show)

I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

It would require more work just on your side, though.

Cheers Christian

On 15.10.2011 16:15, Bas van Dijk wrote:

...

On 15 October 2011 13:34, Roman Leshchinskiy wrote:

...
On 15/10/2011, at 12:26, Roman Leshchinskiy wrote:

...
On 14/10/2011, at 12:37, Bas van Dijk wrote:

...
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.

I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.

Roman

I agree that you shouldn't use ByteStrings or Vectors of Word8s for Unicode strings. However I can imagine that for quick sessions in ghci it can be quite handy if they are shown as strings. For example, currently we have:

...
import Network.HTTP.Enumerator
simpleHttp "http://code.haskell.org/~basvandijk/"
Chunk "<html>\n<head><title>Bas van Dijk</title></head>\n<body>\n<h1>Bas van Dijk</h1>\n\n<p>Email:v.dijk.bas@gmail.com</a></p>\n\n<p>Nick on IRC:<tt>basvandijk</tt></p>\n\n http://www.haskellers.com/user/basvandijk/\">\nhttp://www.haskellers.com/static/badge.png\" \n alt=\"I'm a Haskeller\"\n border=\"0\">\n</a>\n\n<p>See myhttps://github.com/basvandijk\">GitHub</a> page for a list of projects I work on.</p>\n\n</body>\n</html>\n" Empty

If ByteStrings were not shown as strings this would look like:

Chunk ( fromList [60,104,116,109,108,62,10,60,104,101,97,100,62,60,116,105,116,108,101,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,116,105,116,108,101,62,60,47,104,101,97,100,62,10,60,98,111,100,121,62,10,60,104,49,62,66,97,115,32,118,97,110,32,68,105,106,107,60,47,104,49,62,10,10,60,112,62,69,109,97,105,108,58,32,60,97,32,104,114,101,102,61,34,109,97,105,108,116,111,58,47,47,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,34,62,118,46,100,105,106,107,46,98,97,115,64,103,109,97,105,108,46,99,111,109,60,47,97,62,60,47,112,62,10,10,60,112,62,78,105,99,107,32,111,110,32,73,82,67,58,32,60,116,116,62,98,97,115,118,97,110,100,105,106,107,60,47,116,116,62,60,47,112,62,10,10,60,97,32,104,114,101,102,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108,108,101,114,115,46,99,111,109,47,117,115,101,114,47,98,97,115,118,97,110,100,105,106,107,47,34,62,10,32,32,60,105,109,103,32,115,114,99,61,34,104,116,116,112,58,47,47,119,119,119,46,104,97,115,107,101,108 ,108,101,114,115,46,99,111,109,47,115,116,97,116,105,99,47,98,97,100,103,101,46,112,110,103,34,32,10,32,32,32,32,32,32,32,97,108,116,61,34,73,39,109,32,97,32,72,97,115,107,101,108,108,101,114,34,10,32,32,32,32,32,32,32,98,111,114,100,101,114,61,34,48,34,62,10,60,47,97,62,10,10,60,112,62,83,101,101,32,109,121,32,60,97,32,104,114,101,102,61,34,104,116,116,112,115,58,47,47,103,105,116,104,117,98,46,99,111,109,47,98,97,115,118,97,110,100,105,106,107,34,62,71,105,116,72,117,98,60,47,97,62,32,112,97,103,101,32,102,111,114,32,97,32,108,105,115,116,32,111,102,32,112,114,111,106,101,99,116,115,32,73,32,119,111,114,107,32,111,110,46,60,47,112,62,10,10,60,47,98,111,100,121,62,10,60,47,104,116,109,108,62,10]) Empty

Personally, I don't work in ghci that often so I don't care that much if we have or don't have specialized Show instances for Vectors of Word8s.

So what do other people think about this?

Bas

Yves Parès

9:03 a.m.

And you could just use the GeneralizedNewtypeDeriving extension. Then you could use functions from Data.Vector.Generic on your ByteStrings. Much cleaner IMO than OverlappingInstances.

2011/10/17 Christian Maeder

...

I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor
- is portable and supplies control over all other instances (not just Show)

I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

It would require more work just on your side, though.

Cheers Christian

On 15.10.2011 16:15, Bas van Dijk wrote:

On 15 October 2011 13:34, Roman Leshchinskiy wrote:

...
On 15/10/2011, at 12:26, Roman Leshchinskiy wrote:

On 14/10/2011, at 12:37, Bas van Dijk wrote:

...
If there's need for a specific Show instance for Vectors of Word8s we can always add one directly to vector. (Roman, what are your thoughts on this?)

Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.

I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.

Roman

I agree that you shouldn't use ByteStrings or Vectors of Word8s for Unicode strings. However I can imagine that for quick sessions in ghci it can be quite handy if they are shown as strings. For example, currently we have:

import Network.HTTP.Enumerator

...
simpleHttp "http://code.haskell.org/~basvandijk/"

Chunk "<html>\n<head><title>Bas van Dijk</title></head>\n<body>\n<h1>Bas van Dijk</h1>\n\n<p>Email:v.dijk.bas@gmail.com</a></p>\n\n<p>Nick on IRC:<tt>basvandijk</tt></p>\n\n http://www.haskellers.com/user/basvandijk/\">\nhttp://www.haskellers.com/static/badge.png\" \n alt=\"I'm a Haskeller\"\n border=\"0\">\n</a>\n\n<p>See myhttps://github.com/basvandijk\">GitHub</a> page for a list of projects I work on.</p>\n\n</body>\n</html>\n" Empty

If ByteStrings were not shown as strings this would look like:

Chunk ( fromList [60,104,116,109,108,62,10,60,**104,101,97,100,62,60,116,105,** 116,108,101,62,66,97,115,32,**118,97,110,32,68,105,106,107,** 60,47,116,105,116,108,101,62,**60,47,104,101,97,100,62,10,60,** 98,111,100,121,62,10,60,104,**49,62,66,97,115,32,118,97,110,** 32,68,105,106,107,60,47,104,**49,62,10,10,60,112,62,69,109,** 97,105,108,58,32,60,97,32,104,**114,101,102,61,34,109,97,105,** 108,116,111,58,47,47,118,46,**100,105,106,107,46,98,97,115,** 64,103,109,97,105,108,46,99,**111,109,34,62,118,46,100,105,** 106,107,46,98,97,115,64,103,**109,97,105,108,46,99,111,109,** 60,47,97,62,60,47,112,62,10,**10,60,112,62,78,105,99,107,32,** 111,110,32,73,82,67,58,32,60,**116,116,62,98,97,115,118,97,** 110,100,105,106,107,60,47,116,**116,62,60,47,112,62,10,10,60,** 97,32,104,114,101,102,61,34,**104,116,116,112,58,47,47,119,** 119,119,46,104,97,115,107,101,**108,108,101,114,115,46,99,111,** 109,47,117,115,101,114,47,98,**97,115,118,97,110,100,105,106,** 107,47,34,62,10,32,32,60,105,**109,103,32,115,114,99,61,34,** 104,116,116,112,58,47,47,119,**119,119,46,104,97,115,107,101,**108

,108,101,114,115,46,99,111,**109,47,115,116,97,116,105,99,** 47,98,97,100,103,101,46,112,**110,103,34,32,10,32,32,32,32,** 32,32,32,97,108,116,61,34,73,**39,109,32,97,32,72,97,115,107,** 101,108,108,101,114,34,10,32,**32,32,32,32,32,32,98,111,114,** 100,101,114,61,34,48,34,62,10,**60,47,97,62,10,10,60,112,62,** 83,101,101,32,109,121,32,60,**97,32,104,114,101,102,61,34,** 104,116,116,112,115,58,47,47,**103,105,116,104,117,98,46,99,** 111,109,47,98,97,115,118,97,**110,100,105,106,107,34,62,71,** 105,116,72,117,98,60,47,97,62,**32,112,97,103,101,32,102,111,** 114,32,97,32,108,105,115,116,**32,111,102,32,112,114,111,106,** 101,99,116,115,32,73,32,119,**111,114,107,32,111,110,46,60,** 47,112,62,10,10,60,47,98,111,**100,121,62,10,60,47,104,116,** 109,108,62,10])

...
Empty

Personally, I don't work in ghci that often so I don't care that much if we have or don't have specialized Show instances for Vectors of Word8s.

So what do other people think about this?

Bas

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Ertugrul Soeylemez

9:10 a.m.

Christian Maeder wrote:

...

I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor
- is portable and supplies control over all other instances (not just Show)

I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

It would require more work just on your side, though.

Also such an implementation wouldn't be big news. You would get stream fusion as news, but I'm specifically excited about the idea that I can use the vector interface. I see no need to restrict the ByteString interface, since it is a pretty low level data structure anyway. You use it to process raw ByteStrings and as such should get maximum flexibility in doing so. Every restriction means that in a certain edge case you can't get high performance, because the author decided that you aren't smart enough to use the underlying interface, something which I always found annoying about some of the Haskell libraries. So please, please, please, if you decide to use a newtype, do /not/ hide the constructor. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Christian Maeder

9:38 a.m.

On 17.10.2011 11:10, Ertugrul Soeylemez wrote:

...

So please, please, please, if you decide to use a newtype, do /not/ hide the constructor.

The better alternative to "not hiding the constructor" is to supply conversion functions that may or may not do more than the constructor and selector and are named accordingly. (This just disallows pattern matching.) Cheers Christian

...

Greets, Ertugrul

Ertugrul Soeylemez

1:09 p.m.

Christian Maeder wrote:

...

...
So please, please, please, if you decide to use a newtype, do /not/ hide the constructor.

The better alternative to "not hiding the constructor" is to supply conversion functions that may or may not do more than the constructor and selector and are named accordingly. (This just disallows pattern matching.)

Except annoying library users, what would be the point of that? Please understand that as a middle level developer (abstractions, protocol implementations, frameworks, etc.) I am sometimes annoyed by the idealism of some library interfaces, and I find myself reinventing the wheel very often, because the closed interfaces of some existing libraries just don't support what I need, even though the technical basis would be there, or because of the abstraction forest I'm unable to guarantee or even get good performance. I could totally understand having a black box interface for some higher level stuff, but ByteString is still low/middle level and should support me as a developer on that level. Unifying vector and bytestring sounds like a great step, and I would find it ruined already by wrapping it up in a newtype. Hiding the constructor would make this even worse. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Bas van Dijk

10:14 a.m.

On 17 October 2011 10:18, Christian Maeder wrote:

...

I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor

But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.

...

- is portable and supplies control over all other instances (not just Show)

What other instances (besides Show) should have different semantics than those of Vector?

...

I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector. Bas
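For comparison, the newtype alternative discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: a plain list stands in for Vector Word8 so that the example needs nothing beyond base, and the choice of derived classes is hypothetical. It shows GeneralizedNewtypeDeriving reusing the wrapped type's instances while Show is written by hand:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

import Data.Char (chr)
import Data.Word (Word8)

-- Assumption: [Word8] stands in for Vector Word8 to keep the sketch base-only.
-- Semigroup and Monoid are reused from the wrapped type via newtype deriving.
newtype ByteString = ByteString [Word8]
  deriving (Eq, Ord, Semigroup, Monoid)

-- The one instance that differs from the wrapped type: show bytes as a string.
instance Show ByteString where
  showsPrec d (ByteString ws) = showsPrec d (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  let hello = ByteString [72, 105] <> ByteString [33]
  print hello                                  -- "Hi!"
  print (hello == ByteString [72, 105, 33])    -- True
```

The trade-off raised above still applies: functions written against the wrapped type do not accept the newtype without unwrapping, which is exactly what a type synonym avoids.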

Michael Snoyman

10:19 a.m.

On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:

...

On 17 October 2011 10:18, Christian Maeder wrote:

...
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor

But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.

...
- is portable and supplies control over all other instances (not just Show)

What other instances (besides Show) should have different semantics than those of Vector?

...
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal. Also, aren't there a few documented cases where newtypes prevent certain GHC rewrite rules from firing? I don't see any strong argument to avoid what appears to be the simplest and most straight-forward solution to the problem at hand. Michael
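A minimal sketch of the overlapping-instance approach, using only base. The Vec type is a hypothetical stand-in for a vector type, and the per-instance OVERLAPPABLE/OVERLAPPING pragmas are the form used by newer GHC releases rather than the module-wide OverlappingInstances extension named above. Both instances live in one module, as suggested:

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a vector type; a list keeps the sketch base-only.
newtype Vec a = Vec [a]

-- General instance: shows the elements as a list.
instance {-# OVERLAPPABLE #-} Show a => Show (Vec a) where
  showsPrec d (Vec xs) = showParen (d > 10) (showString "fromList " . shows xs)

-- More specific instance: shows bytes as a string literal.
instance {-# OVERLAPPING #-} Show (Vec Word8) where
  showsPrec d (Vec ws) = showsPrec d (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  print (Vec [1, 2, 3 :: Int])     -- fromList [1,2,3]
  print (Vec [72, 105 :: Word8])   -- "Hi"
```

When the element type is known at the use site, the more specific instance is selected; the general one applies to every other element type.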

Christian Maeder

12:33 p.m.

On 17.10.2011 12:19, Michael Snoyman wrote: [...]

...

Also, aren't there a few documented cases where newtypes prevent certain GHC rewrite rules from firing?

This would be possible to find out with a wrapper module. Cheers Christian

...

I don't see any strong argument to avoid what appears to be the simplest and most straight-forward solution to the problem at hand.

Michael

Roman Leshchinskiy

2:42 p.m.

Michael Snoyman wrote:

...

On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:

...
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal.

So suppose we change the Show and Read instances for Storable vectors of Word8 and Char. What happens with unboxed and boxed vectors of these types? Should these be changed as well? If not, why not? Roman

Michael Snoyman

2:44 p.m.

On Mon, Oct 17, 2011 at 4:42 PM, Roman Leshchinskiy wrote:

...

Michael Snoyman wrote:

...
On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:

...
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal.

So suppose we change the Show and Read instances for Storable vectors of Word8 and Char. What happens with unboxed and boxed vectors of these types? Should these be changed as well? If not, why not?

I don't have any strong opinion on the matter, but it seems like they may as well be changed also. It seems like all the same "useful for debugging" arguments would apply there as well. Michael

Bas van Dijk

3:18 p.m.

On 17 October 2011 16:44, Michael Snoyman wrote:

...

On Mon, Oct 17, 2011 at 4:42 PM, Roman Leshchinskiy wrote:

...
Michael Snoyman wrote:

...
On Mon, Oct 17, 2011 at 12:14 PM, Bas van Dijk wrote:

...
My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

+1. I'm in favor of using the OverlappingInstances/no newtype and specialized Show instance. I think that, if there was *ever* a case where OverlappingInstances was a good fit, it's this one. We're talking about a single module exporting both the base and overlapped instance, so which instance gets used should be completely decidable. (Unless of course someone defines an orphan instance elsewhere, but that's a different issue IMO.) And even in a worst-case-scenario where somehow we get the wrong instance, we're only talking about output used as a debugging aid, so the damage is minimal.

So suppose we change the Show and Read instances for Storable vectors of Word8 and Char. What happens with unboxed and boxed vectors of these types? Should these be changed as well? If not, why not?

I don't have any strong opinion on the matter, but it seems like they may as well be changed also. It seems like all the same "useful for debugging" arguments would apply there as well.

Michael

Yes I think that makes sense. My patch already adds specific Show and Read instances to all vectors of Chars and Word8s: http://trac.haskell.org/vector/ticket/64 Bas

Jean-Marie Gaillourdet

10:35 a.m.

Hi, On 17.10.2011, at 12:14, Bas van Dijk wrote:

...

On 17 October 2011 10:18, Christian Maeder wrote:

My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

What about lazy bytestrings? I wasn't aware that vector also supports huge logical arrays which are suitable for very large IO streams. I'd be glad if vector is also suitable for such applications. But if not, then there is still a need for the bytestring package in order to support streaming gigabytes of data in a small constant-sized heap. Cheers, Jean

Yves Parès

6:15 p.m.

It's a good question. I don't think there is anything in the vector library that can handle chunks of vectors... If both lazy and strict bytestrings are to be generalized, it would at last permit having a single interface to them, thanks to Data.Vector.Generic, instead of two identical interfaces in separate modules, which force you to duplicate any code that handles bytestrings so that it can deal with both flavours.

2011/10/17 Jean-Marie Gaillourdet

...

Hi,

On 17.10.2011, at 12:14, Bas van Dijk wrote:

...
On 17 October 2011 10:18, Christian Maeder wrote:

My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

What about lazy bytestrings? I wasn't aware that vector also supports huge logical arrays which are suitable for very large IO streams. I'd be glad if vector is also suitable for such applications. But if not, then there is still a need for the bytestring package in order to support streaming gigabytes of data in a small constant-sized heap.

Cheers, Jean

Bas van Dijk

7:04 p.m.

On 17 October 2011 20:15, Yves Parès wrote:

...

It's a good question, I don't think there is something in the vector library that can handle chunks of vectors...

Yes I forgot about lazy bytestrings when writing that. Of course vector-bytestring does provide lazy ByteStrings.

...

If both lazy and strict bytestrings are to be generalized, it would at last permit having a single interface to them, thanks to Data.Vector.Generic, instead of two identical interfaces in separate modules, which force you to duplicate any code that handles bytestrings so that it can deal with both flavours.

It would be an interesting idea to add a chunking vector adapter to the vector package. I guess it will look something like this:

    data Chunks v a = Empty | Chunk {-# UNPACK #-} !(v a) (Chunks v a)

    foldrChunks :: (v a -> b -> b) -> b -> Chunks v a -> b
    foldrChunks f z = go
      where
        go Empty        = z
        go (Chunk c cs) = f c (go cs)
    {-# INLINE foldrChunks #-}

    foldlChunks :: (b -> v a -> b) -> b -> Chunks v a -> b
    foldlChunks f z = go z
      where
        go !a Empty        = a
        go !a (Chunk c cs) = go (f a c) cs
    {-# INLINE foldlChunks #-}

Giving it an instance for Data.Vector.Generic.Base.Vector should be easy right? Anyone up for the job? Then I can replace my custom lazy ByteStrings with:

    type ByteString = Chunks Vector Word8

Bas
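A self-contained variant of the chunking idea, offered as a sketch rather than as the proposed vector API. Assumptions: the chunk type is instantiated with plain lists so that only base is needed, the helpers fromChunks, toChunks and lengthWith are hypothetical names added for the demonstration, and the UNPACK pragma is dropped because the field type v a is abstract here.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Word (Word8)

-- A lazy sequence of strict chunks; v is the chunk container.
data Chunks v a = Empty | Chunk !(v a) (Chunks v a)

-- Lazy right fold over the chunks.
foldrChunks :: (v a -> b -> b) -> b -> Chunks v a -> b
foldrChunks f z = go
  where
    go Empty        = z
    go (Chunk c cs) = f c (go cs)

-- Strict left fold over the chunks.
foldlChunks :: (b -> v a -> b) -> b -> Chunks v a -> b
foldlChunks f z = go z
  where
    go !a Empty        = a
    go !a (Chunk c cs) = go (f a c) cs

-- Hypothetical helpers, not part of the original proposal.
fromChunks :: [v a] -> Chunks v a
fromChunks = foldr Chunk Empty

toChunks :: Chunks v a -> [v a]
toChunks = foldrChunks (:) []

-- Total length, given a length function for a single chunk.
lengthWith :: (v a -> Int) -> Chunks v a -> Int
lengthWith len = foldlChunks (\n c -> n + len c) 0

main :: IO ()
main = do
  -- Assumption: lists stand in for storable vectors as the chunk type.
  let lbs = fromChunks [[1, 2, 3], [4, 5]] :: Chunks [] Word8
  print (lengthWith length lbs)   -- 5
  print (concat (toChunks lbs))   -- [1,2,3,4,5]
```

With a real vector type as v, lengthWith would take that type's length function, and the strict left fold keeps the running total evaluated while chunks are consumed one at a time.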

Christian Maeder

11:12 a.m.

On 17.10.2011 12:14, Bas van Dijk wrote:

...

On 17 October 2011 10:18, Christian Maeder wrote:

...
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor

But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.

Maybe some of the functions that start with "unsafe"? Or why do you use the safe variant (VS.head) in your own implementation? http://hackage.haskell.org/packages/archive/vector/0.9/doc/html/Data-Vector-...

...

...
- is portable and supplies control over all other instances (not just Show)

What other instances (besides Show) should have different semantics than those of Vector?

instance Read (and maybe the vector package will evolve further).

...

...
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

So your package basically supports an unfortunate mix of bytestring and vector functions? How about proposing a better bytestring interface (if it should not just be that of vector)? Btw. a really abstract bytestring could easily be implemented on top of your package. Cheers Christian

...

Bas

Bas van Dijk

3:26 p.m.

On 17 October 2011 13:12, Christian Maeder wrote:

...

On 17.10.2011 12:14, Bas van Dijk wrote:

...
On 17 October 2011 10:18, Christian Maeder wrote:

...
I think the cleanest solution (just from a theoretical point of view) is to use a newtype for your byte strings.

- it should have the same performance
- it allows making ByteString really abstract when hiding the newtype constructor

But what would a newtype ByteString = ByteString (Vector Word8) abstract over? What's there to hide? Vectors are already abstract so users can't mess with their internals.

Maybe some of the functions that start with "unsafe"?

But to keep compatible with bytestring's Data.ByteString.Unsafe, I have to export the unsafe functions anyway. I do think we should provide a Data.Vector.Storable.Safe module which only exports the safe interface and mark it Trustworthy using the new Safe Haskell language extensions. Roman: any reason why only storable vectors are missing a Safe module? I could add one this evening, if you like? And should we also export Unsafe modules like how it's done in the base library?

...

http://hackage.haskell.org/packages/archive/vector/0.9/doc/html/Data-Vector-...

...
...
- is portable and supplies control over all other instances (not just Show)

What other instances (besides Show) should have different semantics than those of Vector?

instance Read (and maybe the vector package will evolve further).

...
...
I'm not sure if one could do really bad things to your ByteString by using the Vector interface, but one would want to disallow vector operations just for compatibility with other byte strings.

My idea is that when vector-bytestring is as fast as bytestring, it can replace it. When that happens it doesn't matter if users use the vector interface. I would even recommend it over using the bytestring interface so that bytestring can eventually be deprecated in favor of vector.

So your package basically supports an unfortunate mix of bytestring and vector functions?

No, vector-bytestring exports the same API as bytestring (except for the Show and Read instances which will hopefully be fixed in a new vector release).

...

How about proposing a better bytestring interface (if it should not just be that of vector)?

I'm all for improving the interface but the goal of vector-bytestring is that it can be used as a drop-in replacement for bytestring without changing too much code. Regards, Bas

Christian Maeder

4:28 p.m.

Am 17.10.2011 17:26, schrieb Bas van Dijk:

...

On 17 October 2011 13:12, Christian Maeder wrote:

...
So your package basically supports an unfortunate mix of bytestring and vector functions?

No, vector-bytestring exports the same API as bytestring (except for the Show and Read instances which will hopefully be fixed in a new vector release).

Yes, but Data.Vector.Storable can simply be imported and used in addition. I suppose the (derived) Data instances (from vector and the original bytestrings) break the abstraction. (So you must hope nobody is relying on this instance.)

...

...
How about proposing a better bytestring interface (if it should not just be that of vector)?

I'm all for improving the interface but the goal of vector-bytestring is that it can be used as a drop-in replacement for bytestring without changing too much code.

Changing back to another drop-in replacement for bytestring will be difficult if functions from Data.Vector.Storable have been used. Thanks Christian

...

Regards,

Bas

Bas van Dijk

7:12 p.m.

On 17 October 2011 18:28, Christian Maeder wrote:

...

Am 17.10.2011 17:26, schrieb Bas van Dijk:

...
On 17 October 2011 13:12, Christian Maeder wrote:

...
So your package basically supports an unfortunate mix of bytestring and vector functions?

No, vector-bytestring exports the same API as bytestring (except for the Show and Read instances which will hopefully be fixed in a new vector release).

Yes, but Data.Vector.Storable can simply be imported and used in addition.

I consider that an advantage.

...

I suppose, the (derived) Data instances (from vector and the original bytestrings) break the abstraction. (So you must hope nobody is relying on this instance.)

Good point! I will mention that in the documentation of vector-bytestring. Also code using the ByteString constructor PS has to be changed because I obviously can't provide an equivalent. However the documentation of Data.ByteString.Internal (which exports PS) warns "normal" users not to use that module: "A module containing semi-public 'ByteString' internals. This exposes the 'ByteString' representation and low level construction functions. As such all the functions in this module are unsafe. The API is also not stable. Where possible application should instead use the functions from the normal public interface modules, such as "Data.ByteString.Unsafe". Packages that extend the ByteString system at a low level will need to use this module." So I expect not many packages are using the PS constructor directly which means the pain of switching to vectors will be minimal.

...

...
...
How about proposing a better bytestring interface (if it should not just be that of vector)?

I'm all for improving the interface but the goal of vector-bytestring is that it can be used as a drop-in replacement for bytestring without changing too much code.

Changing back to another drop-in replacement for bytestring will be difficult if functions from Data.Vector.Storable have been used.

True, so let's try to make this the final replacement ;-) Regards, Bas

Ertugrul Soeylemez

15 Oct

6:50 p.m.

Roman Leshchinskiy wrote:

...

...
Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.

I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.

My suggestion was to remove the generic Show instance and add only specialized instances. This is more work, but will also yield better results. In particular, it allows specialized string representations for other types, too. For example the way values of type Vector Bool are printed is extremely useless. I always find myself writing my own debugging output functions for boolean vectors. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Bas van Dijk

8:25 p.m.

On 15 October 2011 20:50, Ertugrul Soeylemez wrote:

...

Roman Leshchinskiy wrote:

...
...
Personally, I think that ByteString and especially Vector Word8 aren't strings and shouldn't be treated as such. But I wouldn't be strongly against showing them as strings. However, I *am* strongly against using UndecidableInstances in vector and I don't see how to implement this without using them.

I meant OverlappingInstances, of course. To clarify, I would still consider it if everybody thinks it's a really good idea.

My suggestion was to remove the generic Show instance and add only specialized instances. This is more work, but will also yield better results. In particular, it allows specialized string representations for other types, too.

What exactly is the problem with using OverlappingInstances to define specialized Show and Read instances for Vectors with certain element types (Char, Word8, Bool)? Am I missing something dangerous here? Bas

Ertugrul Soeylemez

9:17 p.m.

Bas van Dijk wrote:

...

...
My suggestion was to remove the generic Show instance and add only specialized instances. This is more work, but will also yield better results. In particular, it allows specialized string representations for other types, too.

What exactly is the problem with using OverlappingInstances to define specialized Show and Read instances for Vectors with certain element types (Char, Word8, Bool)?

Am I missing something dangerous here?

Consider having the following instances:

instance Show a => Show (Vector a)
instance Show (Vector Word8)

How could the compiler determine which instance you want when saying show someVector, where someVector :: Vector Word8? Both instances are valid here, and there is no mechanism to choose one of them. You can only write a generic instance where you can rule out the specialized instances. I don't think that's possible in this case.

Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Bas van Dijk

10:56 p.m.

On 15 October 2011 23:17, Ertugrul Soeylemez wrote:

...

Both instances are valid here, and there is no mechanism to choose one of them.

There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific. Bas [1] http://www.haskell.org/ghc/docs/latest/html/users_guide/type-class-extension...
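The instance selection described above can be tried out in a small standalone sketch. The following is only an illustration under stated assumptions: `Vec` is a hypothetical list-backed stand-in (not the vector package's type, which already ships its own Show instance), and it uses the per-instance `{-# OVERLAPPING #-}` pragma of newer GHC releases rather than the module-wide OverlappingInstances flag discussed in this thread.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for a vector type; NOT the real Data.Vector API.
newtype Vec a = Vec [a]

-- Generic instance: matches Vec a for every showable element type.
instance Show a => Show (Vec a) where
  show (Vec xs) = "fromList " ++ show xs

-- More specific instance: its head (Vec Word8) is a substitution
-- instance of the generic head (Vec a), so GHC prefers it for Vec Word8.
instance {-# OVERLAPPING #-} Show (Vec Word8) where
  show (Vec ws) = show (map (chr . fromIntegral) ws)

main :: IO ()
main = do
  print (Vec [1, 2, 3 :: Int])      -- generic instance: fromList [1,2,3]
  print (Vec [104, 105 :: Word8])   -- specific instance: "hi"
```

Both instances live in one module here, and the use sites in `main` would need no extension of their own if they were moved to a different module.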

Stephen Tetley

16 Oct

6:51 a.m.

On 15 October 2011 23:56, Bas van Dijk wrote:

...

On 15 October 2011 23:17, Ertugrul Soeylemez wrote:

...
Both instances are valid here, and there is no mechanism to choose one of them.

There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific.

This has the problem of incoherence in multi-module programs - GHC might choose different instances for the same type depending on compilation order. For a Show instance, this may be acceptable.

Bas van Dijk

9:06 a.m.

On 16 October 2011 08:51, Stephen Tetley wrote:

...

On 15 October 2011 23:56, Bas van Dijk wrote:

...
On 15 October 2011 23:17, Ertugrul Soeylemez wrote:

...
Both instances are valid here, and there is no mechanism to choose one of them.

There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific.

This has the problem of incoherence in multi-module programs - GHC might choose different instances for the same type depending on compilation order. For a Show instance, this may be acceptable.

But is this a problem when both instances are exported from the same module and OverlappingInstances is only enabled in that module, as is the case here?

Stephen Tetley

2:01 p.m.

On 16 October 2011 10:06, Bas van Dijk wrote:

...

But is this a problem when both instances are exported from the same module and OverlappingInstances is only enabled in that module, as is the case here?

No - if the only instances defined are in the same module GHC would pick the most specific one. If there was no instance for Ivan's use-case of Vector Word8 in the "official" module, and he chose to define this more specific instance elsewhere there is the potential for incoherence.

Ertugrul Soeylemez

2:43 p.m.

Bas van Dijk wrote:

...

On 15 October 2011 23:17, Ertugrul Soeylemez wrote:

...
Both instances are valid here, and there is no mechanism to choose one of them.

There is: OverlappingInstances[1] chooses the most specific instance. So in case someVector :: Vector Word8 the instance Show (Vector Word8) is chosen because it's the most specific.

Although I don't have a problem with using language extensions, the vector package, as it is a commonly used tool, shouldn't require me to use an extension just to be able to debug my code. This would be particularly annoying when using GHCi, because you would always have to start it with an extension option. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Felipe Almeida Lessa

4:26 p.m.

On Sun, Oct 16, 2011 at 12:43 PM, Ertugrul Soeylemez wrote:

...

Although I don't have a problem with using language extensions, the vector package, as it is a commonly used tool, shouldn't require me to use an extension just to be able to debug my code. This would be particularly annoying when using GHCi, because you would always have to start it with an extension option.

You don't need it. The OverlappingInstances extension needs to be enabled only where the overlapping instances are defined, *not* where they're used. =) Cheers, -- Felipe.

Ertugrul Soeylemez

5:58 p.m.

Felipe Almeida Lessa wrote:

...

...
Although I don't have a problem with using language extensions, the vector package, as it is a commonly used tool, shouldn't require me to use an extension just to be able to debug my code. This would be particularly annoying when using GHCi, because you would always have to start it with an extension option.

You don't need it. The OverlappingInstances extension needs to be enabled only where the overlapping instances are defined, *not* where they're used. =)

I see. Then I'm totally fine with it. =) Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Christian Maeder

18 Oct

1:45 p.m.

Am 12.10.2011 16:02, schrieb Bas van Dijk:

...

API DOCS

http://hackage.haskell.org/package/vector-bytestring-0.0.0.0

you could re-export VS.empty, VS.singleton, etc. directly. Cheers Christian

-- | /O(1)/ The empty 'ByteString'
empty :: ByteString
empty = VS.empty
{-# INLINE empty #-}

-- | /O(1)/ Convert a 'Word8' into a 'ByteString'
singleton :: Word8 -> ByteString
singleton = VS.singleton
{-# INLINE [1] singleton #-} -- Inline [1] for intercalate rule

Roel van Dijk

2:18 p.m.

2011/10/18 Christian Maeder :

...

you could re-export VS.empty, VS.singleton, etc. directly.

The vector singleton and the vector-bytestring singleton don't have the same type. vector:

...

singleton :: a -> Vector a

vector-bytestring:

...

singleton :: Word8 -> Vector Word8

By choosing the more general type you risk that a previously correct program becomes ambiguous. (When migrating from bytestring to vector-bytestring). I'm not sure if this will actually occur in practice or that it holds for all the little functions that you could theoretically re-export directly. Maybe we create an example program which would fail with the more general type. Proving the opposite (that the more general type is always safe) will be more difficult.

Roel van Dijk

2:26 p.m.

2011/10/18 Roel van Dijk :

...

Maybe we [can] create an example program which would fail with the more general type.

Migrating the function "foo" from bytestring to vector-bytestring would fail with more general types:

...

import Data.ByteString
foo = print empty

Ok, modules loaded: Test.

With vector:

...

import Data.Vector.Storable
foo = print empty

Ambiguous type variable `a0' in the constraints:
  (Show a0) arising from a use of `print' at /home/roelvandijk/development/test.hs:5:7-11
  (Storable a0) arising from a use of `empty' at /home/roelvandijk/development/test.hs:5:13-17
Probable fix: add a type signature that fixes these type variable(s)
In the expression: print empty
In an equation for `foo': foo = print empty
Failed, modules loaded: none.
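For contrast, here is a minimal sketch of why monomorphic wrappers avoid that ambiguity. It assumes the vector package is installed and reuses the type synonym from the announcement; the wrapper definitions mirror the ones quoted earlier in the thread, but the module as a whole is an illustration, not the actual vector-bytestring source.

```haskell
module Main where

import qualified Data.Vector.Storable as VS
import Data.Word (Word8)

-- Same synonym as in the announcement.
type ByteString = VS.Vector Word8

-- Monomorphic wrappers: the element type is pinned to Word8, so no
-- ambiguous type variable is left for Show or Storable to resolve.
empty :: ByteString
empty = VS.empty

singleton :: Word8 -> ByteString
singleton = VS.singleton

main :: IO ()
main = do
  print empty           -- type checks: element type is already Word8
  print (singleton 65)  -- the literal 65 is resolved to Word8
```

The printed form depends on the Show instance of the installed vector version, so no expected output is given here.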

57 comments

participants (18)

AM
Bas van Dijk
Christian Maeder
Conrad Parker
Ertugrul Soeylemez
Evan Laforge
Felipe Almeida Lessa
Ivan Lazar Miljenovic
Jean-Marie Gaillourdet
Joachim Breitner
Ketil Malde
Max Rabkin
Michael Snoyman
Roel van Dijk
Roman Leshchinskiy
Stephen Tetley
Vincent Hanquez
Yves Parès