The Proliferation of List-Like Types

older
Re: [Haskell-cafe] Can't seem to...

John Goerzen

20 Feb 2008 20 Feb '08

2:39 p.m.

Hi folks, Before I started using Haskell, I used OCaml for a spell. One of my biggest annoyances with OCaml was that it had two list types: the default list (strict), and a lazy list (known as a stream). This led to all sorts of annoyances. Libraries were always written to work with one list or the other. If you wanted to use two libraries, one that assumed one list and the other that assumed another, you had a difficult task ahead of you. I am concerned that the same thing is happening in Haskell. We know have three common list-like types: the regular list, strict ByteString, and lazy ByteString. This has created some annoying situations. For instance, a ByteString is great for reading data fast, but Parsec doesn't work on ByteStrings. I am glad that someone wrote a Parsec equivolent that does[1], which answers a real need. But this means that all the combinators in the hsemail library that implement standard RFC conventions won't be usable in my ByteString code, for instance. Similarly, we have another annoying situation relating to character encodings: * The iconv library works only on lazy ByteStrings, and does not handle Strings or strict ByteStrings * The utf8-string library doesn't support UTF-16, doesn't require an external library, and works only on Strings -- no support for ByteStrings. * Data.Encoding.* is native haskell, supports UTF-*, but works only on ByteSting.Lazy again. Now, to help solve this problem, I wrote ListLike[2], providing a set of typeclasses that make list operations generic. I also provided default instances of ListLike for: ListLike Data.ByteString.ByteString Word8 ListLike Data.ByteString.Lazy.ByteString Word8 ListLike [a] a (Integral i, Ix i) => ListLike (Array i e) e (Ord key, Eq val) => ListLike (Map key val) (key, val) These instances use the native underlying calls where appropriate (for instance, ByteString and Data.List both provide a 'head'). The typeclass also contains large numbers of default implementations, such that only four functions must be implemented to make a type a member of ListLike. API ref is at http://software.complete.org/listlike/static/doc/ListLike/Data-ListLike.html... Now, the questions: 1) Does everyone agree with me that we have a problem here? 2) Would it make sense to make ListLike, or something like it, part of the Haskell core? 3) Would it make sense to base as much code as possible in the Haskell core areound ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] as well as ByteStrings. 4) We are missing one final useful type: a Word32-based ByteString. When working in the Unicode character set, a 32-bit character can indeed be useful, and I could see situations in which the performance benefit of a ByteString-like implementation could be useful combared to [Char]. [1] Yes, I have read about Parsec 3 being imminent, which is also great [2] http://software.complete.org/listlike

Show replies by date

Henning Thielemann

20 Feb 20 Feb

3:17 p.m.

...

Now, to help solve this problem, I wrote ListLike[2], providing a set of typeclasses that make list operations generic. I also provided default instances of ListLike for:

ListLike Data.ByteString.ByteString Word8 ListLike Data.ByteString.Lazy.ByteString Word8 ListLike [a] a (Integral i, Ix i) => ListLike (Array i e) e (Ord key, Eq val) => ListLike (Map key val) (key, val)

It's a multi-parameter type class, right? So it's difficult to push it to the core.

...

Now, the questions:

1) Does everyone agree with me that we have a problem here?

I agree.

...

2) Would it make sense to make ListLike, or something like it, part of the Haskell core?

Somehow yes. However since the 'base' package is constantly split into smaller parts, there is maybe no need to merge it somewhere, but introduce simply new package dependencies.

...

3) Would it make sense to base as much code as possible in the Haskell core areound ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] as well as ByteStrings.

4) We are missing one final useful type: a Word32-based ByteString. When working in the Unicode character set, a 32-bit character can indeed be useful, and I could see situations in which the performance benefit of a ByteString-like implementation could be useful combared to [Char].

StorableVector should fill this gap. http://code.haskell.org/~sjanssen/storablevector/

Chad Scherrer

6:48 p.m.

Henning Thielemann writes:

...

...
4) We are missing one final useful type: a Word32-based ByteString. When working in the Unicode character set, a 32-bit character can indeed be useful, and I could see situations in which the performance benefit of a ByteString-like implementation could be useful combared to [Char].

StorableVector should fill this gap. http://code.haskell.org/~sjanssen/storablevector/

Yes, it could, but (1) it's way behind ByteString in terms of optimizations (== "fusion") (2) there's (as far as I know) not a StorableVector.Lazy, which is very much needed To catch up on both fronts, we're looking at a lot of duplicate code. Chad

Antoine Latter

6:57 p.m.

On Feb 20, 2008 12:48 PM, Chad Scherrer wrote:

...

...
StorableVector should fill this gap. http://code.haskell.org/~sjanssen/storablevector/

Yes, it could, but (1) it's way behind ByteString in terms of optimizations (== "fusion") (2) there's (as far as I know) not a StorableVector.Lazy, which is very much needed

To catch up on both fronts, we're looking at a lot of duplicate code.

For anyone looking into it - the StorableVector fusion would have to be quite different from the current ByteString fusion framework. Maybe it would be enough to lay down a Stream fusion framework for StorableVectors. -Antoine

Chad Scherrer

6:59 p.m.

On Feb 20, 2008 10:57 AM, Antoine Latter wrote:

...

For anyone looking into it - the StorableVector fusion would have to be quite different from the current ByteString fusion framework. Maybe it would be enough to lay down a Stream fusion framework for StorableVectors.

I must be missing something. Why would it have to be so different? Chad

Antoine Latter

7:20 p.m.

On Feb 20, 2008 12:59 PM, Chad Scherrer wrote:

...

On Feb 20, 2008 10:57 AM, Antoine Latter wrote:

...
For anyone looking into it - the StorableVector fusion would have to be quite different from the current ByteString fusion framework. Maybe it would be enough to lay down a Stream fusion framework for StorableVectors.

I must be missing something. Why would it have to be so different?

...

From what I saw of Data.ByteString.Fusion, it relies on the assumption

that the elements of the output array are of the same size and alignment as the elements of all of the arrays in the fused intermediate steps. That way, all of the intermediate stages can mutate the output array in place. This works because all of the fusable bytestring functions have types along the lines of: map :: (Word8 -> Word8) -> ByteString -> ByteString With StorableVector, it'd be nice to support the fusion of: map :: (a -> b) -> Vector a -> Vector b All of this just comes from me reading the code, so I could be miss-interpreting something. The NDP papers probably have something interesting to say about this, but I haven't taken the time to try and understand/simplify what they do. -Antoine

Chad Scherrer

8:11 p.m.

Antoine Latter wrote:

...

From what I saw of Data.ByteString.Fusion, it relies on the assumption that the elements of the output array are of the same size and alignment as the elements of all of the arrays in the fused intermediate steps. That way, all of the intermediate stages can mutate the output array in place.

I see a lot in there involving the elimination of intermediate data structures, but nothing about destructive updates. The API is purely functional, and what you're talking about would need to be done in the IO monad to be sure you don't throw away stuff you might need to use again. Chad

Henning Thielemann

21 Feb 21 Feb

3:57 a.m.

On Wed, 20 Feb 2008, Chad Scherrer wrote:

...

On Feb 20, 2008 10:57 AM, Antoine Latter wrote:

...
For anyone looking into it - the StorableVector fusion would have to be quite different from the current ByteString fusion framework. Maybe it would be enough to lay down a Stream fusion framework for StorableVectors.

I must be missing something. Why would it have to be so different?

I think there can also be problems simply because the element type is no longer fixed to Word8 but also not entirely free, but restricted to Storable. E.g. you cannot simply replace SV.fromList . List.map f by SV.map f . SV.fromList because in the second form not only the result type of 'f' must be Storable, but the input type of 'f' must be Storable, too.

Chad Scherrer

6:23 p.m.

On Wed, Feb 20, 2008 at 7:57 PM, Henning Thielemann wrote:

...

I think there can also be problems simply because the element type is no longer fixed to Word8 but also not entirely free, but restricted to Storable. E.g. you cannot simply replace SV.fromList . List.map f by SV.map f . SV.fromList because in the second form not only the result type of 'f' must be Storable, but the input type of 'f' must be Storable, too.

Hmm, interesting. But would we really need this? If we have [a] rewritten as a stream and SV rewritten as a stream, couldn't they still fuse? Loosely speaking, SV.fromList . List.map f -> (SV.unstream . List.stream) . (List.unstream . mapS f . List.stream) -> SV.unstream . mapS f . List.stream Chad

Ross Paterson

20 Feb 20 Feb

3:44 p.m.

On Wed, Feb 20, 2008 at 08:39:13AM -0600, John Goerzen wrote:

...

I am concerned that the same thing is happening in Haskell. We now have three common list-like types: the regular list, strict ByteString, and lazy ByteString.

This has created some annoying situations. For instance, a ByteString is great for reading data fast, but Parsec doesn't work on ByteStrings. I am glad that someone wrote a Parsec equivalent that does[1], which answers a real need. But this means that all the combinators in the hsemail library that implement standard RFC conventions won't be usable in my ByteString code, for instance. [...] http://software.complete.org/listlike/static/doc/ListLike/Data-ListLike.html

As Henning pointed out, multiple parameter type classes are problematic for core libraries at present. An alternative might be explicit dictionaries. For example, a partial solution would be to provide coinductive views, i.e. for all these types to provide functions of a type like full -> Maybe (item, full) (Data.Map, Data.Set and Data.Sequence would each have two such functions), and to have a library of generalized functions taking such functions as parameters, like splitAt :: (full -> Maybe (item, full)) -> Int -> full -> ([item], full) Parsing libraries could include a similar parameter within their monad. That only covers the elimination side, of course.

John Goerzen

4:23 p.m.

On 2008-02-20, Ross Paterson wrote:

...

...
conventions won't be usable in my ByteString code, for instance. [...] http://software.complete.org/listlike/static/doc/ListLike/Data-ListLike.html

As Henning pointed out, multiple parameter type classes are problematic for core libraries at present.

An alternative might be explicit dictionaries. For example, a partial solution would be to provide coinductive views, i.e. for all these types to provide functions of a type like

full -> Maybe (item, full)

Hrm, what exactly is the return data here? Is is the head and the tail if the list has >= 1 item, or Nothing otherwise? Or...? The problem with this approach, if my guess is correct, is that you can't achieve native speed because you will have to be re-implementing everything in terms of these functions. For instance, I'd be using a re-implementation of length instead of a native ByteString length, which may be much faster. I notice that Data.Foldable does some similar things but does not use multi-parameter type classes. I seem to recall that I attempted to do this in the same manner, but got tripped up somewhere. I can't remember now exactly what the problem was, but I can go back and look if nobody knows off-hand. What is the problem with MPTC in base?

John Goerzen

4:28 p.m.

On 2008-02-20, John Goerzen wrote:

...

I notice that Data.Foldable does some similar things but does not use multi-parameter type classes. I seem to recall that I attempted to do this in the same manner, but got tripped up somewhere. I can't remember now exactly what the problem was, but I can go back and look if nobody knows off-hand.

I went back and looked. The problem is that ByteString doesn't work as a member of Foldable, or of ListLike without it being MPTC. Trying to do so yields: ListLike.hs:217:20: Kind mis-match Expected kind `* -> *', but `BS.ByteString' has kind `*' In the instance declaration for `F.Foldable BS.ByteString' Is there any way around that, other than MPTC? -- John

Jules Bean

5:55 p.m.

John Goerzen wrote:

...

On 2008-02-20, John Goerzen wrote:

...
I notice that Data.Foldable does some similar things but does not use multi-parameter type classes. I seem to recall that I attempted to do this in the same manner, but got tripped up somewhere. I can't remember now exactly what the problem was, but I can go back and look if nobody knows off-hand.

I went back and looked.

The problem is that ByteString doesn't work as a member of Foldable, or of ListLike without it being MPTC. Trying to do so yields:

ListLike.hs:217:20: Kind mis-match Expected kind `* -> *', but `BS.ByteString' has kind `*' In the instance declaration for `F.Foldable BS.ByteString'

Is there any way around that, other than MPTC?

Not directly, no. The point about Foldable, Functor, and Monad, is that they enforce the connection between container and contents. If the contents is of type "a", the container is of type "f a" for a fixed type constructor 'f'. This works for [], Seq, and so on, but fails for ByteString. To go to the next level, for ByteString you either need type-level functions (to generalise 'f' from "type constructor" to "arbitrary function :: * -> *"), or MPTCs (to make the association between container and contents explicit). However, passing around dictionaries is certainly a solution which works in haskell98. I haven't thought it through enough to see if it would be unpleasantly verbose in practice. Jules

John Goerzen

6:43 p.m.

On 2008-02-20, Jules Bean wrote:

...

Not directly, no.

The point about Foldable, Functor, and Monad, is that they enforce the connection between container and contents. If the contents is of type "a", the container is of type "f a" for a fixed type constructor 'f'. This works for [], Seq, and so on, but fails for ByteString.

Right. In a pure abstract sense, we humans know there is a relationship between container and contents: a ByteString always contains a Word8 (or a Char8 if we choose the alternative implementation). But that is not expressed in the type of ByteString.

...

However, passing around dictionaries is certainly a solution which works in haskell98. I haven't thought it through enough to see if it would be unpleasantly verbose in practice.

I'm not sure precisely what you mean here. If you mean to use dictionaries instead of typeclasses entirely, yes of course that would work, but it would mean that the functions could not operate on the underlying types unmodified, and once again compatibility issues may arise. On the other hand, if you mean using a dictionary to "wrap" just the ByteString types (or other similar ones), I am currently thinking of something along those lines. I'll post here if I come up with something clever (or not).

Ben Franksen

10:18 p.m.

John Goerzen wrote:

...

On 2008-02-20, Jules Bean wrote:

...
Not directly, no.

The point about Foldable, Functor, and Monad, is that they enforce the connection between container and contents. If the contents is of type "a", the container is of type "f a" for a fixed type constructor 'f'. This works for [], Seq, and so on, but fails for ByteString.

Right. In a pure abstract sense, we humans know there is a relationship between container and contents: a ByteString always contains a Word8 (or a Char8 if we choose the alternative implementation).

But that is not expressed in the type of ByteString.

Hm, making a function out of a constant is easy on the value level, just use (const x) instead of (x). So, what about wrapping ByteString in a GADT, like this data ByteString' a where BS' :: Word8 -> ByteString' Word8 ? I probably overlooked something important here... Cheers Ben

David Roundy

10:28 p.m.

On Wed, Feb 20, 2008 at 11:18:51PM +0100, Ben Franksen wrote:

...

John Goerzen wrote:

...
On 2008-02-20, Jules Bean wrote:

...
Not directly, no.

The point about Foldable, Functor, and Monad, is that they enforce the connection between container and contents. If the contents is of type "a", the container is of type "f a" for a fixed type constructor 'f'. This works for [], Seq, and so on, but fails for ByteString.

Right. In a pure abstract sense, we humans know there is a relationship between container and contents: a ByteString always contains a Word8 (or a Char8 if we choose the alternative implementation).

But that is not expressed in the type of ByteString.

Hm, making a function out of a constant is easy on the value level, just use (const x) instead of (x). So, what about wrapping ByteString in a GADT, like this

data ByteString' a where BS' :: Word8 -> ByteString' Word8

? I probably overlooked something important here...

The problem is that while this would change the kind of ByteString to the same as the kind expected by Functor, you still couldn't define a proper Functor instance, since only ByteString' Word8 can ever actually be created. i.e. how could you implement fmapBS :: (a -> b) -> ByteString' a -> ByteString' b -- David Roundy Department of Physics Oregon State University

Ben Franksen

10:40 p.m.

David Roundy wrote:

...

On Wed, Feb 20, 2008 at 11:18:51PM +0100, Ben Franksen wrote:

...
John Goerzen wrote:

...
On 2008-02-20, Jules Bean wrote:

...
Not directly, no.

The point about Foldable, Functor, and Monad, is that they enforce the connection between container and contents. If the contents is of type "a", the container is of type "f a" for a fixed type constructor 'f'. This works for [], Seq, and so on, but fails for ByteString.

Right. In a pure abstract sense, we humans know there is a relationship between container and contents: a ByteString always contains a Word8 (or a Char8 if we choose the alternative implementation).

But that is not expressed in the type of ByteString.

Hm, making a function out of a constant is easy on the value level, just use (const x) instead of (x). So, what about wrapping ByteString in a GADT, like this

data ByteString' a where BS' :: Word8 -> ByteString' Word8

? I probably overlooked something important here...

The problem is that while this would change the kind of ByteString to the same as the kind expected by Functor, you still couldn't define a proper Functor instance, since only ByteString' Word8 can ever actually be created. i.e. how could you implement

fmapBS :: (a -> b) -> ByteString' a -> ByteString' b

Oh yes, indeed. I knew there would be a catch, somewhere... Cheers Ben

John Goerzen

10:45 p.m.

On 2008-02-20, John Goerzen wrote:

...

On the other hand, if you mean using a dictionary to "wrap" just the ByteString types (or other similar ones), I am currently thinking of something along those lines. I'll post here if I come up with something clever (or not).

Can't come up with anything particularly clever here. I think if we go that route, our only option is to add some wrapping/dewrapping function to encapsulate a ByteString into some sort of BSWrapper that does this sort of thing. Not exceptionally convenient. -- John

Neil Mitchell

4:31 p.m.

...

...
full -> Maybe (item, full)

Hrm, what exactly is the return data here? Is is the head and the tail if the list has >= 1 item, or Nothing otherwise? Or...?

Yes, its the projection onto another type: [] = Nothing (x:xs) = Just (x, xs)

...

What is the problem with MPTC in base?

MPTC is not a part of any Haskell standard. The rules surrounding MPTC are not clear. People want to remove MPTC's/functional dependencies, or modify them with associated types. Compilers such as nhc and yhc can't implement them. Once they are in Haskell', with an associated set of restrictions/overlap rules etc, then they can be freely used with the base library. Thanks Neil

ajb＠spamcop.net

21 Feb 21 Feb

3:46 a.m.

G'day all. Quoting Neil Mitchell :

...

Yes, its the projection onto another type:

[] = Nothing (x:xs) = Just (x, xs)

Also known as msplit: http://www.haskell.org/haskellwiki/New_monads/MonadSplit Cheers, Andrew Bromage

David Menendez

7:05 a.m.

On Wed, Feb 20, 2008 at 10:46 PM, wrote:

...

Quoting Neil Mitchell :

...
Yes, its the projection onto another type:

[] = Nothing (x:xs) = Just (x, xs)

Also known as msplit:

http://www.haskell.org/haskellwiki/New_monads/MonadSplit

Almost. The projection has type f a -> Maybe (a, f a), but msplit has type f a -> f (Maybe (a, f a)). -- Dave Menendez http://www.eyrie.org/~zednenem/

Duncan Coutts

20 Feb 20 Feb

11:13 p.m.

On Wed, 2008-02-20 at 08:39 -0600, John Goerzen wrote:

...

* The iconv library works only on lazy ByteStrings, and does not handle Strings or strict ByteStrings

There is a very good reason for this. The right solution in this particular example is not to overload every internal string operation in the iconv lib (which would be far far too slow) but to convert to/from your favourite representation on the edge. So in this case those conversions would be pack/unpack or the similar equivalents for strict <-> lazy bytestrings. If we want it to be generic then we want a class of string like things that provides conversions only, not operations. For example we could export iconv as: iconv :: StringLike string => Encoding -> Encoding -> string -> string iconv to from = (convertStringRep :: Lazy.ByteString -> string) . theRealIconv . (convertStringRep :: string -> Lazy.ByteString) class StringLike string where ... convertStringRep :: (StringLike s1, StringLike s2) => s1 -> s2 -- analogous to fromIntegral Duncan

John Goerzen

21 Feb 21 Feb

1:01 a.m.

On Wednesday 20 February 2008 5:13:34 pm Duncan Coutts wrote:

...

On Wed, 2008-02-20 at 08:39 -0600, John Goerzen wrote:

...
* The iconv library works only on lazy ByteStrings, and does not handle Strings or strict ByteStrings

There is a very good reason for this. The right solution in this particular example is not to overload every internal string operation in the iconv lib (which would be far far too slow) but to convert to/from

I guess the first question here is: in general, why? Let's say you were using something like ListLike (or StringLike, see below). If a library used these operations exclusively, you could make it work on most any type of list by simply changing your imports. (Hide the regular functions from Prelude, and import ListLike). For types such as ByteStrings or lists, that already have a very rich native implementation of these functions, the native implementation is used. You should be getting greater compatibility essentially for free. ListLike is an exhaustive mapping over these native functions. This would be great for anything from sort algorithms to parsers, etc. I even have a ListLikeIO typeclass[2] to facilitate this. [2] Now in your iconv case, you have a special case because you are doing manipulation specifically upon 8-bit binary data. It may not make sense for you to support a [Char] or even a Char8 ByteString because it does not lend itself to those very well. You could, perhaps, support a [Word8] as well as a ByteString when using ListLike. That is, you may have a function like this: head :: ListLike full Word8 => full -> Word8 You could still use this with a ByteString at native speeds, and a [Word8] at its native speed. But this doesn't buy us the ability to use this library interchangably with a Word8-based ByteString and a [Char]. That is a scenario ListLike is not designed to help with. ListLike is designed to make the container interchangable, but does not address making the contents interchangable. I think this is what you are pointing out?

...

your favourite representation on the edge. So in this case those conversions would be pack/unpack or the similar equivalents for strict <-> lazy bytestrings.

If we want it to be generic then we want a class of string like things that provides conversions only, not operations.

For example we could export iconv as:

iconv :: StringLike string => Encoding -> Encoding -> string -> string iconv to from = (convertStringRep :: Lazy.ByteString -> string) . theRealIconv . (convertStringRep :: string -> Lazy.ByteString)

class StringLike string where ...

convertStringRep :: (StringLike s1, StringLike s2) => s1 -> s2 -- analogous to fromIntegral

ListLike has something along these lines, too: [1] class StringLike s where toString :: s -> String fromString :: String -> s lines :: ListLike full s => s -> full words :: ListLike full s => s -> full unlines :: ListLike full s => full -> s unwords :: ListLike full s => full -> s The last four functions are there as a way to provide a universal interface to optimized native functions, where available. The minimal complete definition is just toString and fromString. Technically, you could make every function look like: iconv data = fromString . toString $ realIConv (fromString . toString $ data) That can be made simpler for the programmer with a helper function, but is not necessarily very efficient. With an MPTC, we could do: class StringConvertable s1 s2 where convertString :: s1 -> s2 withString :: s1 -> (s2 -> a) -> a withString x func = func (convertString x) interactString :: s1 -> (s2 -> s2) -> s1 interactString x func = convertString (func (convertString x) which ought to make easy conversions (String to ByteString, for instance) easily doable for a library such as iconv. What do you think? [1] http://software.complete.org/listlike/static/doc/ListLike/Data-ListLike-Stri... [2] http://software.complete.org/listlike/static/doc/ListLike/Data-ListLike-IO.h...

...

Duncan

Duncan Coutts

10:21 a.m.

On Wed, 2008-02-20 at 19:01 -0600, John Goerzen wrote:

...

On Wednesday 20 February 2008 5:13:34 pm Duncan Coutts wrote:

...
On Wed, 2008-02-20 at 08:39 -0600, John Goerzen wrote:

...
* The iconv library works only on lazy ByteStrings, and does not handle Strings or strict ByteStrings

There is a very good reason for this. The right solution in this particular example is not to overload every internal string operation in the iconv lib (which would be far far too slow) but to convert to/from

I guess the first question here is: in general, why?

If one is stitching together coarse grained operations then it doesn't matter too much that we pass in a dictionary and indirect every operating through that. When we're using very fine grained operations the overhead per-operation is more significant. If we do not specialise to the list type we get all the extra overhead and we loose out on all the static optimisations. If we do specialise we get N copies of the code. Iconv is a weird example because it is calling out to a foreign lib that requires blocks of elements. A more convincing example might be an xml parser. I conjecture that it is faster and uses less code to make a single implementation on the best string representation and convert at the boundaries than to overload all the operations inside the parser by the string type. My guess is that a well written xml parser over lazy bytestring will be about the same speed as one written one over String *even* if we have to initially convert from a String input and certainly faster than an xml parser that takes a StringLike dictionary at runtime. So I'm claiming that the single impl with boundary conversion gives us the best of both worlds, no code bloat due to specialisation and working with whichever string type you like, by converting it at the beginning and end. Of course only an experiment can say either way.

...

Let's say you were using something like ListLike (or StringLike, see below). If a library used these operations exclusively, you could make it work on most any type of list by simply changing your imports. (Hide the regular functions from Prelude, and import ListLike). For types such as ByteStrings or lists, that already have a very rich native implementation of these functions, the native implementation is used. You should be getting greater compatibility essentially for free. ListLike is an exhaustive mapping over these native functions. This would be great for anything from sort algorithms to parsers, etc. I even have a ListLikeIO typeclass[2] to facilitate this. [2]

Now in your iconv case, you have a special case because you are doing manipulation specifically upon 8-bit binary data. It may not make sense for you to support a [Char] or even a Char8 ByteString because it does not lend itself to those very well. You could, perhaps, support a [Word8] as well as a ByteString when using ListLike. That is, you may have a function like this:

head :: ListLike full Word8 => full -> Word8

You could still use this with a ByteString at native speeds, and a [Word8] at its native speed.

Only if we definitely eliminate the type class dictionary statically and pay the code bloat cost of having several implementations around.

...

But this doesn't buy us the ability to use this library interchangably with a Word8-based ByteString and a [Char]. That is a scenario ListLike is not designed to help with. ListLike is designed to make the container interchangable, but does not address making the contents interchangable.

I think this is what you are pointing out?

Not really. The iconv example would work with any ListLike(withCString) whose contents are Word8.

...

...
your favourite representation on the edge. So in this case those conversions would be pack/unpack or the similar equivalents for strict <-> lazy bytestrings.

If we want it to be generic then we want a class of string like things that provides conversions only, not operations.

For example we could export iconv as:

iconv :: StringLike string => Encoding -> Encoding -> string -> string iconv to from = (convertStringRep :: Lazy.ByteString -> string) . theRealIconv . (convertStringRep :: string -> Lazy.ByteString)

class StringLike string where ...

convertStringRep :: (StringLike s1, StringLike s2) => s1 -> s2 -- analogous to fromIntegral

ListLike has something along these lines, too: [1]

class StringLike s where toString :: s -> String fromString :: String -> s lines :: ListLike full s => s -> full words :: ListLike full s => s -> full unlines :: ListLike full s => full -> s unwords :: ListLike full s => full -> s

The last four functions are there as a way to provide a universal interface to optimized native functions, where available. The minimal complete definition is just toString and fromString.

Technically, you could make every function look like:

iconv data = fromString . toString $ realIConv (fromString . toString $ data)

...

That can be made simpler for the programmer with a helper function, but is not necessarily very efficient. With an MPTC, we could do:

class StringConvertable s1 s2 where convertString :: s1 -> s2 withString :: s1 -> (s2 -> a) -> a withString x func = func (convertString x)

interactString :: s1 -> (s2 -> s2) -> s1 interactString x func = convertString (func (convertString x)

We can do something similar to fromIntegral so that we don't need a general StringConvertable class. -- | general coercion from integral types fromIntegral :: (Integral a, Num b) => a -> b fromIntegral = fromInteger . toInteger

...

which ought to make easy conversions (String to ByteString, for instance) easily doable for a library such as iconv. What do you think?

I think it needs some performance and code size experiments. Duncan

Devin Mullins

2:26 p.m.

On Thu, Feb 21, 2008 at 10:21:50AM +0000, Duncan Coutts wrote:

...

So I'm claiming that the single impl with boundary conversion gives us the best of both worlds, no code bloat due to specialisation and working with whichever string type you like, by converting it at the beginning and end. Of course only an experiment can say either way.

I think his point is that if I'm using three libraries, each of which uses a different String type, that's a lot of boundaries. Perhaps worse yet, if I'm a library author and I want to be a good citizen, I have to write three versions of my code (or create my own StringLike typeclass). I know of an example off-hand: http://nominolo.blogspot.com/2007/05/networkhttp-bytestrings.html (Of course, as I read that, I see that the lazy code is different from the strict code, but I'll just ignore that for the sake of, uh, argument.) (Sorry if this is a dupe; long thread, and I have to go to work.)

Thomas Schilling

4:51 p.m.

On 21 feb 2008, at 15.26, Devin Mullins wrote:

...

On Thu, Feb 21, 2008 at 10:21:50AM +0000, Duncan Coutts wrote:

...
So I'm claiming that the single impl with boundary conversion gives us the best of both worlds, no code bloat due to specialisation and working with whichever string type you like, by converting it at the beginning and end. Of course only an experiment can say either way.

I think his point is that if I'm using three libraries, each of which uses a different String type, that's a lot of boundaries. Perhaps worse yet, if I'm a library author and I want to be a good citizen, I have to write three versions of my code (or create my own StringLike typeclass). I know of an example off-hand: http://nominolo.blogspot.com/2007/05/networkhttp-bytestrings.html (Of course, as I read that, I see that the lazy code is different from the strict code, but I'll just ignore that for the sake of, uh, argument.)

Yes it does use different implementations, but the lazy interface has it's problems (leakage of handles, unclosed connections, and more). But what we really want is, as Duncan and Roman suggested, *one* standard, optimizable representation and conversions from and to it. This would work perfectly well with sockets. / Thomas

Johan Tibell

5:35 p.m.

On Thu, Feb 21, 2008 at 5:51 PM, Thomas Schilling wrote:

...

...
I know of an example off-hand: http://nominolo.blogspot.com/2007/05/networkhttp-bytestrings.html (Of course, as I read that, I see that the lazy code is different from the strict code, but I'll just ignore that for the sake of, uh, argument.)

Yes it does use different implementations, but the lazy interface has it's problems (leakage of handles, unclosed connections, and more). But what we really want is, as Duncan and Roman suggested, *one* standard, optimizable representation and conversions from and to it. This would work perfectly well with sockets.

I switched from lazy bytestrings to a left fold in my networking code after reading what Oleg wrote about streams vs folds. No problems with handles, etc. anymore. -- Johan

Thomas Schilling

5:58 p.m.

On 21 feb 2008, at 18.35, Johan Tibell wrote:

...

I switched from lazy bytestrings to a left fold in my networking code after reading what Oleg wrote about streams vs folds. No problems with handles, etc. anymore.

Do you fold over chunks? Can you continue to use Parsek or other utilities that need a stream-abstraction, and if so, how do you handle the end of a chunk. This is the kind of callback interface where lazy evaluation really abstracts things nicely.

Johan Tibell

10:35 p.m.

On Thu, Feb 21, 2008 at 6:58 PM, Thomas Schilling wrote:

...

On 21 feb 2008, at 18.35, Johan Tibell wrote:

...
I switched from lazy bytestrings to a left fold in my networking code after reading what Oleg wrote about streams vs folds. No problems with handles, etc. anymore.

Do you fold over chunks?

Yes.

...

Can you continue to use Parsek or other utilities that need a stream-abstraction, and if so, how do you handle the end of a chunk.

I don't think so. I'm writing an incremental bytestring parser.

...

This is the kind of callback interface where lazy evaluation really abstracts things nicely.

Streams are a nice abstraction however they don't work well in a server. -- Johan

Jules Bean

22 Feb 22 Feb

7:18 a.m.

Thomas Schilling wrote:

...

On 21 feb 2008, at 18.35, Johan Tibell wrote:

...
I switched from lazy bytestrings to a left fold in my networking code after reading what Oleg wrote about streams vs folds. No problems with handles, etc. anymore.

Do you fold over chunks? Can you continue to use Parsek or other utilities that need a stream-abstraction, and if so, how do you handle the end of a chunk. This is the kind of callback interface where lazy evaluation really abstracts things nicely.

You can't call a stream-abstraction utility using a left-fold-enumerator without cheating (unsafeInterleave), because the stream-abstraction is incompatible (and leaky! even though it is convenient). You can convert in the other direction fine. Chunk are no problem, and convertible: you can build an element fold from a chunk fold, and a chunk fold from an element fold (as long as there is an 'end-of-input' marker). Jules

Thomas Schilling

8:31 a.m.

On 22 feb 2008, at 08.18, Jules Bean wrote:

...

Thomas Schilling wrote:

...
On 21 feb 2008, at 18.35, Johan Tibell wrote:

...
I switched from lazy bytestrings to a left fold in my networking code after reading what Oleg wrote about streams vs folds. No problems with handles, etc. anymore.

Do you fold over chunks? Can you continue to use Parsek or other utilities that need a stream-abstraction, and if so, how do you handle the end of a chunk. This is the kind of callback interface where lazy evaluation really abstracts things nicely.

You can't call a stream-abstraction utility using a left-fold- enumerator without cheating (unsafeInterleave), because the stream- abstraction is incompatible (and leaky! even though it is convenient).

You can convert in the other direction fine.

Chunk are no problem, and convertible: you can build an element fold from a chunk fold, and a chunk fold from an element fold (as long as there is an 'end-of-input' marker).

Hm, thinking about it, parsers just need to be able to return a continuation instead of fail at the end of the input. This continuation can then be invoked with the next chunk as input.

Johan Tibell

8:59 a.m.

On Fri, Feb 22, 2008 at 9:31 AM, Thomas Schilling wrote:

...

On 22 feb 2008, at 08.18, Jules Bean wrote:

...
You can't call a stream-abstraction utility using a left-fold- enumerator without cheating (unsafeInterleave), because the stream- abstraction is incompatible (and leaky! even though it is convenient).

You can convert in the other direction fine.

Chunk are no problem, and convertible: you can build an element fold from a chunk fold, and a chunk fold from an element fold (as long as there is an 'end-of-input' marker).

Hm, thinking about it, parsers just need to be able to return a continuation instead of fail at the end of the input. This continuation can then be invoked with the next chunk as input.

This is what I'll do and it is also what binary-strict's [1] IncrementalGet parser does. 1. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary-strict-0.3... -- Johan

Roman Leshchinskiy

21 Feb 21 Feb

2:42 a.m.

John Goerzen wrote:

...

I am concerned that the same thing is happening in Haskell. We know have three common list-like types: the regular list, strict ByteString, and lazy ByteString.

Why do you consider ByteString to be list-like but not arrays?

...

1) Does everyone agree with me that we have a problem here?

Yes, definitely. Haskell simply lacks a standard container library.

...

2) Would it make sense to make ListLike, or something like it, part of the Haskell core?

I don't think ListLike is the right approach. It's basically a fairly arbitrary collection of functions. It would be preferable, IMO, to identify a small set of combinators which would allow most list/sequence functions to be implemented generically and efficiently. Personally, I'd go with something like streams (the stream fusion ones) but I'm biased, of course.

...

3) Would it make sense to base as much code as possible in the Haskell core areound ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] as well as ByteStrings.

Yes, as long as there are only very few core combinators. The more methods your ListLike class has, the harder it is to justify why a new function should be implemented in terms of those and not included in the class.

...

4) We are missing one final useful type: a Word32-based ByteString. When working in the Unicode character set, a 32-bit character can indeed be useful, and I could see situations in which the performance benefit of a ByteString-like implementation could be useful combared to [Char].

I have to disagree somewhat. What we are missing is a nice, efficient array library. There is nothing magical about ByteStrings, they are just unboxed arrays. In fact, there is no good reason for ByteString to be a separate type at all. Roman

Henning Thielemann

4:07 a.m.

On Thu, 21 Feb 2008, Roman Leshchinskiy wrote:

...

John Goerzen wrote:

...
2) Would it make sense to make ListLike, or something like it, part of the Haskell core?

I don't think ListLike is the right approach. It's basically a fairly arbitrary collection of functions. It would be preferable, IMO, to identify a small set of combinators which would allow most list/sequence functions to be implemented generically and efficiently. Personally, I'd go with something like streams (the stream fusion ones) but I'm biased, of course.

As long as it is only about speeding up list processing, one might also consider this as optimization problem. This could be handled without adapting much List based code in applications to a generic sequence class. That is, if I convert the result of a composition of list functions to a lazy ByteString, I tell the compiler that I don't need full laziness and the compiler can optimize, say ByteString.fromList . List.func1 . List.func2 . List.build to ByteString.func1 . ByteString.func2 . ByteString.build or even better ByteString.fusedFunc1Func2Build by some clever fusion framework. I think that a type class is easier to justify if it unifies data structures that are more different than just providing the same API with different efficiency.

Duncan Coutts

10:40 a.m.

On Thu, 2008-02-21 at 05:07 +0100, Henning Thielemann wrote:

...

As long as it is only about speeding up list processing, one might also consider this as optimization problem. This could be handled without adapting much List based code in applications to a generic sequence class. That is, if I convert the result of a composition of list functions to a lazy ByteString, I tell the compiler that I don't need full laziness and the compiler can optimize, say ByteString.fromList . List.func1 . List.func2 . List.build to ByteString.func1 . ByteString.func2 . ByteString.build or even better ByteString.fusedFunc1Func2Build by some clever fusion framework. I think that a type class is easier to justify if it unifies data structures that are more different than just providing the same API with different efficiency.

This is orthogonal I think. This could be done with stream fusion without any common class. It requires that the types use stream fusion, then conversions between types (eg list/array) could be done with streams too. Duncan

Henning Thielemann

12:34 p.m.

On Thu, 21 Feb 2008, Duncan Coutts wrote:

...

On Thu, 2008-02-21 at 05:07 +0100, Henning Thielemann wrote:

...
As long as it is only about speeding up list processing, one might also consider this as optimization problem. This could be handled without adapting much List based code in applications to a generic sequence class. That is, if I convert the result of a composition of list functions to a lazy ByteString, I tell the compiler that I don't need full laziness and the compiler can optimize, say ByteString.fromList . List.func1 . List.func2 . List.build to ByteString.func1 . ByteString.func2 . ByteString.build or even better ByteString.fusedFunc1Func2Build by some clever fusion framework. I think that a type class is easier to justify if it unifies data structures that are more different than just providing the same API with different efficiency.

This is orthogonal I think. This could be done with stream fusion without any common class. It requires that the types use stream fusion, then conversions between types (eg list/array) could be done with streams too.

I suppose we mean the same. My question is: Why do we use ByteString instead of [Word8] ? Entirely because of efficiency, right? So if we could stick to List code and only convert to ByteString at the end and the compiler all rewrites it to ByteString code, then we would not need libraries that are specialised to ByteString, but they can use [Word8] instead.

Duncan Coutts

11:34 p.m.

On Thu, 2008-02-21 at 13:34 +0100, Henning Thielemann wrote:

...

I suppose we mean the same. My question is: Why do we use ByteString instead of [Word8] ? Entirely because of efficiency, right? So if we could stick to List code and only convert to ByteString at the end and the compiler all rewrites it to ByteString code, then we would not need libraries that are specialised to ByteString, but they can use [Word8] instead.

Yeah if we could do that it'd be great. I've suggested similar things as extensions of our work on streams. If we know the list is being used fully strictly then we could have replaced it with a stricter data structure. Even if we could do that I'm not sure we'll ever get to the situation where it's fully automatic because some operations on array like things are slower than lists, like consing, so even if we discover that we're using our lists strictly it does not follow that we could get any benefit from converting to arrays. I think we'll be stuck with separate list and stricter array types for some time to come. Duncan

John Goerzen

4:35 a.m.

On Wednesday 20 February 2008 8:42:56 pm Roman Leshchinskiy wrote:

...

John Goerzen wrote:

...
I am concerned that the same thing is happening in Haskell. We know have three common list-like types: the regular list, strict ByteString, and lazy ByteString.

Why do you consider ByteString to be list-like but not arrays?

...
1) Does everyone agree with me that we have a problem here?

Yes, definitely. Haskell simply lacks a standard container library.

...
2) Would it make sense to make ListLike, or something like it, part of the Haskell core?

I don't think ListLike is the right approach. It's basically a fairly arbitrary collection of functions. It would be preferable, IMO, to identify a small set of combinators which would allow most list/sequence functions to be implemented generically and efficiently. Personally, I'd go with something like streams (the stream fusion ones) but I'm biased, of course.

From what I've heard of streams in this discussion, that does sound quite interesting. Unless streams are used internally for the [] implementation, though, we'd still need something to resolve the library compatibility question. -- John

Johan Tibell

9:06 a.m.

Hi John! On Wed, Feb 20, 2008 at 3:39 PM, John Goerzen wrote:

...

3) Would it make sense to base as much code as possible in the Haskell core areound ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] as well as ByteStrings.

I don't think the examples you gave (i.e. lines and words) make much sense on ByteStrings. You would have to assume that the sequence of bytes are in some particular Unicode encoding and thus words and lines will break if they get passed a ByteString using a different encoding. I don't think either of those two functions make sense on anything but sequence of character types like String. -- Johan

Duncan Coutts

10:37 a.m.

On Thu, 2008-02-21 at 10:06 +0100, Johan Tibell wrote:

...

Hi John!

On Wed, Feb 20, 2008 at 3:39 PM, John Goerzen wrote:

...
3) Would it make sense to base as much code as possible in the Haskell core areound ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] as well as ByteStrings.

I don't think the examples you gave (i.e. lines and words) make much sense on ByteStrings. You would have to assume that the sequence of bytes are in some particular Unicode encoding and thus words and lines will break if they get passed a ByteString using a different encoding. I don't think either of those two functions make sense on anything but sequence of character types like String.

That's exactly what the Data.ByteString[.Lazy].Char8 modules provide, a Char8 view of a Bytestring. Those modules provide functions like words, lines etc that assume an ASCII compatible 8bit encoding. One day we'll have a separate type that does Unicode with a similar fast packed representation. Duncan

Johan Tibell

12:37 p.m.

On Thu, Feb 21, 2008 at 11:37 AM, Duncan Coutts wrote:

...

On Thu, 2008-02-21 at 10:06 +0100, Johan Tibell wrote:

...
Hi John!

On Wed, Feb 20, 2008 at 3:39 PM, John Goerzen wrote:

...
3) Would it make sense to base as much code as possible in the Haskell core areound ListLike definitions? Here I think of functions such as lines and words, which make sense both on [Char] as well as ByteStrings.

I don't think the examples you gave (i.e. lines and words) make much sense on ByteStrings. You would have to assume that the sequence of bytes are in some particular Unicode encoding and thus words and lines will break if they get passed a ByteString using a different encoding. I don't think either of those two functions make sense on anything but sequence of character types like String.

That's exactly what the Data.ByteString[.Lazy].Char8 modules provide, a Char8 view of a Bytestring. Those modules provide functions like words, lines etc that assume an ASCII compatible 8bit encoding.

I would be very happy if people didn't use the .Char8 versions of ByteString except for being able to write byte literals using pack. (I would be even happier if Haskell had byte literals.) If people start using ByteString in their library interfaces instead of String I'll be really miserable because I can't really use their libraries for writing applications that need to be internationalized because their libraries would be limited to ASCII. Data.ByteString and Data.ByteString.Char8 uses the same ByteString type so I can take some bytes in UTF-32 which I read from the network and use Data.ByteString.Char8 functions on them which will fail. I prefer that a type that represent characters is guarded by encode and decode functions. If that's not the case it's easy to mix data in different encodings by mistake when e.g. writing web applications which involve data in several different encodings.

...

One day we'll have a separate type that does Unicode with a similar fast packed representation.

That will be a good day. :) -- Johan

Duncan Coutts

10:44 p.m.

On Thu, 2008-02-21 at 13:37 +0100, Johan Tibell wrote:

...

I would be very happy if people didn't use the .Char8 versions of ByteString except for being able to write byte literals using pack. (I would be even happier if Haskell had byte literals.) If people start using ByteString in their library interfaces instead of String I'll be really miserable because I can't really use their libraries for writing applications that need to be internationalized because their libraries would be limited to ASCII.

Data.ByteString and Data.ByteString.Char8 uses the same ByteString type so I can take some bytes in UTF-32 which I read from the network and use Data.ByteString.Char8 functions on them which will fail. I prefer that a type that represent characters is guarded by encode and decode functions. If that's not the case it's easy to mix data in different encodings by mistake when e.g. writing web applications which involve data in several different encodings.

The intention of allowing both views on one data type was to support the myriad of mixed ascii / binary protocols with a minimum of fuss (there are loads of network protocols like this). The intention was never to support Unicode like String does. That's why we called it Char8, not Char. I do accept that because the Unicode version has not appeared yet people have been tempted to use ByteString for text, which is not appropriate. Duncan

6340

Age (days ago)

6342

Last active (days ago)

List overview

Download

41 comments

16 participants

participants (16)

ajb＠spamcop.net
Antoine Latter
Ben Franksen
Chad Scherrer
David Menendez
David Roundy
Devin Mullins
Duncan Coutts
Henning Thielemann
Johan Tibell
John Goerzen
Jules Bean
Neil Mitchell
Roman Leshchinskiy
Ross Paterson
Thomas Schilling