Please add a method for optimized concat to the Semigroup class

newer
What's the advantage of writing...

Yitzchak Gale

3 May 2011 3 May '11

11:12 a.m.

Hi Edward, Thanks much for the very useful semigroups package. When using it in practice, it would be very useful to have an analogue to the mconcat method of Monoid. It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life. Thanks, Yitz

Show replies by date

Stephen Tetley

3 May 3 May

11:39 a.m.

New subject: Please add a method for optimized concat to the Semigroup class

Does it have an obvious default implementation, bearing in mind it we might really want a total function? sconcat [] = error "Yikes - I wish this was total!" sconcat [a] = a sconcat (a:as) = a <> sconcat as Best wishes Stephen On 3 May 2011 12:12, Yitzchak Gale wrote: [SNIP]

...

It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.

Holger Siegel

11:56 a.m.

Am 03.05.2011 um 13:39 schrieb Stephen Tetley:

...

Does it have an obvious default implementation, bearing in mind it we might really want a total function?

sconcat [] = error "Yikes - I wish this was total!" sconcat [a] = a sconcat (a:as) = a <> sconcat as

You have to provide the "neutral" element by yourself: infixl 4 <>> a <>> [] = a a <>> (b:bs) = a <> b <>> bs

...

Best wishes

Stephen

On 3 May 2011 12:12, Yitzchak Gale wrote: [SNIP]

...
It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Stephen Tetley

12:03 p.m.

New subject: Please add a method for optimized concat to the Semigroup class

There is that formulation, though usually I find I need to do it with an alternative instead: altconcat alt [] = alt altconcat _ (a:as) = go a as where go acc [] = acc go acc (b:bs) = go (acc <> b) bs Both are "kind of, sort of" bringing you up to a Monoid though... On 3 May 2011 12:56, Holger Siegel wrote:

...

You have to provide the "neutral" element by yourself:

Yitzchak Gale

12:26 p.m.

New subject: Please add a method for optimized concat to the Semigroup class

Stephen Tetley wrote:

...

There is that formulation, though usually I find I need to do it with an alternative instead: altconcat alt [] = alt altconcat _ (a:as) = go a as where go acc [] = acc go acc (b:bs) = go (acc <> b) bs

But the whole reason we need this as a method is for the case that consecutive appends is inefficient.

...

Both are "kind of, sort of" bringing you up to a Monoid though...

altconcat and sconcatMaybe are doing that, because you need to decide what to do with an empty list when you define the instance. Holger's interface is not doing that, because the type does not require you to say anything about the case of an empty list in the instance. Another approach would be to depend on one of the packages that provides a non-empty list type, such as the NonEmptyList package. But I don't think this simple case justifies another dependency. You can wrap Holger's function in one of those types easily enough if you need to. Thanks, Yitz

Stephen Tetley

3:24 p.m.

New subject: Please add a method for optimized concat to the Semigroup class

On 3 May 2011 13:26, Yitzchak Gale wrote:

...

...
Both are "kind of, sort of" bringing you up to a Monoid though...

altconcat and sconcatMaybe are doing that, because you need to decide what to do with an empty list when you define the instance. Holger's interface is not doing that, because the type does not require you to say anything about the case of an empty list in the instance.

Holger's interface is bringing you "kind of, sort of" up to a Monoid but it allows neutral of whatever value you fancy at the time. At which point, either you're working at directly at a type - so you don't really need the "idea" of a semigroup just its pretty (<>) operator, or you do actually have a neutral and thus were working with a Monoid all along - again you just wanted the pretty (<>) operator. My real contention is that Semigroup doesn't have a proper concat operation[*], though notationally it is seductive - I do use both altconcat and Holger's version in my own code. [*] I could be persuaded otherwise, but I can't see how it would be an analogue mconcat in Monoid.

Yitzchak Gale

12:16 p.m.

New subject: Please add a method for optimized concat to the Semigroup class

Stephen Tetley wrote:

...

...
Does it have an obvious default implementation, bearing in mind it we might really want a total function?

sconcat [] = error "Yikes - I wish this was total!" sconcat [a] = a sconcat (a:as) = a <> sconcat as

Holger Siegel wrote:

...

You have to provide the "neutral" element by yourself: a <>> [] = a a <>> (b:bs) = a <> b <>> bs

Yes, I think that would be the best interface. At first glance, one would be tempted to do something like returning a Maybe, as is often done in these kinds of cases. But here, the whole point of Semigroup is that we don't know what to do when the list is empty, so getting a Nothing result in that case is unhelpful. To illustrate the point, let's look at the conversion between those two approaches: sconcatNonempty x xs = fromJust . sconcat $ x : xs sconcatMaybe (x:xs) = Just $ sconcat x xs sconcatMaybe _ = Nothing I would much rather write sconcatMaybe when needed than to have to write unsafe code like sconcatNonempty. Presumably it's actually safe, since you would expect implementations to provide a result whenever the list is non-empty. But the type no longer provides that guarantee. Thanks, Yitz

Edward Kmett

4:44 p.m.

On Tue, May 3, 2011 at 7:12 AM, Yitzchak Gale wrote:

...

Hi Edward,

Thanks much for the very useful semigroups package.

When using it in practice, it would be very useful to have an analogue to the mconcat method of Monoid. It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.

Thanks, Yitz

I had considered an sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff But it was somewhat unsatisfying, in part because of the need for a seed element. Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list. There are at least 3 definitions that make sense. The nice inductive Endo definition above (which differs in semantics from the one you proposed), something like what you propose, with its funny base case, and the option of placing something like the unit I placed in Endo on the other side. Finally, I wasn't able to get any such specialized sconcat to actually speed anything up. =/ I'm more than happy to revisit this decision, as it isn't particularly onerous to add an sconcat definition to Semigroup, but I've yet to see it pay off and it is somewhat disturbing to me that the type doesn't automatically offer up its meaning. As the Prelude has a general dearth of suitable container types that contain a guarantee of at least one element, my focus was instead upon the use of Foldable1 and Traversable1 from the semigroupoids package. This provides an optimization path, similar to those of Foldable and Traversable by optimizing the non-empty-by-construction *container* for its use of a semigroup, rather than optimizing the semigroup for its use of by one particularly inappropriate container. http://hackage.haskell.org/packages/archive/semigroupoids/1.1.2/doc/html/Dat... http://hackage.haskell.org/packages/archive/semigroupoids/1.1.2/doc/html/Dat... These offer up a wealth of combinators for manipulating Semigroups over suitable containers. Again, I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up, and if there was actually a better motivated inductive definition. -Edward

Yitzchak Gale

7:43 p.m.

Edward Kmett wrote:

...

sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff

The sconcat we have been discussing is sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>)) (avoiding the use of Dual.diff.Dual so that we don't need to define dualUnDual or some such messiness)

...

But it was somewhat unsatisfying, in part because of the need for a seed element.

Only because, as you said, there is no standard non-empty list type.

...

Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.

While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.

...

...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up

I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm. (I originally used Monoid instances by augmenting types with locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.) I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat. Thanks, Yitz

Edward Kmett

10:49 p.m.

On Tue, May 3, 2011 at 3:43 PM, Yitzchak Gale wrote:

...

Edward Kmett wrote:

...
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff

...

The sconcat we have been discussing is

sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>))

Holger's basically had this form, but I think Tetley's version is more useful, because it provides for the scenario you describe below where there is no value of the semigroup's type that you can merge with.

...

...
But it was somewhat unsatisfying, in part because of the need for a seed element.

Only because, as you said, there is no standard non-empty list type.

I have a streams package which provides a number of non-empty list types, but it is fairly high up my module hierarchy, as it requires a number of compiler extensions, and other classes, and so isn't available to the class down here in the semigroups package.

...

...
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.

While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.

...
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up

I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm.

This is a pretty good example. Even if in this case it is mostly saving you the boxing and unboxing of the intermediate rectangles You still probably want something closer to Stephen Tetley's version, otherwise you're going to have to magic up just that kind of empty rectangle that you don't want to give though! In fact you probably want something even stronger, that way you can signal the empty list result 'out of band' of the values you can fit in the Semigroup. This would avoid specifying an alternative directly, and his case can be derived with sconcat :: Semigroup a => [a] -> Maybe a sconcat [] = Nothing sconcat (a:as) = Just (go a as) where go a (b:bs) = gs (a<>b) bs go a [] = a and effectively avoids fiddling with the empty case throughout the list. Then Stephen's version would look like tetley :: Semigroup a => a -> [a] -> a tetley alt = maybe alt id . sconcat Alternately Option could be used instead of Maybe to keep the package's API more self-contained, but I don't particularly care one way or the other. (I originally used Monoid instances by augmenting types with

...

locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.)

...

I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat.

Sold! (modulo the semantic considerations above) -Edward

Edward Kmett

11:23 p.m.

Another option (upon reflection) would be to just transplant the NonEmpty type from http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St... data NonEmpty a = a :| [a] http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...into the semigroups package, which would give you the 'canonical non empty list' you seem to want. and then add the more pleasing and natural generalization sconcat:: NonEmpty a -> a to the Semigroup class All I would need is to strip out the use of PatternGuards in a couple of locations. I would have to sprinkle a lot of instances through other packages on the way up the package tree NonEmpty isn't the most natural inductive version (Data.Stream.Future has that distinction), but it does implement efficiently and it can cheaply interconvert to [a]. -Edward On Tue, May 3, 2011 at 6:49 PM, Edward Kmett wrote:

...

On Tue, May 3, 2011 at 3:43 PM, Yitzchak Gale wrote:

...
Edward Kmett wrote:

...
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff

...
The sconcat we have been discussing is

sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>))

Holger's basically had this form, but I think Tetley's version is more useful, because it provides for the scenario you describe below where there is no value of the semigroup's type that you can merge with.

...
...
But it was somewhat unsatisfying, in part because of the need for a seed element.

Only because, as you said, there is no standard non-empty list type.

I have a streams package which provides a number of non-empty list types, but it is fairly high up my module hierarchy, as it requires a number of compiler extensions, and other classes, and so isn't available to the class down here in the semigroups package.

...
...
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.

While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.

...
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up

I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm.

This is a pretty good example. Even if in this case it is mostly saving you the boxing and unboxing of the intermediate rectangles

You still probably want something closer to Stephen Tetley's version, otherwise you're going to have to magic up just that kind of empty rectangle that you don't want to give though!

In fact you probably want something even stronger, that way you can signal the empty list result 'out of band' of the values you can fit in the Semigroup. This would avoid specifying an alternative directly, and his case can be derived with

sconcat :: Semigroup a => [a] -> Maybe a sconcat [] = Nothing sconcat (a:as) = Just (go a as) where go a (b:bs) = gs (a<>b) bs go a [] = a

and effectively avoids fiddling with the empty case throughout the list.

Then Stephen's version would look like

tetley :: Semigroup a => a -> [a] -> a tetley alt = maybe alt id . sconcat

Alternately Option could be used instead of Maybe to keep the package's API more self-contained, but I don't particularly care one way or the other.

(I originally used Monoid instances by augmenting types with

...
locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.)

...
I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat.

Sold! (modulo the semantic considerations above)

-Edward

Edward Kmett

31 May 31 May

1:14 a.m.

I felt I should probably mention that ultimately what was done is I moved NonEmpty all the way down into semigroups and chose

...

sconcat :: NonEmpty a -> a

at it was the closest analogue to the current mconcat behavior. So, request accomodated. ;) -Edward On Tue, May 3, 2011 at 7:23 PM, Edward Kmett wrote:

...

Another option (upon reflection) would be to just transplant the NonEmpty type from

http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...

data NonEmpty a = a :| [a]

http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...into the semigroups package, which would give you the 'canonical non empty list' you seem to want.

and then add the more pleasing and natural generalization

sconcat:: NonEmpty a -> a

to the Semigroup class

All I would need is to strip out the use of PatternGuards in a couple of locations.

I would have to sprinkle a lot of instances through other packages on the way up the package tree

NonEmpty isn't the most natural inductive version (Data.Stream.Future has that distinction), but it does implement efficiently and it can cheaply interconvert to [a].

-Edward

On Tue, May 3, 2011 at 6:49 PM, Edward Kmett wrote:

...
On Tue, May 3, 2011 at 3:43 PM, Yitzchak Gale wrote:

...
Edward Kmett wrote:

...
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff

...
The sconcat we have been discussing is

sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>))

Holger's basically had this form, but I think Tetley's version is more useful, because it provides for the scenario you describe below where there is no value of the semigroup's type that you can merge with.

...
...
But it was somewhat unsatisfying, in part because of the need for a seed element.

Only because, as you said, there is no standard non-empty list type.

I have a streams package which provides a number of non-empty list types, but it is fairly high up my module hierarchy, as it requires a number of compiler extensions, and other classes, and so isn't available to the class down here in the semigroups package.

...
...
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.

While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.

...
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up

I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm.

This is a pretty good example. Even if in this case it is mostly saving you the boxing and unboxing of the intermediate rectangles

You still probably want something closer to Stephen Tetley's version, otherwise you're going to have to magic up just that kind of empty rectangle that you don't want to give though!

In fact you probably want something even stronger, that way you can signal the empty list result 'out of band' of the values you can fit in the Semigroup. This would avoid specifying an alternative directly, and his case can be derived with

sconcat :: Semigroup a => [a] -> Maybe a sconcat [] = Nothing sconcat (a:as) = Just (go a as) where go a (b:bs) = gs (a<>b) bs go a [] = a

and effectively avoids fiddling with the empty case throughout the list.

Then Stephen's version would look like

tetley :: Semigroup a => a -> [a] -> a tetley alt = maybe alt id . sconcat

Alternately Option could be used instead of Maybe to keep the package's API more self-contained, but I don't particularly care one way or the other.

(I originally used Monoid instances by augmenting types with

...
locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.)

...
I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat.

Sold! (modulo the semantic considerations above)

-Edward

Yitzchak Gale

7:47 a.m.

Edward Kmett wrote:

...

I felt I should probably mention that ultimately what was done is I moved NonEmpty all the way down into semigroups and chose

...
sconcat :: NonEmpty a -> a at it was the closest analogue to the current mconcat behavior. So, request accomodated. ;)

Indeed, this is an excellent solution. Thanks! -Yitz

Henning Thielemann

14 May 14 May

7:38 p.m.

New subject: Please add a method for optimized concat to the Semigroup class

Yitzchak Gale schrieb:

...

When using it in practice, it would be very useful to have an analogue to the mconcat method of Monoid. It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.

Btw. has someone an idea how to design 'mconcat' for pair types? I mean, it is certainly sensible to define: instance (Monoid a, Monoid b) => Monoid (a,b) where mappend (a0,b0) (a1,b1) = (mappend a0 a1, mappend b0 b1) mconcat pairs = (mconcat (map fst pairs), mconcat (map snd pairs)) but the mconcat definition would be inefficient for the type (Sum a, Sum b). E.g. if I evaluate the first sum in (mconcat pairs) first and the second sum last, then 'pairs' must be stored until we evaluate the second sum. Without the Monoid instance I would certainly compute both sums simultaneously in one left fold. Since a left fold can be expressed in terms of a right fold, it would certainly work to map left-fold-types like Sum to an Endo type, but this leads us to functional dependent types or type functions.

5150

Age (days ago)

5178

Last active (days ago)

List overview

Download

13 comments

5 participants

participants (5)

Edward Kmett
Henning Thielemann
Holger Siegel
Stephen Tetley
Yitzchak Gale

Please add a method for optimized concat to the Semigroup class

tags

participants (5)