Please add a method for optimized concat to the Semigroup class

Hi Edward, Thanks much for the very useful semigroups package. When using it in practice, it would be very useful to have an analogue to the mconcat method of Monoid. It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life. Thanks, Yitz

Does it have an obvious default implementation, bearing in mind it we
might really want a total function?
sconcat [] = error "Yikes - I wish this was total!"
sconcat [a] = a
sconcat (a:as) = a <> sconcat as
Best wishes
Stephen
On 3 May 2011 12:12, Yitzchak Gale
It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.

Am 03.05.2011 um 13:39 schrieb Stephen Tetley:
Does it have an obvious default implementation, bearing in mind it we might really want a total function?
sconcat [] = error "Yikes - I wish this was total!" sconcat [a] = a sconcat (a:as) = a <> sconcat as
You have to provide the "neutral" element by yourself: infixl 4 <>> a <>> [] = a a <>> (b:bs) = a <> b <>> bs
Best wishes
Stephen
On 3 May 2011 12:12, Yitzchak Gale
wrote: [SNIP] It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

There is that formulation, though usually I find I need to do it with
an alternative instead:
altconcat alt [] = alt
altconcat _ (a:as) = go a as
where
go acc [] = acc
go acc (b:bs) = go (acc <> b) bs
Both are "kind of, sort of" bringing you up to a Monoid though...
On 3 May 2011 12:56, Holger Siegel
You have to provide the "neutral" element by yourself:

Stephen Tetley wrote:
There is that formulation, though usually I find I need to do it with an alternative instead: altconcat alt [] = alt altconcat _ (a:as) = go a as where go acc [] = acc go acc (b:bs) = go (acc <> b) bs
But the whole reason we need this as a method is for the case that consecutive appends is inefficient.
Both are "kind of, sort of" bringing you up to a Monoid though...
altconcat and sconcatMaybe are doing that, because you need to decide what to do with an empty list when you define the instance. Holger's interface is not doing that, because the type does not require you to say anything about the case of an empty list in the instance. Another approach would be to depend on one of the packages that provides a non-empty list type, such as the NonEmptyList package. But I don't think this simple case justifies another dependency. You can wrap Holger's function in one of those types easily enough if you need to. Thanks, Yitz

On 3 May 2011 13:26, Yitzchak Gale
Both are "kind of, sort of" bringing you up to a Monoid though...
altconcat and sconcatMaybe are doing that, because you need to decide what to do with an empty list when you define the instance. Holger's interface is not doing that, because the type does not require you to say anything about the case of an empty list in the instance.
Holger's interface is bringing you "kind of, sort of" up to a Monoid but it allows neutral of whatever value you fancy at the time. At which point, either you're working at directly at a type - so you don't really need the "idea" of a semigroup just its pretty (<>) operator, or you do actually have a neutral and thus were working with a Monoid all along - again you just wanted the pretty (<>) operator. My real contention is that Semigroup doesn't have a proper concat operation[*], though notationally it is seductive - I do use both altconcat and Holger's version in my own code. [*] I could be persuaded otherwise, but I can't see how it would be an analogue mconcat in Monoid.

Stephen Tetley wrote:
Does it have an obvious default implementation, bearing in mind it we might really want a total function?
sconcat [] = error "Yikes - I wish this was total!" sconcat [a] = a sconcat (a:as) = a <> sconcat as
Holger Siegel wrote:
You have to provide the "neutral" element by yourself: a <>> [] = a a <>> (b:bs) = a <> b <>> bs
Yes, I think that would be the best interface. At first glance, one would be tempted to do something like returning a Maybe, as is often done in these kinds of cases. But here, the whole point of Semigroup is that we don't know what to do when the list is empty, so getting a Nothing result in that case is unhelpful. To illustrate the point, let's look at the conversion between those two approaches: sconcatNonempty x xs = fromJust . sconcat $ x : xs sconcatMaybe (x:xs) = Just $ sconcat x xs sconcatMaybe _ = Nothing I would much rather write sconcatMaybe when needed than to have to write unsafe code like sconcatNonempty. Presumably it's actually safe, since you would expect implementations to provide a result whenever the list is non-empty. But the type no longer provides that guarantee. Thanks, Yitz

On Tue, May 3, 2011 at 7:12 AM, Yitzchak Gale
Hi Edward,
Thanks much for the very useful semigroups package.
When using it in practice, it would be very useful to have an analogue to the mconcat method of Monoid. It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.
Thanks, Yitz
I had considered an sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff But it was somewhat unsatisfying, in part because of the need for a seed element. Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list. There are at least 3 definitions that make sense. The nice inductive Endo definition above (which differs in semantics from the one you proposed), something like what you propose, with its funny base case, and the option of placing something like the unit I placed in Endo on the other side. Finally, I wasn't able to get any such specialized sconcat to actually speed anything up. =/ I'm more than happy to revisit this decision, as it isn't particularly onerous to add an sconcat definition to Semigroup, but I've yet to see it pay off and it is somewhat disturbing to me that the type doesn't automatically offer up its meaning. As the Prelude has a general dearth of suitable container types that contain a guarantee of at least one element, my focus was instead upon the use of Foldable1 and Traversable1 from the semigroupoids package. This provides an optimization path, similar to those of Foldable and Traversable by optimizing the non-empty-by-construction *container* for its use of a semigroup, rather than optimizing the semigroup for its use of by one particularly inappropriate container. http://hackage.haskell.org/packages/archive/semigroupoids/1.1.2/doc/html/Dat... http://hackage.haskell.org/packages/archive/semigroupoids/1.1.2/doc/html/Dat... These offer up a wealth of combinators for manipulating Semigroups over suitable containers. Again, I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up, and if there was actually a better motivated inductive definition. -Edward

Edward Kmett wrote:
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff
The sconcat we have been discussing is sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>)) (avoiding the use of Dual.diff.Dual so that we don't need to define dualUnDual or some such messiness)
But it was somewhat unsatisfying, in part because of the need for a seed element.
Only because, as you said, there is no standard non-empty list type.
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.
While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up
I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm. (I originally used Monoid instances by augmenting types with locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.) I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat. Thanks, Yitz

On Tue, May 3, 2011 at 3:43 PM, Yitzchak Gale
Edward Kmett wrote:
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff
The sconcat we have been discussing is
sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>))
Holger's basically had this form, but I think Tetley's version is more useful, because it provides for the scenario you describe below where there is no value of the semigroup's type that you can merge with.
But it was somewhat unsatisfying, in part because of the need for a seed element.
Only because, as you said, there is no standard non-empty list type.
I have a streams package which provides a number of non-empty list types, but it is fairly high up my module hierarchy, as it requires a number of compiler extensions, and other classes, and so isn't available to the class down here in the semigroups package.
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.
While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up
I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm.
This is a pretty good example. Even if in this case it is mostly saving you the boxing and unboxing of the intermediate rectangles You still probably want something closer to Stephen Tetley's version, otherwise you're going to have to magic up just that kind of empty rectangle that you don't want to give though! In fact you probably want something even stronger, that way you can signal the empty list result 'out of band' of the values you can fit in the Semigroup. This would avoid specifying an alternative directly, and his case can be derived with sconcat :: Semigroup a => [a] -> Maybe a sconcat [] = Nothing sconcat (a:as) = Just (go a as) where go a (b:bs) = gs (a<>b) bs go a [] = a and effectively avoids fiddling with the empty case throughout the list. Then Stephen's version would look like tetley :: Semigroup a => a -> [a] -> a tetley alt = maybe alt id . sconcat Alternately Option could be used instead of Maybe to keep the package's API more self-contained, but I don't particularly care one way or the other. (I originally used Monoid instances by augmenting types with
locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.)
I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat.
Sold! (modulo the semantic considerations above) -Edward

Another option (upon reflection) would be to just transplant the NonEmpty
type from
http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...
data NonEmpty a = a :| [a]
http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...into
the semigroups package, which would give you the 'canonical non empty list'
you seem to want.
and then add the more pleasing and natural generalization
sconcat:: NonEmpty a -> a
to the Semigroup class
All I would need is to strip out the use of PatternGuards in a couple of
locations.
I would have to sprinkle a lot of instances through other packages on the
way up the package tree
NonEmpty isn't the most natural inductive version (Data.Stream.Future has
that distinction), but it does implement efficiently and it can cheaply
interconvert to [a].
-Edward
On Tue, May 3, 2011 at 6:49 PM, Edward Kmett
On Tue, May 3, 2011 at 3:43 PM, Yitzchak Gale
wrote: Edward Kmett wrote:
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff
The sconcat we have been discussing is
sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>))
Holger's basically had this form, but I think Tetley's version is more useful, because it provides for the scenario you describe below where there is no value of the semigroup's type that you can merge with.
But it was somewhat unsatisfying, in part because of the need for a seed element.
Only because, as you said, there is no standard non-empty list type.
I have a streams package which provides a number of non-empty list types, but it is fairly high up my module hierarchy, as it requires a number of compiler extensions, and other classes, and so isn't available to the class down here in the semigroups package.
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.
While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up
I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm.
This is a pretty good example. Even if in this case it is mostly saving you the boxing and unboxing of the intermediate rectangles
You still probably want something closer to Stephen Tetley's version, otherwise you're going to have to magic up just that kind of empty rectangle that you don't want to give though!
In fact you probably want something even stronger, that way you can signal the empty list result 'out of band' of the values you can fit in the Semigroup. This would avoid specifying an alternative directly, and his case can be derived with
sconcat :: Semigroup a => [a] -> Maybe a sconcat [] = Nothing sconcat (a:as) = Just (go a as) where go a (b:bs) = gs (a<>b) bs go a [] = a
and effectively avoids fiddling with the empty case throughout the list.
Then Stephen's version would look like
tetley :: Semigroup a => a -> [a] -> a tetley alt = maybe alt id . sconcat
Alternately Option could be used instead of Maybe to keep the package's API more self-contained, but I don't particularly care one way or the other.
(I originally used Monoid instances by augmenting types with
locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.)
I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat.
Sold! (modulo the semantic considerations above)
-Edward

I felt I should probably mention that ultimately what was done is I moved NonEmpty all the way down into semigroups and chose
sconcat :: NonEmpty a -> a
at it was the closest analogue to the current mconcat behavior.
So, request accomodated. ;)
-Edward
On Tue, May 3, 2011 at 7:23 PM, Edward Kmett
Another option (upon reflection) would be to just transplant the NonEmpty type from
http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...
data NonEmpty a = a :| [a]
http://hackage.haskell.org/packages/archive/streams/0.6.1.1/doc/html/Data-St...into the semigroups package, which would give you the 'canonical non empty list' you seem to want.
and then add the more pleasing and natural generalization
sconcat:: NonEmpty a -> a
to the Semigroup class
All I would need is to strip out the use of PatternGuards in a couple of locations.
I would have to sprinkle a lot of instances through other packages on the way up the package tree
NonEmpty isn't the most natural inductive version (Data.Stream.Future has that distinction), but it does implement efficiently and it can cheaply interconvert to [a].
-Edward
On Tue, May 3, 2011 at 6:49 PM, Edward Kmett
wrote: On Tue, May 3, 2011 at 3:43 PM, Yitzchak Gale
wrote: Edward Kmett wrote:
sconcat :: [a] -> a -> a with either the semantics you supplied or something like sconcat = appEndo . mconcat . map diff
The sconcat we have been discussing is
sconcat = flip $ appEndo . getDual . mconcat . map (Dual . Endo . flip (<>))
Holger's basically had this form, but I think Tetley's version is more useful, because it provides for the scenario you describe below where there is no value of the semigroup's type that you can merge with.
But it was somewhat unsatisfying, in part because of the need for a seed element.
Only because, as you said, there is no standard non-empty list type.
I have a streams package which provides a number of non-empty list types, but it is fairly high up my module hierarchy, as it requires a number of compiler extensions, and other classes, and so isn't available to the class down here in the semigroups package.
Another unsatisfying detail is no definition is in any way shape or form canonical when folding over a list.
While our definition doesn't look any better than the others when expressed in terms of those combinators, it certainly seems to be the most natural when defined directly as Holger did. It's also the direct analogue of mconcat when viewed as the same type with lists replaced by non-empty lists. I'm sure that's the definition most users will expect. But I would be happy with whichever you supply.
...I'm more than happy to add it if only for symmetry with Data.Monoid, but I'd be much happier doing so with a compelling example where it actually sped things up
I'm currently doing some recognition algorithms on heterogeneous collections of graphical elements on a 2D canvas. Many types of elements have a location and a rectangular extent. You can often combine them, but there is no unit element because even an empty element needs to have a specific location. It would be very slow to combine a list of them incrementally; instead, you find the minimum and maximum X and Y coordinates, and combine the content using a fast algorithm.
This is a pretty good example. Even if in this case it is mostly saving you the boxing and unboxing of the intermediate rectangles
You still probably want something closer to Stephen Tetley's version, otherwise you're going to have to magic up just that kind of empty rectangle that you don't want to give though!
In fact you probably want something even stronger, that way you can signal the empty list result 'out of band' of the values you can fit in the Semigroup. This would avoid specifying an alternative directly, and his case can be derived with
sconcat :: Semigroup a => [a] -> Maybe a sconcat [] = Nothing sconcat (a:as) = Just (go a as) where go a (b:bs) = gs (a<>b) bs go a [] = a
and effectively avoids fiddling with the empty case throughout the list.
Then Stephen's version would look like
tetley :: Semigroup a => a -> [a] -> a tetley alt = maybe alt id . sconcat
Alternately Option could be used instead of Maybe to keep the package's API more self-contained, but I don't particularly care one way or the other.
(I originally used Monoid instances by augmenting types with
locationless empty elements. But that made a mess of my code and introduced a myriad of bugs and potential crashes. These are definitely semigroups, not monoids.)
I'm sure there are countless other natural examples of semigroups in the wild, and that the typical non-trivial ones will benefit from an optimized sconcat.
Sold! (modulo the semantic considerations above)
-Edward

Edward Kmett wrote:
I felt I should probably mention that ultimately what was done is I moved NonEmpty all the way down into semigroups and chose
sconcat :: NonEmpty a -> a at it was the closest analogue to the current mconcat behavior. So, request accomodated. ;)
Indeed, this is an excellent solution. Thanks! -Yitz

Yitzchak Gale schrieb:
When using it in practice, it would be very useful to have an analogue to the mconcat method of Monoid. It has the obvious default implementation, but allows for an optimized implementation for specific instances. That turns out to be something that comes up all the time (at least for me) in real life.
Btw. has someone an idea how to design 'mconcat' for pair types? I mean, it is certainly sensible to define: instance (Monoid a, Monoid b) => Monoid (a,b) where mappend (a0,b0) (a1,b1) = (mappend a0 a1, mappend b0 b1) mconcat pairs = (mconcat (map fst pairs), mconcat (map snd pairs)) but the mconcat definition would be inefficient for the type (Sum a, Sum b). E.g. if I evaluate the first sum in (mconcat pairs) first and the second sum last, then 'pairs' must be stored until we evaluate the second sum. Without the Monoid instance I would certainly compute both sums simultaneously in one left fold. Since a left fold can be expressed in terms of a right fold, it would certainly work to map left-fold-types like Sum to an Endo type, but this leads us to functional dependent types or type functions.
participants (5)
-
Edward Kmett
-
Henning Thielemann
-
Holger Siegel
-
Stephen Tetley
-
Yitzchak Gale