A practical Haskell puzzle

You have written a large software system in Haskell. Wishing to play to Haskell's strength, you have structured your system as a series of composable layers. So you have data types Layer1, Layer2, ... and functions layer2 :: Layer1 -> Layer2 layer3 :: Layer2 -> Layer3 ... etc. Of course you'll want to be able to run any range of the layers separately, at least for testing and debugging purposes if not for the functionality itself. So your UI module (command line or whatever) that launches your application provides a data type data Layers = Layers Int Int that indicates which layers to run, and functions deserialize1 :: L.ByteString -> Layer1 deserialize2 :: L.ByteString -> Layer2 ... serialize1 :: Layer1 -> L.ByteString serialize2 :: Layer2 -> L.ByteString ... etc. Now you need a function runLayers :: Layers -> L.ByteString -> L.ByteString so that the effect is for example runLayers (Layers 4 6) = serialize6 . layer6 . layer5 . deserialize4 Typically there could be 20 or 30 layers, so writing out each case could result in hundred of boilerplate definitions for runLayers. Scripting the generation of all that boilerplate, e.g. using TH, doesn't seem very attractive either. On the other hand, it seems silly to have to use super-advanced type olegery for such a trivially simple yet centrally important component of the system. What is the best way to write runLayers? Feel free to change the details of the above design, as long as it meets the functionality requirements expressed. Regards, Yitz

Yitzchak Gale wrote:
You have written a large software system in Haskell. Wishing to play to Haskell's strength, you have structured your system as a series of composable layers. So you have data types
Layer1, Layer2, ...
and functions
layer2 :: Layer1 -> Layer2 layer3 :: Layer2 -> Layer3 ....
etc.
Of course you'll want to be able to run any range of the layers separately, at least for testing and debugging purposes if not for the functionality itself.
So your UI module (command line or whatever) that launches your application provides a data type
data Layers = Layers Int Int
that indicates which layers to run, and functions
deserialize1 :: L.ByteString -> Layer1 deserialize2 :: L.ByteString -> Layer2 ....
serialize1 :: Layer1 -> L.ByteString serialize2 :: Layer2 -> L.ByteString ....
etc.
Now you need a function
runLayers :: Layers -> L.ByteString -> L.ByteString
so that the effect is for example
runLayers (Layers 4 6) = serialize6 . layer6 . layer5 . deserialize4
[..]
What is the best way to write runLayers? Feel free to change the details of the above design, as long as it meets the functionality requirements expressed.
Solution: compose all the functions, but do not use the standard function composition (.) to do that. Instead, make a new data type with composition as constructor. This way, you can inspect the composition afterwards and run only parts of it. Solution, put differently: Make a type-safe list of the whole chain of functions. Then, the runLayers function throws away everything outside the range and composes what is left. Here a rough sketch of what I have in mind: data Compoz a b where Id :: Compoz a a Cons :: (Serialize a,b,c) => (b -> c) -> Compoz a b -> Compoz a c -- this value needs to be written out chain = layer20 `Cons` layer 19 ... runLayers (Layer a b) = deserialize . (run . takeC (b-a) . dropC a $ chain) . serialize takeC :: Int -> Compoz a b -> (exists c. Compoz a c) dropC :: Int -> Compoz a b -> (exists c. Compoz c b) run :: Compoz a b -> (a -> b) Of course, you will have to wrestle with the existential types for takeC and dropC a bit, but that shouldn't be much of a problem. For instance, you can fuse these functions into runLayers and hide the existential types somewhere in the recursion. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

takeC :: Int -> Compoz a b -> (exists c. Compoz a c) dropC :: Int -> Compoz a b -> (exists c. Compoz c b)
What does 'exists' means? To create a rank-2 type can't you use:
takeC :: Int -> Compoz a b -> (forall c. Compoz a c)
??
2011/2/28 Heinrich Apfelmus
Yitzchak Gale wrote:
You have written a large software system in Haskell. Wishing to play to Haskell's strength, you have structured your system as a series of composable layers. So you have data types
Layer1, Layer2, ...
and functions
layer2 :: Layer1 -> Layer2 layer3 :: Layer2 -> Layer3 ....
etc.
Of course you'll want to be able to run any range of the layers separately, at least for testing and debugging purposes if not for the functionality itself.
So your UI module (command line or whatever) that launches your application provides a data type
data Layers = Layers Int Int
that indicates which layers to run, and functions
deserialize1 :: L.ByteString -> Layer1 deserialize2 :: L.ByteString -> Layer2 ....
serialize1 :: Layer1 -> L.ByteString serialize2 :: Layer2 -> L.ByteString ....
etc.
Now you need a function
runLayers :: Layers -> L.ByteString -> L.ByteString
so that the effect is for example
runLayers (Layers 4 6) = serialize6 . layer6 . layer5 . deserialize4
[..]
What is the best way to write runLayers? Feel free to change the details of the above design, as long as it meets the functionality requirements expressed.
Solution: compose all the functions, but do not use the standard function composition (.) to do that. Instead, make a new data type with composition as constructor. This way, you can inspect the composition afterwards and run only parts of it.
Solution, put differently: Make a type-safe list of the whole chain of functions. Then, the runLayers function throws away everything outside the range and composes what is left.
Here a rough sketch of what I have in mind:
data Compoz a b where Id :: Compoz a a Cons :: (Serialize a,b,c) => (b -> c) -> Compoz a b -> Compoz a c
-- this value needs to be written out chain = layer20 `Cons` layer 19 ...
runLayers (Layer a b) = deserialize . (run . takeC (b-a) . dropC a $ chain) . serialize
takeC :: Int -> Compoz a b -> (exists c. Compoz a c) dropC :: Int -> Compoz a b -> (exists c. Compoz c b)
run :: Compoz a b -> (a -> b)
Of course, you will have to wrestle with the existential types for takeC and dropC a bit, but that shouldn't be much of a problem. For instance, you can fuse these functions into runLayers and hide the existential types somewhere in the recursion.
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 2/28/11 6:01 AM, Yves Parès wrote:
takeC :: Int -> Compoz a b -> (exists c. Compoz a c) dropC :: Int -> Compoz a b -> (exists c. Compoz c b)
What does 'exists' means? To create a rank-2 type can't you use:
takeC :: Int -> Compoz a b -> (forall c. Compoz a c)
??
For any A and T, foo :: A -> (forall b. T b) is identical to foo :: forall b. (A -> T b) More technically, they're isomorphic--- in System F or any other gritty language that makes you explicitly pass types around. Since Haskell handles type passing implicitly, the isomorphism looks like identity. -- Live well, ~wren

Yves Parès wrote:
takeC :: Int -> Compoz a b -> (exists c. Compoz a c) dropC :: Int -> Compoz a b -> (exists c. Compoz c b)
What does 'exists' means? To create a rank-2 type can't you use:
takeC :: Int -> Compoz a b -> (forall c. Compoz a c)
??
Ah, (exists c. Compoz a c) means "There exists a type c such that the whole thing has type Compoz a c ". What you describe would be the type "For any type c the whole thing can be treated as having type Compoz a c " which is something entirely different. The point is that in the former version, the function takeC determines what the type c should be and the caller has no clue what it is. In the latter version, the function that calls takeC can choose which type it should be. What I wrote is *not* legal Haskell. (At least not in GHC. If I remember correctly, the EHC from Utrecht supports first-class existential quantification ). You have to encode it in some way, for instance with a data type data Exists f = forall c . Exists (f c) takeC :: Int -> Compoz a b -> Exists (Compoz a) Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Thanks to everyone for the nice solutions to this puzzle, here and on reddit: http://www.reddit.com/r/haskell/comments/fu6el/a_practical_haskell_puzzle/ There were two basic approaches. One was to use GADTs and higher-rank types to reduce the amount of type trickery needed. One nice example is apfelmus' solution here in this thread, and several people on reddit suggested using use thrists package: http://hackage.haskell.org/package/thrist The other approach is to use some kind of generics. In any case, there does not appear to be any reasonable way to handle this simple and common situation in Haskell without extensions. I challenge the Haskell community to add these extensions to the Haskell standard in Haskell 2012! Lennart proposed using type-level numbers and reification, but I'm not sure about the full details of that solution. Does it use Haskell extensions, and if so, which ones? Thanks, Yitz

From: Yitzchak Gale
To: haskell-cafe@haskell.org Cc: Heinrich Apfelmus
; Lennart Augustsson Sent: Wed, March 2, 2011 9:45:15 AM Subject: Re: [Haskell-cafe] A practical Haskell puzzle Thanks to everyone for the nice solutions to this puzzle, here and on reddit:
http://www.reddit.com/r/haskell/comments/fu6el/a_practical_haskell_puzzle/
It seems nobody has provided a simple H98 solution. I misread your question as asking for the composition of arbitrary type-compatible subsets of the layers, like
runCompose [1,7,4,3] input
if it happens fun1 . fun7 . fun4 . fun3 is well typed. This is not easy to do without Dynamic. Now I see you just want contiguous layers, which is easy enough in H98. This code produces and uses a table of all allowed combinations. I think this makes it easier to understand why the code works (and is H98). It's just as easy to make a direct version that produces one requested composition in linear time, so I haven't worried whether lazy evaluation of this table works nicely. \begin{code} runLayers :: Int -> Int -> String -> String runLayers n m = (table !! (n-1)) !! (m-n) table :: [[String -> String]] table = close (extend fun1 (extend fun2 (extend fun3 (extend fun4 seed)))) \end{code} Here are some examples with this sequence of layers and transformations (exact type definition and function definitions at the end of the message). Layer1: (Int,Int) --(uncurry(+))--> Layer2: Int --(\x -> if even x then Left x else Right x)--> Layer3: Either Int Int --(either (2*) negate)--> Layer4: Int --(`quotRem`14)--> Layer5: (Int,Int) *Main> read (runLayers 2 4 (show (Layer2 "X" 12))) :: Layer4 Layer4 "fun3(fun2(X))" 24 *Main> read (runLayers 4 5 (show (Layer4 "Y" 15))) :: Layer5 Layer5 "fun4(Y)" (1,1) *Main> read (runLayers 1 5 (show (Layer1 "fullStack" (5,6)))) :: Layer5 Layer5 "fun4(fun3(fun2(fun1(fullStack))))" (0,-11) The table also include trivial slices, which might be useful to check the serialization: *Main> read (runLayers 3 3 "(Layer3 \"X\" (Left (12)))") :: Layer3 Layer3 "X" (Left 12) The key observation is that if all compositions of functions are followed by the appropriate initialization function, then all the functions starting at the same layer have the same type. With four layers, show . show . fun34 show . fun45 . fun34 all have type Layer3 -> String The table construction uses a type \begin{code} data Layered a = Layered [a -> String] [[String -> String]] \end{code} which stores all sequences beginning at layer "a" with the uniform type [a -> String], and already has all strictly later sequences in the table [[String->String]]. A partial sequences can be extended by precomposing another function, or converted to the unform type by precomposing the deserialization function. To ensure only one type parameter is exposed at a time, the extend function combines both steps. \begin{code} extend :: (Show a, Read b) => (a -> b) -> Layered b -> Layered a extend f (Layered gs tails) = Layered (show:[g . f | g <- gs]) ([g . read | g <- gs]:tails) \end{code} The final step just closes partial sequences to produce one table, and the seed is a trivial table. \begin{code} close :: (Read a) => Layered a -> [[String -> String]] close (Layered fs tails) = [f . read | f <- fs]:tails seed :: (Show a) => Layered a seed = Layered [show] [] \end{code} Exact definition of the layer types. \begin{code} data Layer1 = Layer1 String (Int,Int) deriving (Read, Show) data Layer2 = Layer2 String Int deriving (Read, Show) data Layer3 = Layer3 String (Either Int Int) deriving (Read, Show) data Layer4 = Layer4 String Int deriving (Read, Show) data Layer5 = Layer5 String (Int,Int) deriving (Read, Show) \end{code} \begin{code} fun1 (Layer1 s x) = Layer2 ("fun1("++s++")") (uncurry (+) x) fun2 (Layer2 s x) = Layer3 ("fun2("++s++")") (if even x then Left x else Right x) fun3 (Layer3 s x) = Layer4 ("fun3("++s++")") (either (2*) negate x) fun4 (Layer4 s x) = Layer5 ("fun4("++s++")") (x `quotRem` 14) \end{code}

Brandon Moore wrote:
This code produces and uses a table of all allowed combinations. I think this makes it easier to understand why the code works (and is H98). It's just as easy to make a direct version that produces one requested composition in linear time, so I haven't worried whether lazy evaluation of this table works nicely.
Given that you are allowing serializing and deserializing at every step, you're right that it's not hard to do it in Haskell 98. I'm not convinced that you gain anything by building that big table though. Anyway, my idea was to try to find a solution that does not require the runtime cost of serializing and deserializing at every step just to solve a type problem. Sorry I didn't make that more clear in my statement of the problem. Thanks, Yitz

From: Yitzchak Gale
Brandon Moore wrote:
This code produces and uses a table of all allowed combinations. I think this makes it easier to understand why the code works (and is H98). It's just as easy to make a direct version that produces one requested composition in linear time, so I haven't worried whether lazy evaluation of this table works nicely.
Given that you are allowing serializing and deserializing at every step, you're right that it's not hard to do it in Haskell 98. I'm not convinced that you gain anything by building that big table though.
Anyway, my idea was to try to find a solution that does not require the runtime cost of serializing and deserializing at every step just to solve a type problem. Sorry I didn't make that more clear in my statement of the problem.
My solution does not serialize and deserialize between every pair of layers. The functions in the table have the form show . layer4 . layer3 . layer2 . read not show . layer4 . read . show . layer3 . read . show . layer2 . read I assume the first is fine, otherwise why mention serialization functions. The code can also be transformed to avoid the table construction and produce the requested function in linear time, but the intermediate types seem much more confusing. Brandon

Brandon Moore wrote:
My solution does not serialize and deserialize between every pair of layers.
Ahhh, I see! Sorry I didn't look closely enough the first time. Yes, this is a very nice Haskell 98 solution!
This code produces and uses a table of all allowed combinations. I think this makes it easier to understand why the code works (and is H98).
I'm not sure I understand why that is so.
It's just as easy to make a direct version that produces one requested composition in linear time, so I haven't worried whether lazy evaluation of this table works nicely.
Well, for the table solution to really qualify, that would need to work out. Otherwise, I'm not sure it's much better than just building that many boilerplate definitions in some automated way and compiling them. Could you please elaborate a bit more on what you mean by the "direct version"? Thanks, Yitz

Okay thanks I got the difference between both.
The 'exists' syntax seems very useful. Is it planned to be added to GHC in a
near future?
2011/2/28 Heinrich Apfelmus
Yves Parès wrote:
takeC :: Int -> Compoz a b -> (exists c. Compoz a c)
dropC :: Int -> Compoz a b -> (exists c. Compoz c b)
What does 'exists' means? To create a rank-2 type can't you use:
takeC :: Int -> Compoz a b -> (forall c. Compoz a c)
??
Ah, (exists c. Compoz a c) means "There exists a type c such that the whole thing has type Compoz a c ".
What you describe would be the type "For any type c the whole thing can be treated as having type Compoz a c " which is something entirely different.
The point is that in the former version, the function takeC determines what the type c should be and the caller has no clue what it is. In the latter version, the function that calls takeC can choose which type it should be.
What I wrote is *not* legal Haskell. (At least not in GHC. If I remember correctly, the EHC from Utrecht supports first-class existential quantification ). You have to encode it in some way, for instance with a data type
data Exists f = forall c . Exists (f c) takeC :: Int -> Compoz a b -> Exists (Compoz a)
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Yves Parès wrote:
Okay thanks I got the difference between both. The 'exists' syntax seems very useful. Is it planned to be added to GHC in a near future?
Probably not. But once GADTs become more prominent, there might be pressure to add first-class existential types to the language. Note that GHC has long supported existential types, just not the explicit syntax. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On 2/28/11 2:43 AM, Yitzchak Gale wrote:
You have written a large software system in Haskell. Wishing to play to Haskell's strength, you have structured your system as a series of composable layers. So you have data types
Layer1, Layer2, ...
and functions
layer2 :: Layer1 -> Layer2 layer3 :: Layer2 -> Layer3 ...
etc.
Assuming you actually name them Layer1, Layer2, etc, or use any other regular naming scheme, you can break apart the names and use typeclasses to help out: type family Layer :: * -> * data Z data S n class Layerable n where layer :: Layer n -> Layer (S n) Then it's just a matter of getting the right number of them, a la lifting through monad transformer stacks. Of course, from here it's not that hard to add in some type hackery to do the lifting for you, a la "Data types a la Carte"[1]. It's not the cleanest thing ---there's a good deal of boilerplate up front--- but once it's set up, it should Just Work(tm). [1] http://www.cs.nott.ac.uk/~wss/Publications/DataTypesALaCarte.pdf -- Live well, ~wren

On Mon, Feb 28, 2011 at 12:41 PM, wren ng thornton
On 2/28/11 2:43 AM, Yitzchak Gale wrote:
You have written a large software system in Haskell. Wishing to play to Haskell's strength, you have structured your system as a series of composable layers. So you have data types
Layer1, Layer2, ...
and functions
layer2 :: Layer1 -> Layer2 layer3 :: Layer2 -> Layer3 ...
etc.
Assuming you actually name them Layer1, Layer2, etc, or use any other regular naming scheme, you can break apart the names and use typeclasses to help out:
type family Layer :: * -> * data Z data S n
class Layerable n where layer :: Layer n -> Layer (S n)
Then it's just a matter of getting the right number of them, a la lifting through monad transformer stacks. Of course, from here it's not that hard to add in some type hackery to do the lifting for you, a la "Data types a la Carte"[1]. It's not the cleanest thing ---there's a good deal of boilerplate up front--- but once it's set up, it should Just Work(tm).
I was thinking something like: class IsNat n => LayerID n where data Layer n :: * runLayer :: LayerID (S n) => Layer n -> Layer (S n) serialize :: Layer n -> L.ByteString deserialize :: L.ByteString -> Layer n and then use a reifyIntegral-esque function (a la the reflection and type-level packages) to reify two numbers up to the type level and run all of the runLayers in between the two, but where I ran into trouble was that there's no guarantee of there being a LayerID instance for any given n; and at that point you either need the non-existent 'do-this-if-there's-an-instance-and-do-that-otherwise' construction, or perhaps you can hack something with OverlappingInstances which moves the whole thing from sophisticated-but-clean-oleggery to kind of ugly. Is there any cleaner way around this?
[1] http://www.cs.nott.ac.uk/~wss/Publications/DataTypesALaCarte.pdf
-- Live well, ~wren
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Work is punishment for failing to procrastinate effectively.

There were a number of emails discussing what a type-safe list solution would like look. This was the approach that first came to mind when I read your email (but I've had my head in Agda lately) http://hpaste.org/44469/software_stack_puzzle I've written up a minimal working example of this approach for those that are curious. As for the Haskell98 approach, I'd love to see a solution that didn't require deserialization/serialization at each layer boundary. This sounds like a case for the techniques used in list fusion, but GHC RULES are hardly Haskell98 :-) I'd also like to avoid cramming all of the possible layer input and output types into one giant ADT in such a solution. -- Eric Mertens

Eric Mertens wrote:
(but I've had my head in Agda lately)
Indeed, coming across this problem tempted me to abandon the real world and take refuge in Agda.
Wow, so simple, and no higher-rank types! This is the best solution yet. I am now truly in awe of the power of GADTs. Thanks, Yitz

Yitzchak Gale wrote:
Eric Mertens wrote:
(but I've had my head in Agda lately)
Indeed, coming across this problem tempted me to abandon the real world and take refuge in Agda.
Wow, so simple, and no higher-rank types! This is the best solution yet. I am now truly in awe of the power of GADTs.
Just for reference, here a full version of my solution http://hpaste.org/44503/software_stack_puzzle_annotat It's almost the same as Eric's solution except that he nicely fused the dropC and takeC functions into runLayers , thereby eliminating the need for existential quantification. However, note that GADTs subsume higher-rank types. With data Exists f where Exists :: f a -> Exists f you can always encode them as exists a. f a = Exists f forall a. f a = (exists a. f a -> c) -> c = (Exists f -> c) -> c Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com
participants (7)
-
Brandon Moore
-
Eric Mertens
-
Gábor Lehel
-
Heinrich Apfelmus
-
wren ng thornton
-
Yitzchak Gale
-
Yves Parès