How to implement a source-sink pattern

Hi! I'm trying to implement a source-sink type of pattern where there are a number of sinks, each connected to exactly one source, and a number of sources, each connected to zero or more sinks. The program consists of some modules, each defining their own sources and sinks. To illustrate this, here's what this would look like in C: /* global.h */ struct source { char *state; /* some more fields */ } struct sink { struct source *source; char *state; /* some more fields */ } struct sink **get_sinks_for_source(struct source *source); /* module_a.c */ struct source a_source, another_source; struct sink foo, bar, baz; ... foo.source = &a_source; ... Since getting the list of sinks for a source is a common operation, I'd probably define some kind of reverse map which is updated when a sink is remapped to a new source. I tried to rebuild this in Haskell, but the result is ridiculously complicated. Since I can't use pointers, I had to define an ordinal type class to enumerate the sources and sinks and use this to look up the actual data from the world state. But then I couldn't define the Sink type properly as I can't do: data SinkInfo = Source a => SinkInfo { sinkSource :: a, sinkState :: String } I have to add the source type as a type attribute, leading to a world state with as many parameters as there are sinks. Also, I couldn't figure out how to implement a function like sinksForSource :: Source a => WorldState p q r -> a -> [Sink b => b] since the source types won't match the type of the lookup key, and the sinks can be from different modules. Now here is the actual code (comments indicate where I intend to split it into individual files): {- Global.hs -} data SourceInfo = SourceInfo { sourceState :: String } class Eq a => Source a where getSourceInfo :: WorldState p q r -> a -> SourceInfo data SinkInfo a = SinkInfo { sinkSource :: a, sinkState :: String } class Sink a where getSinkInfo :: Source x => WorldState x x x -> a -> SinkInfo x -- should allow different arguments to WorldState {- ModuleA.hs -} data ModuleASource = ASource | AnotherSource deriving (Eq) instance Source ModuleASource where getSourceInfo world ASource = aSource $ moduleAState world getSourceInfo world AnotherSource = anotherSource $ moduleAState world data ModuleASink = Foo | Bar | Baz instance Sink ModuleASink where getSinkInfo world Foo = foo $ moduleAState world getSinkInfo world Bar = bar $ moduleAState world getSinkInfo world Baz = baz $ moduleAState world data ModuleAState p q r = ModuleAState { aSource :: SourceInfo, anotherSource :: SourceInfo, foo :: SinkInfo p, bar :: SinkInfo q, baz :: SinkInfo r } sinksForSourceInModuleA :: Source x => ModuleAState x x x -> x -> [ModuleASink] -- should allow different arguments to ModuleAState and return Sink b => [b] sinksForSourceInModuleA (ModuleAState _ _ foo bar baz) source = (if sinkSource foo == source then [Foo] else []) ++ (if sinkSource bar == source then [Bar] else []) ++ (if sinkSource baz == source then [Baz] else []) {- Main.hs -} data WorldState p q r = WorldState { moduleAState :: ModuleAState p q r } initState :: WorldState ModuleASource ModuleASource ModuleASource initState = WorldState $ ModuleAState (SourceInfo "a source init state") (SourceInfo "another source init state") (SinkInfo ASource "foo init state") (SinkInfo ASource "bar init state") (SinkInfo AnotherSource "baz init state") remapBar :: WorldState p q r -> WorldState p ModuleASource r remapBar (WorldState a) = WorldState $ a { bar = SinkInfo AnotherSource (sinkState $ bar a) } sinksForSource :: Source x => WorldState x x x -> x -> [ModuleASink] -- should allow different arguments to ModuleAState and return Sink b => [b] sinksForSource (WorldState a) source = sinksForSourceInModuleA a source main :: IO () main = let before = initState after = remapBar before in do putStrLn $ "Number of sinks for another source before: " ++ (show $ length $ sinksForSource before AnotherSource) putStrLn $ "Number of sinks for another source after: " ++ (show $ length $ sinksForSource after AnotherSource) There are some problems with this code: * I couldn't figure out how to resolve the circular references which are created by splitting this into individual files. * Source and sink ordinals can't be mixed between modules, so I wouldn't be able to add a module B anyway. * Looking up the source/sink states by ordinal keys is kind of cumbersome but works. * To get the list of sinks connected to a given source, the whole world state has to be polled. I expect this to be grossly inefficient (unless Haskell does some magic here) but I'm not sure how to add a cache in a consistent way. I can't believe this should be so complicated in Haskell, so I guess I'm trying to do this in an un-Haskell-ish way, or maybe there's something obvious I haven't seen. I'd be happy about any suggestions. Roland

Roland Lutz
Hi!
I'm trying to implement a source-sink type of pattern where there are a number of sinks, each connected to exactly one source, and a number of sources, each connected to zero or more sinks. The program consists of some modules, each defining their own sources and sinks. To illustrate this, here's what this would look like in C:
Hey Roland, So essentially you want a data structure for some kind of bipartite graph. The most haskelly way to do that would probably to define the graph to be simply define the Bipartite graph to be a pair of Maps, and define functions to add/delete nodes and edges to the graph that make sure that the two maps keep in sync. This would give you something like: import qualified Data.Map as M data MySource = MySource { sourceState :: String , % and any other data specific to sources } data MySink = MySink { sinkState :: String, % and whatever else sinks have} data BiGraph src snk = BiGraph { sourceToSinkMap :: M.Map src [snk] , sinkToSourceMap :: M.Map snk src } addEdge :: (src,snk) -> BiGraph src snk -> BiGraph src snk addEdge (src,snk) (BiGraph m1 m2) = BiGraph (M.update src (snk :) m1) (M.insert snk src m2) % make sure to check that snk % does not already occur in % m2 etc. you essentially get your 'sinksForSource' functions for free: sinksForSource :: src -> BiGraph src snk -> [snk] sinksForSource src = M.lookup src . sourceToSinkMap In this model you cannot direclty mix the sources and sinks from different modules. I.e. a 'BiGraph MySource MySink' cannot be used to also store a (MySecondSource,MySecondSink) pairs. If you do want that, you would need some wrapper type that distinguishes between the various 'MySink', 'MySecondSink', versions. Note that instead of building such a graph type yourself you might also just want to use some existing graph library out there (i.e. fgl or so). Hope this helps a bit. -- - Frank

On Wed, 1 Apr 2015, Frank Staals wrote:
So essentially you want a data structure for some kind of bipartite graph.
Yes, with the additional constraint that the vertices in one partite set (the "sinks") each connect to exactly one edge.
The most haskelly way to do that would probably to define the graph to be simply define the Bipartite graph to be a pair of Maps, and define functions to add/delete nodes and edges to the graph that make sure that the two maps keep in sync.
This was actually my first approach, but I couldn't find appropriate key and value types to be stored in the map. Since the vertices are well-known global objects, it doesn't make much sense to store more than a handle here. But how do I connect the handle back to the data structure?
In this model you cannot direclty mix the sources and sinks from different modules. I.e. a 'BiGraph MySource MySink' cannot be used to also store a (MySecondSource,MySecondSink) pairs. If you do want that, you would need some wrapper type that distinguishes between the various 'MySink', 'MySecondSink', versions.
That's one of the points that trouble me. How would such a wrapper look like? I experimented a bit with your code (see below). I noticed that I have to specify "Ord src =>" and "Ord snk =>" in multiple places. Is there a way to state that type arguments for BiGraph always have to be instances of Ord? Roland import qualified Data.List as L import qualified Data.Map as M data BiGraph src snk = BiGraph { sourceToSinkMap :: M.Map src [snk], sinkToSourceMap :: M.Map snk src } deriving Show collectKeys :: Eq a => a -> M.Map k a -> [k] collectKeys a = M.keys . M.filter (== a) applyToPair :: (k -> a) -> k -> (k, a) applyToPair f a = (a, f a) initializeGraph :: Ord src => [src] -> M.Map snk src -> BiGraph src snk initializeGraph srcs m2 = BiGraph (M.fromList $ map (applyToPair $ (flip collectKeys) m2) srcs) m2 updateEdge :: Ord src => Ord snk => (src, snk) -> BiGraph src snk -> BiGraph src snk updateEdge (src, snk) (BiGraph m1 m2) = if M.notMember src m1 then error "updateEdge: invalid source" else if M.notMember snk m2 then error "updateEdge: invalid sink" else let oldsrc = m2 M.! snk in BiGraph (M.adjust (snk :) src $ M.adjust (L.delete snk) oldsrc m1) (M.insert snk src m2) sinksForSource :: Ord src => src -> BiGraph src snk -> [snk] sinksForSource src = (M.! src) . sourceToSinkMap

Roland Lutz
On Wed, 1 Apr 2015, Frank Staals wrote:
So essentially you want a data structure for some kind of bipartite graph.
Yes, with the additional constraint that the vertices in one partite set (the "sinks") each connect to exactly one edge.
The most haskelly way to do that would probably to define the graph to be simply define the Bipartite graph to be a pair of Maps, and define functions to add/delete nodes and edges to the graph that make sure that the two maps keep in sync.
This was actually my first approach, but I couldn't find appropriate key and value types to be stored in the map. Since the vertices are well-known global objects, it doesn't make much sense to store more than a handle here. But how do I connect the handle back to the data structure?
A vertex (source/sink) is not uniquely coupled with a graph; it may be in more than one graph. So, there is no easy way to define a function with type 'Vertex -> Graph'. In other words, you should pass around the graph data structure if you need connectivity information.
In this model you cannot direclty mix the sources and sinks from different modules. I.e. a 'BiGraph MySource MySink' cannot be used to also store a (MySecondSource,MySecondSink) pairs. If you do want that, you would need some wrapper type that distinguishes between the various 'MySink', 'MySecondSink', versions.
That's one of the points that trouble me. How would such a wrapper look like?
Assuming that the set of different sink types is fixed and known at compile time, simply: data Sink = SinkA ModuleA.Sink | SinkB ModuleB.Sink | ... If the sink types are not known then you need either Existential type or something like an open-union. An existentialtype will look something like: data Sink where Sink :: IsASink t => t -> Sink however, if you have a value of type Sink, you cannot recover what exact type of sink it was (i.e. if it was a ModuleA.Sink or a ModuleB.Sink): you only know that it has the properties specified by the IsASink typeclass. With open-unions you should be able to recover the exact type. However that is a bit more complicated. I don't have concrete experience with them myself, so others might be more helpful on that front.
I experimented a bit with your code (see below). I noticed that I have to specify "Ord src =>" and "Ord snk =>" in multiple places. Is there a way to state that type arguments for BiGraph always have to be instances of Ord?
Roland
Depends a bit, if you wish to keep the functions polymorphic in the source an sink types (src and snk) then you have to keep them. If you fill in the concrete types at hand, and you give them Ord instances, then (obviously) you don't have to keep the Ord constraints ;) Last note: If your source and sink types don't have a proper Ordering, you can switch to using (integer) explicit vertexIds and IntMaps. Similar to the API of fgl. However, in that case you have to do the bookkeeping of which vertex has which vertexId yourself.
updateEdge :: Ord src => Ord snk => (src, snk) -> BiGraph src snk -> BiGraph src snk updateEdge (src, snk) (BiGraph m1 m2) =
FWI: you would normally write multiple class constraints like (Ord src, Ord snk) =>, instead of Ord src => Ord snk =>. I'm kind of surprised the latter is still allowed. Regards, -- - Frank

On 04/01/2015 03:31 PM, Roland Lutz wrote:
Hi!
I'm trying to implement a source-sink type of pattern where there are a number of sinks, each connected to exactly one source, and a number of sources, each connected to zero or more sinks. The program consists of some modules, each defining their own sources and sinks. To illustrate this, here's what this would look like in C:
/* global.h */
struct source { char *state; /* some more fields */ }
struct sink { struct source *source; char *state; /* some more fields */ }
struct sink **get_sinks_for_source(struct source *source);
/* module_a.c */
struct source a_source, another_source; struct sink foo, bar, baz;
... foo.source = &a_source; ...
One important thing you didn't state is: which parts of these data structures are immutable? The process of moving any data structure from C to Haskell depends on the answer, but otherwise can usually be done in a relatively mechanical fashion: 1. everything that is immutable should be shifted as deeply as possible, the mutable containers containing immutable ones; 2. map the data structures over: struct to a record, immutable array to an immutable array (or list or map or whatever, depending on the access pattern and performance requirements), mutable array to a mutable array; 3. map immutable non-null pointers to just the data structure they're pointing to, other immutable pointers to Maybe, mutable non-null pointers to STRef, other mutable pointers to STRef Maybe. 4. use runST to hide the whole mess behind a pure interface as much as possible. The result is unlikely to be optimal or elegant, but this process can get you a working implementation in Haskell. Once there, start refactoring the algorithms in and you'll likely be able to simplify both the data structure and the whole program. Take care to start with strong types and they will prevent you from doing anything stupid while refactoring. On a related note: I have no idea what Sink and Source are supposed to be for, but it's possible that pipes and conduits already provide it.
participants (3)
-
Frank Staals
-
Mario Blažević
-
Roland Lutz