[Haskell-cafe] Re: What do _you_ want to see in FGL?

10 May 2010

      Ivan Lazar Miljenovic wrote:
...
Henning Thielemann writes:
...
Recently I wrote cabal-sort using FGL
  http://hackage.haskell.org/package/cabal-sort
It sorts cabal packages topologically according to their
dependencies. However, I was neither happy with the way FGL currently
works, nor with the way I proposed recently (splitting into unlabelled
and labelled graphs). I like to use the package name as node identifier.
I do not need any label, but I need a node type different from Int.
With current FGL I need to maintain a Map PkgName Int. Would it be
sensible to generalize the Node type to any Ord type or does FGL use
optimizations specific to Int? In another example I used FGL for
finding all topological orderings of a set of database
transactions. In this case I used an enumeration type as node
type. Are there other applications for alternative Node types?
We're considering doing this.
Pros for allowing you to use a custom node type:
* Matches your data better
* No need for extra lookup maps when converting your data to FGL form
Cons:
* Makes type-sigs uglier/more verbose
* Restricts the ability to optimise
Using Int gives us a fixed-size data type with known good comparison
performance and with a range that should suit most purposes.
I have the same gripe as Henning, though I'm not sure I concur with his
proposal.

Here a snippet from a quick & dirty 'make' implementation that I use for
building my website:

    data Rule = Rule { ins :: [FilePath], out :: FilePath,
                       effect :: IO () }

    rules2Graph :: [Rule] -> G.Gr (IO ()) ()
    rules2Graph rules = G.mkGraph nodes' edges'
        where
        nodes = [(out r, conditionally (effect r) (ins r) (out r))
                 | r <- rules]
        edges = [(i, out r, ()) | r <- rules, i <- ins r,
                                  i `Map.member` nodeMap]
                                  -- ignore source nodes

        nodeMap = Map.fromList nodes
        index k = Map.findIndex k nodeMap
        nodes'  = map (\(a,b) -> (index a, b)) nodes
        edges'  = map (\(a,b,c) -> (index a, index b, c)) edges

The nodes are file paths, labeled with a corresponding IO action to
create the file. The nodes are created from a list of rules that specify
how to create an output file from several input files.

As you can see, I had to use  Data.Map  to convert file paths into node
indexes. Ideally, the  Data.Graph.Inductive.NodeMap  module should be
used for that, but after some frustration, I found it completely
unsuitable for this task because it insists on using the graph label as
node identifier.

I am particularly unhappy about the definitions of  nodes'  and  edges'
, the clever use of  Map.findIndex  to translate indexes while keeping
track of a label and the need for mapping the indexes myself.

I'm not sure what the right solution is, but I think it definitely
involves catering for different node types. For instance, the library
could operate on a type

    newtype Graph node a b = Graph (Gr a b, Data.Map.Map Int node)

or it could offer a more useful  NodeMap  type and make the  Node  type
abstract. Some systematic and simple abstractions to manage nodes is
needed anyway.

Also, hard-coding  Node = Int  feels like the wrong kind of flexibility:
the user is able to do anything with it, just in case the library forgot
some important functionality. Which is exactly what happened in my case
when I used  Map.findIndex . I prefer the library to get it right.

PS: While we're at it, I think  newNodes  should return an infinite list
of  Node  instead of requiring a fixed number to be specified in advance?

Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com