
Hi Adam, thanks for your input.
2013/12/31 adam vogt
Hello Lucas,
Am I correct to say that laborantin only does full factorial experiments? Perhaps there is a straightforward way for users to specify which model parameters should be confounded in a fractional factorial design. Another extension would be to move towards sequential designs, where the trials to run depend on the results so far. Then more time is spent on the "interesting" regions of the parameter space.
Actually, the parameters specified in the DSL are "indicative" values for a full-factorial default. At this point, a command-line handler is responsible for exploring the parameter space and executing scenarios. This command-line handler can specify fractional factorial designs by evaluating a query such as: "(@sc.param 'foo' > @sc.param 'bar') and @sc.param 'baz' in [1,2,3,'toto']". This small query language was my first attempt at "expression parsing and evaluation" and the code might be ugly, but it works and fits most of my current needs. Bonus: with this design, the algorithm to "explore" the satisfiable parameter space is easy to express.
One direction to enrich this small query language would be to express that a parameter takes a continuous value in a range or should fulfill a boolean test function. Then we could use techniques such as rapidly exploring random trees to explore "exotic feasibility regions". Another direction is to require ScenarioDescriptions to have a sort of "cost/fitness function" so that we can later build a parameter-space explorer that performs an optimization. We could even extend the query language to bind a parameter to a value which optimizes another experiment.
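To make the "filter the full-factorial space with a query" idea concrete, here is a minimal sketch in plain Haskell. It is not Laborantin's actual implementation or query parser: Value, Point, fullFactorial, query and explore are hypothetical names chosen for illustration, and the query is hand-written as a Haskell predicate instead of being parsed from a string.

```haskell
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

-- Hypothetical untyped parameter value: a number or a string.
data Value = Num Double | Str String
  deriving (Eq, Ord, Show)

-- One point of the parameter space: parameter name -> chosen value.
type Point = M.Map String Value

-- Full-factorial expansion: every combination of the indicative values.
-- sequenceA in the list applicative enumerates the cartesian product.
fullFactorial :: M.Map String [Value] -> [Point]
fullFactorial = sequenceA

-- Hand-written equivalent of the example query from this thread:
--   (@sc.param 'foo' > @sc.param 'bar') and @sc.param 'baz' in [1,2,3,'toto']
-- A point that lacks one of the three parameters is rejected.
query :: Point -> Bool
query p = fromMaybe False $ do
  foo <- M.lookup "foo" p
  bar <- M.lookup "bar" p
  baz <- M.lookup "baz" p
  return (foo > bar && baz `elem` [Num 1, Num 2, Num 3, Str "toto"])

-- Exploring the satisfiable space is then just a filter.
explore :: (Point -> Bool) -> M.Map String [Value] -> [Point]
explore q = filter q . fullFactorial
```

The comparison foo > bar relies on the derived Ord instance, which orders every Num before every Str; a real query evaluator would probably want to reject such mixed comparisons instead.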
I think getVar/param could be re-worked to give errors at compile time. Now you get a runtime error if you typo a parameter or get the type wrong. Another mistake is to include parameters in the experiment that do not have any effect on the `run` action, unless those parameters are there for doing replicates.
Those might be addressed by doing something like:
    a <- parameter "destination" $ do
        ...
    run $ print =<< param a
Where the types are something like:
    param  :: Data.Tagged.Tagged a Text -> M a
    values :: [T a] -> M (Tagged a Text)
    str    :: Text -> T Text
    num    :: Double -> T Double
with M being whatever state monad you currently use. param does the same thing it always has, except that it now knows which type you put in the values list, and it cannot be called with an arbitrary string. The third requirement (no unused parameters) might be met by requiring -fwarn-unused-matches.
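A minimal, self-contained sketch of the typed-handle idea follows. It is an assumption-laden simplification, not Laborantin's API: it uses a hand-rolled phantom-typed Param instead of Data.Tagged, String instead of Text, a plain Map instead of the state monad M, and values takes the parameter name plus a non-empty list so that a decoder is always available. All names are hypothetical.

```haskell
import qualified Data.Map as M

-- Untyped representation stored in the environment.
data Value = Num Double | Str String
  deriving (Eq, Show)

-- A typed value: its untyped encoding plus a decoder back to type a.
data T a = T { encoded :: Value, decoder :: Value -> Maybe a }

num :: Double -> T Double
num d = T (Num d) fromNum
  where fromNum (Num x) = Just x
        fromNum _       = Nothing

str :: String -> T String
str s = T (Str s) fromStr
  where fromStr (Str x) = Just x
        fromStr _       = Nothing

-- Typed handle to a declared parameter. In a real library the
-- constructor would not be exported, so that a handle can only be
-- obtained through 'values' and never from an arbitrary string.
data Param a = Param String (Value -> Maybe a)

-- Declare a parameter with at least one value; returns the typed
-- handle together with the untyped values to explore.
values :: String -> T a -> [T a] -> (Param a, [Value])
values name v vs = (Param name (decoder v), map encoded (v : vs))

type Env = M.Map String Value

-- Look a parameter up through its handle; the result type is fixed by
-- the handle, so the caller cannot ask for the wrong type.
param :: Param a -> Env -> Maybe a
param (Param name dec) env = M.lookup name env >>= dec
```

In this simplified form the lookup can still return Nothing if the environment was built without the parameter; the gain is only that the name and the type are fixed once, at the declaration site.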
That's one thing I am torn about. From my experience, it is sometimes handy to branch on whether a value is a number or a string (e.g., to say things like 1, 2, 3, or "all"). Then again, tagged values do not prevent this either. Similarly, I don't know whether I should let users specify any type for their ParameterDescription at the cost of writing serializer/deserializer boilerplate (although we could provide some useful default types, as is the case now).
An alternative strategy is to change your type Step into an algebraic data type with a function to convert it into what it is currently. Before the experiment happens, you can have a function go through that data to make sure it will succeed with its getVar/param calls. This is called a deep embedding: http://www.haskell.org/haskellwiki/Embedded_domain_specific_language.
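The deep-embedding suggestion can be sketched as follows. This is only an illustration under assumptions: the Step constructors, analyze and interpret are hypothetical and much smaller than whatever Laborantin's real Step type does. The point is that once a step is plain data, it can be inspected before anything runs.

```haskell
import qualified Data.Set as S

-- Hypothetical deep embedding: a step is data, not an opaque action.
data Step
  = GetParam String   -- read a parameter by name
  | Log String        -- emit a message
  | Seq [Step]        -- run steps in order
  deriving Show

-- All parameter names a step refers to.
referenced :: Step -> S.Set String
referenced (GetParam n) = S.singleton n
referenced (Log _)      = S.empty
referenced (Seq ss)     = S.unions (map referenced ss)

-- Result of the static check performed before the experiment runs.
data Report = Report
  { undeclared :: S.Set String  -- referenced but never declared (e.g. typos)
  , unused     :: S.Set String  -- declared but never referenced
  } deriving (Eq, Show)

analyze :: S.Set String -> Step -> Report
analyze declared step =
  Report (used `S.difference` declared) (declared `S.difference` used)
  where used = referenced step

-- Converting the data back into an action, given a way to resolve names.
interpret :: (String -> String) -> Step -> IO ()
interpret _       (Log msg)    = putStrLn msg
interpret resolve (GetParam n) = putStrLn (n ++ " = " ++ resolve n)
interpret resolve (Seq ss)     = mapM_ (interpret resolve) ss
```

An unused parameter is not necessarily a mistake (it may exist only to produce replicates), so the unused set is better treated as a warning than as an error.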
That could be an idea. I haven't gone that far yet, but I'll keep an eye on it.
Best wishes for the new year,
--Lucas
Regards,
Adam
On Mon, Dec 23, 2013 at 4:27 AM, lucas di cioccio wrote:
Dear all,
I am happy to announce Laborantin. Laborantin is a Haskell library and DSL for running and analyzing controlled experiments.
Repository: https://github.com/lucasdicioccio/laborantin-hs
Hackage page: http://hackage.haskell.org/package/laborantin-hs
Laborantin's opinion is that running proper experiments is a non-trivial and often overlooked problem. Therefore, we should provide good tools to assist experimenters. The hope is that, with Laborantin, experimenters will spend more time on their core problem and race through menial tasks such as editing scripts because one data point is missing from a plot. At the same time, Laborantin is also an effort within the broad open-science movement. Indeed, Laborantin's DSL separates boilerplate from the actual experiment implementation. Thus, Laborantin could reduce the friction for code and data reuse.
One family of experiments that fits Laborantin well is benchmarks with tedious setup and teardown procedures (for instance starting, configuring, and stopping remote machines). Analyses that require measurements from a variety of data points in a multi-dimensional parameter space also fall within the scope of Laborantin.
When using Laborantin, the experimenter:
* Can express experimental scenarios using a readable and familiar DSL. This feature, albeit subjective, was confirmed by non-Haskeller colleagues.
* Saves time on boilerplate such as writing command-line parsers or encoding dependencies between experiments and analysis results in a Makefile.
* Benefits from auto-documentation and result introspection features when coming back to a project, possibly weeks or months later.
* Harnesses the power of Haskell's type system to catch common errors at compile time.
If you had to read one story to understand the pain points that Laborantin tries to address, it should be Section 5 of "Strategies for Sound Internet Measurement" (V. Paxson, IMC 2004).
I'd be glad to take questions and comments (or, even better, code reviews and pull requests).
Kind regards,
--Lucas DiCioccio (@lucasdicioccio on GitHub/Twitter)
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe