[ANN] Laborantin: experimentation framework

Dear all,

I am happy to announce Laborantin. Laborantin is a Haskell library and DSL for running and analyzing controlled experiments.

Repository: https://github.com/lucasdicioccio/laborantin-hs
Hackage page: http://hackage.haskell.org/package/laborantin-hs

Laborantin's opinion is that running proper experiments is a non-trivial and often overlooked problem, and that we should therefore provide good tools to assist experimenters. The hope is that, with Laborantin, experimenters will spend more time on their core problem and race through menial tasks such as editing scripts because one data point is missing from a plot. At the same time, Laborantin is also an effort within the broader open-science movement: its DSL separates boilerplate from the actual experiment implementation, and could thus reduce the friction of code and data reuse.

One family of experiments that fits Laborantin well is benchmarks with tedious setup and teardown procedures (for instance starting, configuring, and stopping remote machines). Analyses that require measurements from a variety of data points in a multi-dimensional parameter space also fall within Laborantin's scope.

When using Laborantin, the experimenter:
* Can express experimental scenarios in a readable and familiar DSL. This feature, albeit subjective, was confirmed by non-Haskeller colleagues.
* Saves time on boilerplate such as writing command-line parsers or encoding dependencies between experiments and analysis results in a Makefile.
* Benefits from auto-documentation and result-introspection features when coming back to a project weeks or months later.
* Harnesses the power of Haskell's type system to catch common errors at compile time.

If you had to read one story to understand the pain points that Laborantin tries to address, it should be Section 5 of "Strategies for Sound Internet Measurement" (V. Paxson, IMC 2004).
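For a rough idea of what "exploring a multi-dimensional parameter space" means here, the concept can be sketched in a few lines. This is an invented, self-contained toy model (all names are made up for illustration), not the laborantin-hs API:

```haskell
-- Invented toy model of "scenario + parameters + full-factorial
-- exploration"; this is NOT the laborantin-hs API.

data ParamVal = StrVal String | NumVal Double
  deriving (Eq, Show)

data Parameter = Parameter
  { paramName   :: String
  , paramValues :: [ParamVal]   -- indicative values to explore
  } deriving Show

data Scenario = Scenario
  { scenarioName :: String
  , parameters   :: [Parameter]
  } deriving Show

-- Full-factorial exploration: every combination of parameter values,
-- each data point expressed as a list of (name, value) bindings.
explore :: Scenario -> [[(String, ParamVal)]]
explore sc = sequence [ [ (paramName p, v) | v <- paramValues p ]
                      | p <- parameters sc ]

-- Example: a ping benchmark with 2 x 3 = 6 data points.
ping :: Scenario
ping = Scenario "ping"
  [ Parameter "destination" [StrVal "example.com", StrVal "localhost"]
  , Parameter "packet-size" [NumVal 64, NumVal 512, NumVal 1500]
  ]
```

The real library adds setup/run/teardown hooks, result storage, and a command-line driver on top of this basic shape.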
I'd be glad to take questions and comments (or, even better, code reviews and pull requests). Kind regards, --Lucas DiCioccio (@lucasdicioccio on GitHub/Twitter)

Hi Lucas,
In connection with your work on Laborantin, you may be interested in our
papers:
Braincurry: A domain-specific language for integrative neuroscience
http://www2.le.ac.uk/departments/biology/research/neuroscience/matheson-neur...
A formal mathematical framework for physiological observations, experiments
and analyses.
http://rsif.royalsocietypublishing.org/content/9/70/1040.long
I found it difficult to excite experimental biologists about the benefit of
adopting experiment description languages. I am now concentrating on a
functional language for statistical data analysis - see
https://bayeshive.com
Tom
On 23 December 2013 09:27, lucas di cioccio
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi Tom,
Thanks for the pointers.
It is interesting to see that Braincurry and Laborantin have similar
designs although we come from very different application domains. You've
picked paths that I was not sure to explore (e.g., have experiment
parameters be a parameterizable datatype rather than a value in a
pre-defined datatype).
I didn't think about enabling algebraic composition of "experiments". It
looks like I can incorporate this idea in Laborantin too as a way to
"combine" setup/run/teardown hooks. I'll definitely have a second look at
Braincurry but first I'll have to read the 2nd paper.
One thing I really would like to support is a way to "inject experiments"
into another system and run experiments "live". For instance, A/B testing
web pages in a Warp application.
BayesHive looks very nice! Congrats.
Enjoy a nice year 2014 and best wishes,
--Lucas
2013/12/30 Tom Nielsen
On 23 December 2013 09:27, lucas di cioccio
wrote:

This looks really cool!
Cheers,
Corey
-Corey O'Connor
coreyoconnor@gmail.com
http://corebotllc.com/
On Mon, Dec 23, 2013 at 1:27 AM, lucas di cioccio wrote:

Hello Lucas,
Am I correct to say that Laborantin only does full factorial
experiments? Perhaps there is a straightforward way for users to
specify which model parameters should be confounded in a fractional
factorial design. Another extension would be to move towards
sequential designs, where the trials to run depend on the results so
far; then more time is spent on the "interesting" regions of the
parameter space.
I think getVar/param could be reworked to give errors at compile
time. Currently you get a runtime error if you typo a parameter name
or get the type wrong. Another mistake is to include parameters in the
experiment that have no effect on the `run` action, unless those
parameters are there for doing replicates.
Those might be addressed by doing something like:
a <- parameter "destination" $ do ...
run $ print =<< param a
Where the types are something like:
param :: Data.Tagged.Tagged a Text -> M a
values :: [T a] -> M (Tagged a Text)
str :: Text -> T Text
num :: Double -> T Double
with M being whatever state monad you currently use, and param does
the same thing it always has, except now it knows which type you put
in the values list, and it cannot be called with any string. The third
requirement might be met by requiring -fwarn-unused-matches.
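To make the sketch above concrete, here is a self-contained, runnable version of the typed-handle idea. The names mirror the sketch (`T`, `str`, `num`, `parameter`, `param`) but everything else is invented for illustration; this is not laborantin-hs code:

```haskell
{-# LANGUAGE GADTs #-}
-- Runnable sketch of the typed parameter-handle idea; NOT laborantin-hs code.
import qualified Data.Map as M

-- A typed indicative value, as in the str/num sketch.
data T a where
  TStr :: String -> T String
  TNum :: Double -> T Double

str :: String -> T String
str = TStr

num :: Double -> T Double
num = TNum

-- A handle whose phantom type remembers the type of the values list.
newtype Handle a = Handle String

-- Declaring a parameter fixes the handle's type from the values given.
parameter :: String -> [T a] -> Handle a
parameter name _indicativeValues = Handle name

-- Untyped runtime bindings for one point of the parameter space.
data Val = VStr String | VNum Double

class FromVal a where
  fromVal :: Val -> Maybe a
instance FromVal String where
  fromVal (VStr s) = Just s
  fromVal _        = Nothing
instance FromVal Double where
  fromVal (VNum n) = Just n
  fromVal _        = Nothing

-- param only accepts a declared handle, and the handle's phantom type
-- fixes the result type: reading at the wrong type no longer typechecks.
param :: FromVal a => M.Map String Val -> Handle a -> Maybe a
param env (Handle n) = fromVal =<< M.lookup n env

destination :: Handle String
destination = parameter "destination" [str "example.com", str "localhost"]
```

With this shape, `param env destination :: Maybe Double` is rejected by the compiler, while a name typo is confined to the single `parameter` declaration.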
An alternative strategy is to change your type Step into an algebraic
data type, with a function to convert it into what it is currently.
Before the experiment happens, you can have a function go through that
data to make sure it will succeed with its getVar/param. This is
called a deep embedding:
http://www.haskell.org/haskellwiki/Embedded_domain_specific_language.
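As a rough illustration of that deep-embedding suggestion, here is a self-contained sketch with invented constructors (not Laborantin's actual Step type): because steps are plain data, the parameters a scenario will read can be collected and checked against the declared ones before anything runs.

```haskell
-- Invented deep-embedding sketch; NOT Laborantin's actual Step type.
data Step
  = GetParam String   -- read a parameter by name
  | Log String        -- emit a log line
  | Seq Step Step     -- run one step, then another

-- Collect every parameter name a step will ask for.
paramsUsed :: Step -> [String]
paramsUsed (GetParam n) = [n]
paramsUsed (Log _)      = []
paramsUsed (Seq a b)    = paramsUsed a ++ paramsUsed b

-- Static check: names used by the steps but never declared.
undeclared :: [String] -> Step -> [String]
undeclared declared step =
  [ n | n <- paramsUsed step, n `notElem` declared ]

-- Example: "destination" is declared; "packet-size" is a typo that
-- would fail at runtime, but the checker finds it before the run.
example :: Step
example = Seq (GetParam "destination") (GetParam "packet-size")
```

An interpreter from `Step` back to the current shallow actions recovers today's behavior, while analyses like `undeclared` come for free.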
Regards,
Adam
On Mon, Dec 23, 2013 at 4:27 AM, lucas di cioccio

Hi Adam, thanks for your input.
2013/12/31 adam vogt
Hello Lucas,
Am I correct to say that laborantin only does full factorial experiments? Perhaps there is a straightforward way for users to specify which model parameters should be confounded in a fractional factorial design. Another extension would be to move towards sequential designs, where the trials to run depend on the results so far. Then more time is spent on the "interesting" regions of the parameter space.
Actually, the parameters specified in the DSL are "indicative" values for a full-factorial default. At this point, a command-line handler is responsible for exploring the parameter space and executing scenarios. This command-line handler has a way to specify fractional factorial designs by evaluating a query like: "(@sc.param 'foo' > @sc.param 'bar') and @sc.param 'baz' in [1,2,3,'toto']". This small query language was my first attempt at expression parsing and evaluation, and the code might be ugly, but it works and fits most of my current needs. Bonus: with this design, the algorithm to explore the satisfiable parameter space is easy to express.

One direction to enrich this small query language would be to express that a parameter takes a continuous value in a range or should fulfill a boolean test function. Then we could use techniques such as rapidly exploring random trees to explore "exotic feasibility regions". Another direction is to require ScenarioDescriptions to carry a sort of cost/fitness function, so that we can later build a parameter-space explorer that performs an optimization. We could even extend the query language to bind a parameter to a value that optimizes another experiment.
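For illustration, the shape of such a query language can be sketched as a small AST plus an evaluator used to filter the full-factorial space. This is an invented, self-contained model, not the actual laborantin-hs query implementation:

```haskell
-- Invented sketch of a query AST in the spirit of
--   (@sc.param 'foo' > @sc.param 'bar') and @sc.param 'baz' in [1,2,3]
-- NOT the actual laborantin-hs implementation.
import qualified Data.Map as M

data Val = S String | N Double deriving (Eq, Ord, Show)

-- One candidate point of the parameter space.
type Point = M.Map String Val

data Expr = Param String | Lit Val

data Query
  = Gt Expr Expr    -- comparison
  | In Expr [Val]   -- membership test
  | And Query Query

evalE :: Point -> Expr -> Maybe Val
evalE p (Param n) = M.lookup n p
evalE _ (Lit v)   = Just v

-- A point missing a referenced parameter simply fails the predicate.
evalQ :: Point -> Query -> Bool
evalQ p (Gt a b)  = case (evalE p a, evalE p b) of
  (Just x, Just y) -> x > y
  _                -> False
evalQ p (In a vs) = maybe False (`elem` vs) (evalE p a)
evalQ p (And a b) = evalQ p a && evalQ p b

-- Fractional design: keep only the satisfiable points.
select :: Query -> [Point] -> [Point]
select q = filter (\p -> evalQ p q)

example :: Query
example = (Param "foo" `Gt` Param "bar")
    `And` (Param "baz" `In` [N 1, N 2, N 3])
```

Extensions like range constraints or a fitness function would add constructors to `Query` and smarter exploration strategies on top of `select`.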
I think getVar/param could be re-worked to give errors at compile time. Now you get a runtime error if you typo a parameter or get the type wrong. Another mistake is to include parameters in the experiment that do not have any effect on the `run` action, unless those parameters are there for doing replicates.
Those might be addressed by doing something like:
a <- parameter "destination" $ do ... run $ print =<< param a
Where the types are something like:
param :: Data.Tagged.Tagged a Text -> M a values :: [T a] -> M (Tagged a Text) str :: Text -> T Text num :: Double -> T Double
with M being whatever state monad you currently use, and param does the same thing it always has, except now it knows which type you put in the values list, and it cannot be called with any string. The third requirement might be met by requiring -fwarn-unused-matches.
That's one thing I am torn about. From my experience, it is sometimes handy to branch on whether a value is a number or a string (e.g., to say things like 1, 2, 3, or "all"). Somehow, tagged values do not prevent this either. Similarly, I don't know whether I should let users specify any type for their ParameterDescription at the cost of writing serializer/deserializer boilerplate (although we could provide some useful default types, as is the case now).

An alternative strategy is to change your type Step into an algebraic data type with a function to convert it into what it is currently. Before the experiment happens, you can have a function go through that data to make sure it will succeed with its getVar/param. This is called a deep embedding: http://www.haskell.org/haskellwiki/Embedded_domain_specific_language.

That can be an idea; I didn't go that far yet, but I'll keep an eye on it.

Best wishes for this happy new year,
--Lucas

Regards,
Adam
On Mon, Dec 23, 2013 at 4:27 AM, lucas di cioccio
wrote:
participants (4)
- adam vogt
- Corey O'Connor
- lucas di cioccio
- Tom Nielsen