
Dear friends,
we have a distributed system written in Haskell, consisting of three types of nodes, with a dozen instances of each of the first two types and a central node of the third type. Each node is started by executing a binary which sets up an acid-state persistence layer and the sockets over which msgpack messages are exchanged.
It is trivial to write low-level QuickCheck test suites which test properties of individual functions. We would like, however, to have a QuickCheck-style suite which sets up the network, drives it to an arbitrary valid state (by sending correct messages between nodes), and then rigorously tests it for three high-level properties:
1. Chaos monkey test (disable a random node, see if certain invariants hold);
2. Evil node test (make several nodes work against the system, see if certain properties hold);
3. Rigorous testing of network-wide invariants, assuming all nodes operate correctly.
The problem we're facing is the following: if we want to inspect the state of nodes in Haskell-land, we have to write a lot of machinery which emulates every node launch in a thread. There will be bugs in this machinery, so we won't get quality testing information until we fix them. If we instead run things as separate processes, the best we can do is either to inspect the acid-state database of each node (which poses resource-locking problems and forces us to dump the state on every change, which is undesirable), or to add an observer node which dumps the consensus as Text, parse that data back into Haskell terms, and decide whether the required properties hold based on that (so far, this feels like the best option).
Am I missing something? How is something like this usually done in practice? How would you approach such a problem? Links to source files of test suites which do something similar would be highly appreciated.

Hi Jonn,
I work on a similar-sounding system. We have arranged things so that each
node is a pure state machine, with outputs that are a pure function of its
inputs, with separate (and simple, obviously correct) machinery for
connecting these state machines over the network. This makes it rather
simple to run a bunch of these state machines in a test harness that
simulates all sorts of network nastiness (disconnections, dropped or
reordered messages, delays, corruption etc.) on a single thread.
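The shape David describes can be sketched in a few lines. All types and messages below are hypothetical, invented for illustration, not taken from his system: a node is nothing but a pure step function from state and input to new state and outputs, and a test harness is a fold.

```haskell
-- Hypothetical message and node types for illustration only.
data Input  = Ping | Tick Int deriving (Show, Eq)
data Output = Pong | TimedOut deriving (Show, Eq)

data NodeState = NodeState
  { lastSeen :: Int   -- time of the last Ping received
  , now      :: Int   -- current clock, fed in via Tick messages
  } deriving (Show, Eq)

-- The node's entire behaviour is this pure function; no IO, no sockets.
step :: NodeState -> Input -> (NodeState, [Output])
step s Ping = (s { lastSeen = now s }, [Pong])
step s (Tick t)
  | t - lastSeen s > 5 = (s { now = t }, [TimedOut])
  | otherwise          = (s { now = t }, [])

-- A test harness is just a fold over a list of inputs; simulating
-- network nastiness amounts to dropping, duplicating, or reordering
-- elements of that list before folding.
run :: NodeState -> [Input] -> (NodeState, [Output])
run s0 = foldl (\(s, os) i -> let (s', os') = step s i in (s', os ++ os')) (s0, [])

main :: IO ()
main = print (snd (run (NodeState 0 0) [Ping, Tick 3, Tick 9]))
```

Because `run` is pure, the same input trace always reproduces the same behaviour, which makes failures from a QuickCheck-style generator trivially replayable.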
One trick that proved useful was to feed in the current time as an explicit
input message. This makes it possible to test things like timeouts without
having to actually wait for the time to pass, which speeds things up
immensely. We also make use of ContT somewhere in the tests to interleave
processing and assertions, and to define a 'hypothetically' operator that
lets a test run a sequence of actions and then backtrack.
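To make the time trick concrete, here is a minimal sketch with made-up `Msg` and `St` types: a node that expects an `Ack` within 5 ticks. Testing a timeout takes no wall-clock time, and because state is a plain value, a bare-bones `hypothetically` needs no continuation machinery at all (David's ContT version presumably interleaves this with ongoing processing, which this sketch does not attempt):

```haskell
-- Hypothetical types: a node that expects an Ack within 5 ticks.
data Msg = Ack | Tick Int deriving (Show, Eq)
data St  = Waiting Int | Done | Failed deriving (Show, Eq)

step :: St -> Msg -> St
step (Waiting _)  Ack = Done
step (Waiting t0) (Tick t)
  | t - t0 > 5 = Failed         -- timeout fires purely, no sleeping
  | otherwise  = Waiting t0
step s _ = s                    -- Done/Failed are terminal

run :: St -> [Msg] -> St
run = foldl step

-- 'hypothetically': run a what-if input sequence on a copy of the
-- state, check a predicate, and discard the result (free backtracking).
hypothetically :: St -> [Msg] -> (St -> Bool) -> Bool
hypothetically s msgs p = p (run s msgs)

main :: IO ()
main = do
  print (run (Waiting 0) [Tick 3, Ack])                    -- Done
  print (run (Waiting 0) (map Tick [1..10]))               -- Failed
  print (hypothetically (Waiting 0) [Tick 7] (== Failed))  -- True
```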
I think this idea was inspired by
https://github.com/NicolasT/paxos/blob/master/bin/synod.hs, at least the
network-nastiness simulator part. He uses threads for that demo, but the
nodes' behaviour itself is pure: see
https://github.com/NicolasT/paxos/blob/master/src/Network/Paxos/Synod/Propos...
for example.
We have also proved that certain key properties of the network are implied
by certain local invariants, which reduces the testing problem to one of
checking properties on each node separately. This was time-consuming, but
it highlighted important corner cases that we would have been unlikely to
find by random testing.
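As a toy illustration of checking a local invariant on a single pure node (the counter and its invariant are invented for this example), purity even makes exhaustive enumeration of all small input sequences feasible:

```haskell
-- Hypothetical counter node: the local invariant is that the count
-- never decreases, whatever inputs arrive.
data In = Inc | Noop deriving (Show, Eq)

step :: Int -> In -> Int
step n Inc  = n + 1
step n Noop = n

-- The invariant, checked on every intermediate state of a run.
monotone :: [In] -> Bool
monotone ins = and (zipWith (<=) states (drop 1 states))
  where states = scanl step 0 ins

-- All input sequences of length k: 2^k cases, tiny for small k.
allSeqs :: Int -> [[In]]
allSeqs 0 = [[]]
allSeqs k = [ i : rest | i <- [Inc, Noop], rest <- allSeqs (k - 1) ]

main :: IO ()
main = print (all monotone (allSeqs 8))  -- True
```

Random testing (QuickCheck's `Arbitrary` over `[In]`) covers longer sequences; the exhaustive check above complements it on the small cases where counterexamples usually live.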
If you're interested in Byzantine behaviour (the 'evil node' test) then you
may enjoy reading James Mickens' article on the subject:
https://www.usenix.org/publications/login-logout/may-2013/saddest-moment
Hope that helps,
David
PS a double apology: firstly for the double message (my first attempt was
sent from the wrong address) and secondly for spelling your name wrong in
that message!
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Dear David,
thank you very much! Your answers were extremely insightful.
To adapt this approach we will need to refactor our types quite a bit, but
that will be well worth it.
—
Kindest regards,
jm

Dear David,
we were inspired by your suggestions, and our employee, Konstantin
Ivanov, implemented a very elegant emulator, which we will eventually
factor out into a separate library.
While presenting the project at the Blockchain Summer School, we
credited you for pointing us in the right direction; Konstantin, of
course, was credited for implementing the solution. In case you are
interested, the video of the talk will be available in late June /
early July.
The entire source tree is public and hosted on GitHub:
https://github.com/input-output-hk/rscoin-haskell
Thank you once again for the inspiration!

That's very kind of you, thanks. Not sure I deserve it, but glad I could
help. I'll look forward to the video.
One of my colleagues stumbled across this page which also suggests
injecting the current time as a periodic Tick message rather than
sprinkling timeouts throughout your code:
http://yager.io/Distributed/Distributed.html
Cheers,
David

And we've come full circle! David, my vague recollection of your previous email inspired me to try some of the techniques I espoused in that article. So thanks from me as well!
Will
participants (3)
- David Turner
- Jonn Mostovoy
- Will Yager