
Dear friends,

We have a distributed system written in Haskell. It consists of three types of nodes: a dozen instances of each of two types, plus a single central node of the third type. Each node is started by executing a binary that sets up an acid-state persistence layer and the sockets over which msgpack messages are exchanged.

It is trivial to write QuickCheck test suites for low-level functionality, testing properties of individual functions. What we would like, however, is a QuickCheck-esque suite that sets up the network, drives it to an arbitrary valid state (by sending correct messages between nodes), and then rigorously tests it for three high-level properties:

1. Chaos monkey test: disable a random node and see whether certain invariants hold.
2. Evil node test: make several nodes work against the system and see whether certain properties hold.
3. Rigorous testing of network-wide invariants when all the nodes operate correctly.

The problem we're facing is twofold.

If we want to inspect the state of the nodes in Haskell-land, we have to write a huge piece of machinery that emulates every node launch as a thread. There will be bugs in this machinery, so we won't get trustworthy test results until we fix those.

If we want to run the nodes as processes, the best we can do is one of two things. We can inspect the acid-state DBs of each node, which poses resource-locking problems and forces us to dump the state on every change (undesirable). Or we can add an observer node that dumps the consensus as Text, which the test suite then parses into Haskell terms and uses to decide whether the required properties hold. So far, the observer feels like the best option.

Am I missing something? How is something like this usually done in the community? How would you approach such a problem? Links to source files of test suites that do something similar would be highly appreciated.
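
To make the observer option a bit more concrete, here is a minimal sketch of the round trip I have in mind: the observer serialises a snapshot, the test side parses it back into Haskell terms and checks an invariant on it. Everything in it is hypothetical and made up for illustration (the `Role`, `NodeView` and `Snapshot` types, the `ledgerAt` field, the invariant itself), and it uses derived `Show`/`Read` over `String` instead of Text and msgpack purely to stay self-contained. It is not our real code.

```haskell
module Main where

import Data.List (nub)
import Text.Read (readMaybe)

-- Hypothetical node roles, standing in for our three node types.
data Role = Central | Worker | Client
  deriving (Show, Read, Eq)

-- Hypothetical per-node view, as the observer would report it.
data NodeView = NodeView
  { nodeId   :: Int
  , role     :: Role
  , alive    :: Bool
  , ledgerAt :: Int  -- assumed: height of the last state this node agreed on
  } deriving (Show, Read, Eq)

-- What the observer dumps: one view per node.
newtype Snapshot = Snapshot [NodeView]
  deriving (Show, Read, Eq)

-- Observer side: serialise the snapshot to text.
dumpSnapshot :: Snapshot -> String
dumpSnapshot = show

-- Test side: parse the dump back; Nothing on malformed input.
parseSnapshot :: String -> Maybe Snapshot
parseSnapshot = readMaybe

-- Example invariant (an assumption, not our real one): exactly one live
-- central node, and all live nodes agree on the ledger height.
invariantHolds :: Snapshot -> Bool
invariantHolds (Snapshot views) =
  length [v | v <- live, role v == Central] == 1
    && length (nub (map ledgerAt live)) <= 1
  where
    live = filter alive views

-- Chaos-monkey step on the snapshot: mark the node with the given id as dead.
killNode :: Int -> Snapshot -> Snapshot
killNode n (Snapshot vs) =
  Snapshot [if nodeId v == n then v { alive = False } else v | v <- vs]

main :: IO ()
main = do
  let snap = Snapshot
        [ NodeView 0 Central True 7
        , NodeView 1 Worker  True 7
        , NodeView 2 Client  True 7
        ]
      wire = dumpSnapshot snap
  print (parseSnapshot wire == Just snap)            -- round trip is lossless
  print (fmap invariantHolds (parseSnapshot wire))   -- invariant on parsed data
  print (invariantHolds (killNode 1 snap))           -- after killing a worker
  print (invariantHolds (killNode 0 snap))           -- after killing the central node
```

With something like this in place, `invariantHolds` is an ordinary pure function, so it could be used as a QuickCheck property over snapshots collected from real processes. What the sketch does not address is how to drive the processes into an arbitrary valid state in the first place.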