Refactoring type-class madness

Hi there Haskellers,

I have thoroughly confused myself with type-classes in a Haskell system I am writing, and I was wondering if anyone had some useful suggestions to get me out of my mess. I apologise if this is all long and rambling, but that may be why I can't solve it...

The system itself performs analyses on data from human eye-tracking experiments, where the tracking data is simplified from raw positional data into "events" which are either fixations or saccades. This is easy to do:

    data Event = Fixation {...} | Saccade {...}

Because, at the basic level, all of the experiments share this type of data, it seems that I should be able to write analysis functions that work for any experiment. However, the experiments differ in the stimuli used, and associated with each stimulus set is a set of "milestones" that give times at which important things happen in the stimuli, and "regions of interest" that give areas of the visual scene that are considered important. For a single experiment I would have:

    data Experiment = Exp [Trial]
    data Trial = Trial [Event] (Map MileStone Time)
    data MileStone = M1 | M2 | ...

and the analysis functions can take an experiment, use the events and the milestones, and return whatever.

The problem arises when I want to represent a second experiment. I assumed the way to solve this was with a type class:

    class ExperimentClass a
    data ExperimentOne = Exp [Trial]
    instance ExperimentClass ExperimentOne

and have the analysis functions take (ExperimentClass a) => a -> ...

This, however, necessitates a change to Trial as well, because the MileStones for each experiment are different:

    class MileStoneClass a
    data (MileStoneClass a) => Trial a = Trial [Event] (Map a Time)

which means that:

    data (MileStoneClass a) => ExperimentOne a = Exp [Trial a]

and maybe even:

    class (MileStoneClass b) => ExperimentClass a b
    instance ExperimentClass ExperimentOne ExOneMileStones

which all seems fine.
But then I need to add regions of interest, and a corresponding type parameter, and a couple more things with type parameters, and I get something like:

    class (MileStoneClass b, RegionClass c, ... d, ... e) => ExperimentClass a b c d e

which is a lot of type parameters that are all dependent on a, since the experiment defines the milestones etc. I can use fundeps to enforce that in the type system, but it is still quite messy.

So, I was wondering whether there was something wrong with my basic model which leads to this ugly type class, or whether this is the proper way forward. Either is fine, really, it would just be nice to know for certain.

Thanks in advance,
Andrew
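For concreteness, here is a minimal sketch of what the multi-parameter class with fundeps described in the message above might look like once milestones and regions are both involved. The methods `milestoneTimes` and `regions`, the type `ExOneRegion`, the helper `milestoneCount`, and the choice of `Double` for `Time` are all assumptions made for illustration; none of them appear in the original post.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

import qualified Data.Map as Map
import Data.Map (Map)

type Time = Double  -- assumption: the post never defines Time

class Ord m => MileStoneClass m
class RegionClass r

-- The experiment type e determines its milestone type m and its
-- region type r; the fundeps record exactly that dependency.
class (MileStoneClass m, RegionClass r)
      => ExperimentClass e m r | e -> m, e -> r where
  milestoneTimes :: e -> [Map m Time]  -- hypothetical method
  regions        :: e -> [r]           -- hypothetical method

-- One concrete experiment needs three types and three instances.
data ExOneMileStones = M1 | M2 deriving (Eq, Ord, Show)
data ExOneRegion = LeftHalf | RightHalf deriving (Eq, Show)

newtype ExperimentOne = ExperimentOne [Map ExOneMileStones Time]

instance MileStoneClass ExOneMileStones
instance RegionClass ExOneRegion

instance ExperimentClass ExperimentOne ExOneMileStones ExOneRegion where
  milestoneTimes (ExperimentOne ms) = ms
  regions _ = [LeftHalf, RightHalf]

-- A generic analysis has to mention every class parameter in its
-- context even when it only cares about one of them.
milestoneCount :: ExperimentClass e m r => e -> Int
milestoneCount = sum . map Map.size . milestoneTimes
```

Every further per-experiment component adds another class parameter and another fundep, which is the growth the post complains about.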

Andrew Webb wrote:
Because, at the basic level all of the experiments share this type of data, it seems that I should be able to write analysis functions that work for any experiment. However, the experiments differ in the stimuli used, and associated with each stimulus set is a set of "milestones" that give times at which important things happen in the stimuli, and "regions of interest" that give areas of the visual scene that are considered important. For a single experiment I would have:
    data Experiment = Exp [Trial]
    data Trial = Trial [Event] (Map MileStone Time)
    data MileStone = M1 | M2 | ...
[...]
So, I was wondering whether there was something wrong with my basic model which leads to this ugly type class, or whether this is the proper way forward. Either is fine, really, it would just be nice to know for certain.
It sounds like you're trying to use typeclasses as if they were OO-classes, which is a good way to confuse yourself. What you probably want is just parametric polymorphism. For example, if every experiment is a sequence of trials, and every trial is a sequence of events with some milestones, then you can get the generality you want with:

    data Experiment m = Exp [Trial m]
    data Trial m = Trial [Event] (Map m Time)
    data Event = Fixation {...} | Saccade {...}

    data A = MS_A1 | MS_A2 | ... deriving (Eq, Ord, Enum)
    data B = MS_B1 | MS_B2 | ... deriving (Eq, Ord, Enum)
    ...

then you would pass around (Experiment A), (Experiment B), etc. The reason for the Ord instances is so you can use them as keys in Map (Eq is derived as well because Ord requires it), and the reason for Enum is just so you have a generic interface for listing all the milestones (though getting the keys of the map may suffice).

The main reason for wanting to use typeclasses is when you have a common interface (i.e. set of function names and types), but the implementations of that interface are structurally/algorithmically different. If the structure of the implementation is the same and only the type of some component changes, then parametric polymorphism is the way to go.

Off-topic to your original question, it seems like a better model for your data might be to treat milestones as a third kind of event. Thus, a trial is just a sequence of events, which could be subject events (fixation, saccades) or experimental events (milestones, etc). This is assuming that your processing only cares about how patient events occur relative to experimental events, and that you don't need access to experimental events separately.
If you need to be able to jump around to different events, then you could use Trial [Event] (Map m [Event]) and construct the map after reading input by walking over the list of events and storing pointers to the subsequence beginning with each milestone:

    import Data.Map (Map, insert)

    computeMilestones :: Ord m => Trial m -> Trial m
    computeMilestones (Trial es m) = Trial es (go es m)
        where
        -- Walk the event list; at each milestone, record the
        -- subsequence of events that begins there.
        go []         m = m
        go es@(e:es') m = go es' (m' e)
            where
            -- MS is the milestone constructor of Event in this model.
            m' (MS x) = insert x es m
            m' _      = m

--
Live well,
~wren
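As a rough illustration of the parametric-polymorphism suggestion in the reply above, the following sketch writes one analysis function that works for any milestone type. The record fields `start` and `end`, the two milestone types, the function `fixationsBefore`, and `Time = Double` are assumptions for the example; the thread elides the real field definitions.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

type Time = Double  -- assumption: the thread never defines Time

-- Field names are hypothetical; the thread writes {...} here.
data Event
  = Fixation { start :: Time, end :: Time }
  | Saccade  { start :: Time, end :: Time }
  deriving (Show, Eq)

data Trial m      = Trial [Event] (Map m Time)
data Experiment m = Exp [Trial m]

-- Two hypothetical milestone sets, one per experiment.
data MilestoneA = ToneOnset | TargetAppears
  deriving (Show, Eq, Ord, Enum, Bounded)
data MilestoneB = CueShown | CueHidden | Response
  deriving (Show, Eq, Ord, Enum, Bounded)

-- For each trial, count the fixations that start before the given
-- milestone. Trials that lack the milestone contribute 0.
-- Only Ord m is needed (for the Map lookup), so the same function
-- serves Experiment MilestoneA and Experiment MilestoneB alike.
fixationsBefore :: Ord m => m -> Experiment m -> [Int]
fixationsBefore ms (Exp trials) = map count trials
  where
    count (Trial es stones) =
      case Map.lookup ms stones of
        Nothing -> 0
        Just t  -> length [ e | e@Fixation{} <- es, start e < t ]
```

No type class is declared per experiment; the milestone type is simply a parameter, and the only constraint comes from what `Data.Map` itself requires.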

Hey hey, Thanks for the response!
It sounds like you're trying to use typeclasses as if they were OO-classes, which is a good way to confuse yourself. What you probably want is just parametric polymorphism. For example, if every experiment is a sequence of trials, and every trial is a sequence of events with some milestones, then you can get the generality you want with:
Indeed, you seem to be correct. I was implementing concurrently in Haskell and in Scala, and there seems to have been some leakage between the two...
The main reason for wanting to use typeclasses is when you have a common interface (i.e. set of function names and types), but the implementations of that interface are structurally/algorithmically different. If the structure of the implementation is the same and only the type of some component changes, then parametric polymorphism is the way to go.
Yes, I quite agree. I think I got into a "when all you have is a hammer..." mindset. Thanks for the clarification.
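By way of contrast, here is a small sketch of the case the quoted paragraph reserves for typeclasses: one interface whose implementations are structurally different. The class `Spanned`, the types `Fixation` and `RawBurst`, and `Time = Double` are invented for this example and are not part of the system under discussion.

```haskell
type Time = Double  -- assumption for the example

-- A common interface: anything that occupies a span of time.
class Spanned a where
  startTime :: a -> Time
  endTime   :: a -> Time

-- Shared code is written once against the interface.
duration :: Spanned a => a -> Time
duration x = endTime x - startTime x

-- Representation 1: explicit start and end times.
data Fixation = Fixation { fixStart :: Time, fixEnd :: Time }

instance Spanned Fixation where
  startTime = fixStart
  endTime   = fixEnd

-- Representation 2: a start time plus per-sample intervals, so the
-- end time has to be computed rather than read from a field.
data RawBurst = RawBurst Time [Time]

instance Spanned RawBurst where
  startTime (RawBurst t _)   = t
  endTime   (RawBurst t dts) = t + sum dts
```

Here the two instances compute `endTime` in different ways, which a type parameter alone could not express. In the experiment model only the milestone type varies, so a parameter is enough.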
Off-topic to your original question, it seems like a better model for your data might be to treat milestones as a third kind of event. Thus, a trial is just a sequence of events, which could be subject events (fixation, saccades) or experimental events (milestones, etc). This is assuming that your processing only cares about how patient events occur relative to experimental events, and that you don't need access to experimental events separately.
At first glance that does seem sensible, and was, in fact, my first design; however, it doesn't pan out that way. The problem arises from the fact that the two types of events have durations (so a start time and an end time), whereas the milestones are instantaneous and can occur within the span of an event. In different situations we care about different relationships between events and milestones; so, at times I care about whether an event is strictly before (i.e., starting and ending before) the milestone, but at other times I only care about whether it starts before the milestone, and the situation is the same for ending after, or occurring strictly after. Hence the current design.

Thank you for taking the time to suggest it though!

Ta,
Andrew
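The relationships listed in the message above can be sketched as plain predicates over an event and a milestone time. The field names `start` and `end`, the predicate names, and `Time = Double` are assumptions; the actual definitions are not shown in the thread.

```haskell
type Time = Double  -- assumption: the thread never defines Time

-- Hypothetical fields; both kinds of event carry a duration.
data Event
  = Fixation { start :: Time, end :: Time }
  | Saccade  { start :: Time, end :: Time }
  deriving (Show, Eq)

-- Each predicate takes the (instantaneous) milestone time first.

-- The event starts and ends before the milestone.
strictlyBefore :: Time -> Event -> Bool
strictlyBefore t e = start e < t && end e < t

-- The event starts before the milestone; it may still be running.
startsBefore :: Time -> Event -> Bool
startsBefore t e = start e < t

-- The event ends after the milestone; it may have started earlier.
endsAfter :: Time -> Event -> Bool
endsAfter t e = end e > t

-- The event starts and ends after the milestone.
strictlyAfter :: Time -> Event -> Bool
strictlyAfter t e = start e > t && end e > t

-- The milestone falls within the span of the event.
spans :: Time -> Event -> Bool
spans t e = start e <= t && t <= end e
```

With a milestone at time 2, an event running from 1 to 3 satisfies `startsBefore`, `endsAfter` and `spans` but neither strict predicate. A single ordered list that mixes events and milestones cannot represent that case directly.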
Participants (3): Andrew, Andrew Webb, wren ng thornton