
Dominic,
> What I can't see at the moment is how to keep what I was doing modular. I had a module Anonymize, the implementation of which I wanted to change without the user of it having to change their code. The initial implementation was a state monad which generated a new string every time it needed one but if it was a string it had already anonymized then it looked it up in the state. I initially used a list but with 100,000+ strings it took a long time. The next implementation used FiniteMap which improved things considerably. I only had to make three changes in Anonymize and none in Main. Using MD5 is quicker still but isn't so good from the program maintenance point of view.

My first stab at the modularity issue was the version _2 in my last message. Looking back at the Anonymizable class and instances in your full program,

    type Anon a = IO a

    class Anonymizable a where
       anonymize :: a -> Anon a

    -- MyString avoids overlapping instances of Strings
    -- with the [Char]

    data MyString = MyString String
       deriving Show

    instance Anonymizable MyString where
       anonymize (MyString x)
          = do s <- digest x
               return ((MyString . showHex') s)

    instance Anonymizable a => Anonymizable [a] where
       anonymize xs = mapM anonymize xs

the problem is in the Anonymizable instance for [a]: the mapM in anonymize constructs an IO script, consisting of some IO operation for each list element, all chained together into a monolithic whole. As IO a is an abstract type, this is a bit too restrictive to be modular: if I ever want any of the anonymized Strings, I can only get a script that anonymizes them all - before executing that script, I don't have any anonymized Strings, and after executing the script, all of them have been processed.

This forestalls any attempt to interleave the anonymization with some further per-element processing. Instead, I would prefer to have a list of IO actions, not yet chained together (after all, in Haskell, they are just data items), but that doesn't fit the current return type of anonymize.

One approach would be to change the type of Anon a to [IO a], or to ignore the [a] instance and use the MyString instance only, but the longer I look at the code, the less I'm convinced that the overloading is needed at all. Unless there are other instances of Anonymizable, why not simply have a function anonymize :: String -> Anon String? That would still allow you to hide the implementation decisions you mentioned (even in a separate module), provided that any extra state you need can be kept in the IO monad.
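
To make the last point concrete, here is a minimal sketch of how the lookup table might live in the IO monad. It assumes an IORef holding a Data.Map (the successor to FiniteMap in current GHC libraries) and a counter. The names AnonState and newAnonState, and the "anon" ++ show n naming scheme, are invented for illustration and are not from the original program. The reference is passed explicitly, so the partial application anonymize ref has the un-overloaded type String -> IO String.

```haskell
module Main (main) where

import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Map as Map

-- Hypothetical hidden state: strings seen so far with their replacements,
-- plus a counter used to invent the next replacement.
type AnonState = IORef (Map.Map String String, Int)

newAnonState :: IO AnonState
newAnonState = newIORef (Map.empty, 0)

-- Un-overloaded anonymize: reuse the replacement for a string seen before,
-- otherwise generate a fresh one and remember it.
anonymize :: AnonState -> String -> IO String
anonymize ref s = do
  (seen, next) <- readIORef ref
  case Map.lookup s seen of
    Just a  -> return a
    Nothing -> do
      let a = "anon" ++ show next
      writeIORef ref (Map.insert s a seen, next + 1)
      return a

main :: IO ()
main = do
  ref <- newAnonState
  -- A plain list of IO actions; none of them has been executed yet.
  let actions = map (anonymize ref) ["alice", "bob", "alice"] :: [IO String]
  -- Interleave: run one action, print its result, then move on to the next.
  mapM_ (\act -> act >>= putStrLn) actions
```

Run as written, this prints anon0, anon1 and anon0 on separate lines: the repeated input gets the same replacement. Because the client only sees String -> IO String, swapping the map for an MD5 digest or anything else would not change the calling code.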

One would have to write mapM anonymize explicitly where you had simply anonymize, but it would then become possible to do something else with the list of IO actions before executing them (in this case, to interleave the printing with the anonymization).

First, here is the interesting fragment with the un-overloaded anonymize:

    readAndWriteAttrVals =
       do h <- openFile fileout WriteMode
          s <- readFile filename
          a <- mapM anonymize (lines s)
          hPutStr h (unlines a)

It is now possible to import anonymize from elsewhere and do the interleaving in the code that uses anonymize:

    readAndWriteAttrVals =
       do h <- openFile fileout WriteMode
          s <- readFile filename
          let action l = do { a <- anonymize l ; hPutStr h a }
          mapM_ action (lines s)

Would that work for your problem?

Alternatively, if some of your implementation options require initialization or cleanup, your Anonymize module could offer a function to process all lines, with a hook for per-line processing:

    processLinesWith perLineAction ls =
       do { initialize
          ; as <- mapM action ls
          ; cleanup
          ; return as
          }
       where action l = do { a <- anonymize l ; perLineAction a }

Then the code in the client module could simply be:

    readAndWriteAttrVals =
       do h <- openFile fileout WriteMode
          s <- readFile filename
          processLinesWith (hPutStr h) (lines s)
          return ()

Closing the loop, one could now redefine the original, overloaded anonymize to take a perLineAction, with the obvious instances for MyString and [a], but I really don't see why every function should have to be called anonymize?-)

Claus

PS The simplified code of the new variant, for observation:

    module Main(main) where

    import Observe
    import IO(openFile, hPutStr, IOMode(ReadMode,WriteMode,AppendMode))

    filename = "ldif1.txt"
    fileout = "ldif.out"

    readAndWriteAttrVals =
       do h <- openFile fileout WriteMode
          s <- readFile filename
          let { anonymize s = return (':':s)
              ; action l = do { a <- anonymize l ; hPutStr h a }
              }
          mapM_ (observe "action" action) (lines s)

    main = runO readAndWriteAttrVals
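
For anyone who wants to try the processLinesWith idea without the Observe library, here is a small self-contained sketch. It assumes the ':'-prefixing stand-in for anonymize from the PS. The bodies of initialize and cleanup are invented placeholders that only print a marker, and the sample input lines are made up. Output goes to stdout rather than to a file, to keep the example short.

```haskell
module Main (main) where

-- Stand-in anonymizer taken from the PS: prefix each line with ':'.
anonymize :: String -> IO String
anonymize s = return (':' : s)

-- Hypothetical setup and teardown steps; a real module might build a
-- lookup table or seed a generator here.
initialize :: IO ()
initialize = putStrLn "-- initialize"

cleanup :: IO ()
cleanup = putStrLn "-- cleanup"

-- Run the hook on each anonymized line, between initialize and cleanup,
-- and collect whatever the hook returns.
processLinesWith :: (String -> IO b) -> [String] -> IO [b]
processLinesWith perLineAction ls = do
  initialize
  results <- mapM action ls
  cleanup
  return results
  where
    action l = anonymize l >>= perLineAction

main :: IO ()
main = do
  -- The hook prints each anonymized line and returns its length.
  rs <- processLinesWith (\a -> putStrLn a >> return (length a))
                         ["cn: alice", "sn: example"]
  print rs
```

Each line is anonymized and handed to the hook before the next one is touched, which is the interleaving discussed above. If cleanup has to run even when a per-line action throws, wrapping the mapM in bracket_ from Control.Exception would be one option, though that goes beyond what the original fragment does.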