Debugging partial functions by the rules

So all this talk of locating head [] and fromJust failures got me thinking:

Couldn't we just use rewrite rules to rewrite *transparently* all uses of fromJust to safeFromJust, tagging the call site with a location?

To work this requires a few things to go right:

    * a rewrite rule
    * assertions
    * and rewrite rules firing before assertions are expanded

Let's try this. Consider the program:

     1  import qualified Data.Map as M
     2  import Data.Maybe
     3
     4  main = print f
     5
     6  f = let m = M.fromList
     7                  [(1,"1")
     8                  ,(2,"2")
     9                  ,(3,"3")]
    10          s = M.lookup 4 m
    11      in fromJust s

When we run it we get the not so useful error:

    $ ./A
    A: Maybe.fromJust: Nothing

Ok, so we have a few tricks for locating this, using LocH (http://www.cse.unsw.edu.au/~dons/loch.html), we can catch an assertion failure, but we have to insert the assertion by hand:

     1  import Debug.Trace.Location
     2  import qualified Data.Map as M
     3  import Data.Maybe
     4
     5  main = do print f
     6
     7  f = let m = M.fromList
     8                  [(1,"1")
     9                  ,(2,"2")
    10                  ,(3,"3")]
    11          s = M.lookup 4 m
    12      in safeFromJust assert s
    13
    14  safeFromJust a = check a . fromJust

Which correctly identifies the call site:

    $ ./A
    A: A.hs:12:20-25: Maybe.fromJust: Nothing

Now, this approach is a little fragile. 'assert' is only respected by GHC if -O is *not* on, so if we happened to try this trick with -O, we'd get:

    $ ./A
    A: Debug.Trace.Location.failure

So lesson one: you have to do the bug hunting with -Onot.

Currently there's -fignore-asserts for turning off assertions, but no flag for turning them on with -O, Simon, could this be fixed? Could we get a -frespect-asserts that works even with -O ?

Ok, assuming this assert trick is used, can we get the compiler to insert the asserts for us? If so, this would be a great advantage, you'd just be able to switch on a flag, or import a debugging module, and your fromJusts would be transparently rewritten. With rewrite rules we do just this!
So, to our initial unsafe use of fromJust, we add a rewrite rule:

    --
    -- rewrite fromJust to a located version, and hope that GHC expands
    -- 'assert' after the rule fires..
    --
    {-# RULES
    "located fromJust" fromJust = check assert . myFromJust
      #-}

This just tells the compiler to replace every occurrence of fromJust with an assertion-throwing fromJust, should it fail. We have to use myFromJust here, to avoid rule recursion.

    --
    -- Inlined to avoid recursion in the rule:
    --
    myFromJust :: Maybe a -> a
    myFromJust Nothing  = error "Maybe.fromJust: Nothing" -- yuck
    myFromJust (Just x) = x

Ok, so can we get ghc to rewrite fromJust to the safe fromJust magically?

    $ ghc --make -Onot A.hs -fglasgow-exts -ddump-simpl-stats
    [1 of 1] Compiling Main             ( A.hs, A.o )
    1 RuleFired
        1 located fromJust
    Linking A ...

Yes, the rule fired! GHC *did* rewrite our fromJust to a more useful fromJust. Running it:

    $ ./A
    A: A.hs:19:36-41: Maybe.fromJust: Nothing

Looks good! But that is deceiving: the assert was expanded before the rule fired, and refers to the rewrite rule source line (line 19), not the fromJust call site (line 12). Now if we could just have the 'assert' token inserted into the AST before it was expanded, we'd be home and dry. Could this be done with TH? Or could we arrange for asserts in rewrite rules not to be expanded till later?

Note that this is still a useful technique, we can rewrite head/fromJust/... to some other possibly more useful message. And if we can constrain the rule to fire in only particular modules, we may be able to narrow down the bug, just by turning on a rule. For example, adding:

    {-# RULES
    "located fromJust" fromJust = safeFromJust
      #-}

    safeFromJust s = case s of
        Nothing -> "safeFromJust: failed with Nothing. Ouch"
        Just x  -> x

will produce:

    $ ./A
    "safeFromJust: failed with Nothing. Ouch"

So rewrite rules can be used to transparently alter uses of partial functions like head and fromJust.
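For anyone who wants to try the rule trick without depending on how Data.Maybe.fromJust is inlined, here is a minimal self-contained sketch. The names partialFromJust and describedFromJust are made up for illustration and are not part of any library. It assumes a GHC where rewrite rules are only applied when optimisation (or -fenable-rewrite-rules) is on, which as far as I know is the case for recent versions; the NOINLINE pragmas are there so the rule has something to match.

```haskell
module Main (main) where

import Control.Exception (ErrorCall (..), evaluate, try)

-- Hypothetical local stand-in for fromJust, so its inlining can be controlled.
partialFromJust :: Maybe a -> a
partialFromJust (Just x) = x
partialFromJust Nothing  = error "partialFromJust: Nothing"
{-# NOINLINE partialFromJust #-}

-- Hypothetical replacement carrying a more descriptive message.
describedFromJust :: Maybe a -> a
describedFromJust (Just x) = x
describedFromJust Nothing  = error "describedFromJust: failed with Nothing (module Main)"
{-# NOINLINE describedFromJust #-}

-- Same shape as the rule in the message above, applied to the local function.
{-# RULES
"described fromJust" partialFromJust = describedFromJust
  #-}

main :: IO ()
main = do
  let table = [(1, "1"), (2, "2"), (3, "3")] :: [(Int, String)]
  -- Force the result and report which function's message was raised.
  r <- try (evaluate (partialFromJust (lookup 4 table)))
  case r of
    Left (ErrorCall msg) -> putStrLn msg
    Right s              -> putStrLn s
```

Which of the two messages is printed depends on whether the rule fired for that compilation; -ddump-simpl-stats, as used above, shows whether it did.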
So, further work:

    * have 'assert' respected when -O is on
    * think up a technique for splicing in 'assert' via rewrite rules (or TH ...) such that the src locations are expanded after the rewrite, and correctly reflect the location of the splice point.

Any ideas?

-- Don

> Couldn't we just use rewrite rules to rewrite *transparently* all uses of fromJust to safeFromJust, tagging the call site with a location?
> ..
> Looks good! But that is deceiving: the assert was expanded before the rule fired, and refers to the rewrite rule source line (line 19), not the fromJust call site (line 12). Now if we could just have the 'assert' token inserted into the AST before it was expanded, we'd be home and dry. Could this be done with TH? Or could we arrange for asserts in rewrite rules not to be expanded till later?
> ..
> Any ideas?

http://www.haskell.org/pipermail/glasgow-haskell-users/2006-November/011545....

claus

| Looks good! But that is deceiving: the assert was expanded before the rule
| fired, and refers to the rewrite rule source line (line 19), not the fromJust
| call site (line 12). Now if we could just have the 'assert' token inserted
| into the AST before it was expanded, we'd be home and dry. Could this be done
| with TH? Or could we arrange for asserts in rewrite rules not to be expanded
| till later?

That's difficult. Trouble is, the assert expansion happens right at the front, before any desugaring or program transformation. But rewrite rules fire much, much later, in the simplifier.

It is, however, possible (or could be made possible) to answer the question "When this rule fires, what module am I compiling", and make that available in the RHS of the rule, by some magic incantation.

Harder would be "when this rule fires, what top-level function's RHS is being simplified?". But even that is tricky, because it might be "lvl_4532", a function created by the simplifier itself.

The only solid way to connect to programmer-comprehensible stuff is by transforming the original, unadulterated source code, as Hat does, and as Jhc does, and as Simon and I are musing about doing.

Your idea is to do something cheap and cheerful, which is always a good plan. But I don't see how to make it fly. (The -O thing, and/or providing $currentLocation rather than just 'assert', seem easier.)

Simon
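To make the ordering problem concrete, here is a small model of it in plain Haskell. It does not use GHC's assert at all: expandedRuleRhs is a made-up name standing for the rule's right-hand side after the early expansion has already baked in a location string, and results are returned as Either values so nothing depends on how errors are displayed.

```haskell
module Main (main) where

-- Location that the early 'assert' expansion would bake into the rule's
-- right-hand side (the rule sat on line 19 of A.hs in the example).
ruleLocation :: String
ruleLocation = "A.hs:19:36-41"

-- Hypothetical model of the rule's RHS after expansion: the location is
-- already a constant, so it cannot depend on which call site is rewritten.
expandedRuleRhs :: Maybe a -> Either String a
expandedRuleRhs Nothing  = Left (ruleLocation ++ ": Maybe.fromJust: Nothing")
expandedRuleRhs (Just x) = Right x

-- Two distinct call sites; once the rule has fired, both call the same
-- function and therefore report the same location.
callSiteA, callSiteB :: Either String Int
callSiteA = expandedRuleRhs Nothing   -- imagine this on line 12
callSiteB = expandedRuleRhs Nothing   -- imagine this somewhere else entirely

main :: IO ()
main = mapM_ print [callSiteA, callSiteB]
```

Both values carry the rule's location, which is the behaviour seen in the A.hs:19:36-41 message: by the time the simplifier rewrites a call site, there is no 'assert' token left to expand there.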

[deleted cc to haskell-cafe; RULES and discussion details are GHC-specific]

> That's difficult. Trouble is, the assert expansion happens right at the front, before any desugaring or program transformation. But rewrite rules fire much, much later, in the simplifier.

and there doesn't seem to be any source location information left at that stage (no longer associated with the AST, but woven into error messages and the like) - what a pity. there may be other uses for having such information at hand in core expressions, but adding it just for the present purpose would be overkill (unless one could sneak it just into identifiers, perhaps?).

but if one cannot define assert-like transformation using simplifier rules, one might still generalise assert-style transformations (there already are several of them in RnExpr.lhs; for assert and breakpoints).

> The only solid way to connect to programmer-comprehensible stuff is by transforming the original, unadulterated source code, as Hat does, and as Jhc does, and as Simon and I are musing about doing.

so, will there be a general SYNTAX-RULES pragma, for an early-stage variant of RULES?

    {-# SYNTAX-RULES "assert" assert = assertError lhsSrcLoc #-}
    {-# SYNTAX-RULES "head" head = headError lhsSrcLoc #-}

> Your idea is to do something cheap and cheerful, which is always a good plan. But I don't see how to make it fly.

if I understand your webpage correctly, there is a cheap and cheerful variant for the non-recursive SRC_LOC_ANNOTATE - just generalise the code in RnExpr.lhs to look not only for assertId, but for head, fromJust, etc, as well, applying the same transformation to all:

    <name> | <name> in ["assert","head",..] -> <name>Error <srcLoc>

or, to avoid relying on built-ins:

    <name> | srcLocAnnotate <name>, isDefined <name>Error -> <name>Error <srcLoc>

the recursive variant of SRC_LOC_ANNOTATE, on the other hand, seems to require maintaining a call chain context, and using that to augment the translation. which means keeping track of the local name for that context within the definitions of the Error variants, something like:

    bind <name>Error <srcLocName>
        | srcLocAnnotate <name> -> setCallers <srcLocName>

    <name>
        | srcLocAnnotate <name>, isDefined <name>Error
        -> do { localSrcLocName <- getCallers
              ; <name>Error (<srcLoc>++":"++localSrcLocName) }

where one annotated name, when used inside the definition of another's Error version, is augmented by its own location and the context passed down (though one would also like to ensure that the localSrcLocName hasn't been shadowed between bind and use).

being context-sensitive, this variant would not be within reach of a simple syntax transformation pragma, but might still be possible as an extension of the current renamer transformations?

Is that what you have in mind? Sounds useful to me (though one might occasionally want to have access to just the current location, without context - suggesting perhaps a list/stack of strings rather than a pre-concatenated string).

Claus
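As a rough illustration of the recursive variant, this is what the translated code might look like if written out by hand, using a list of location strings for the call chain rather than a pre-concatenated string. Everything here is an assumption made for the sketch: the CallChain type, the firstWordError function, and the B.hs locations are invented, and no such pragma exists in GHC as far as this thread is concerned.

```haskell
module Main (main) where

import Data.List (intercalate)

-- Hypothetical call chain: innermost call site first.
type CallChain = [String]

-- Hand-written "Error" variant of head: the chain is an extra argument.
-- Failure is returned as a Left value to keep the sketch free of exceptions.
headError :: CallChain -> [a] -> Either String a
headError _     (h:_) = Right h
headError chain []    = Left ("head of empty list: " ++ intercalate " <- " chain)

-- An annotated function that itself calls an annotated function. It pushes
-- the (made-up) location of its own call to head onto the chain it was given.
firstWordError :: CallChain -> String -> Either String String
firstWordError callers s = headError ("B.hs:10:18" : callers) (words s)

main :: IO ()
main = do
  print (firstWordError ["B.hs:20:9"] "hello world")
  print (firstWordError ["B.hs:21:9"] "")
```

Keeping the chain as a list means a consumer that only wants the current location can take the first element, while one that wants the full context can render the whole list.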

> Sounds useful to me (though one might occasionally want to have access to just the current location, without context - suggesting perhaps a list/stack of strings rather than a pre-concatenated string).

actually, there's a bit about your proposal (on the wiki page) that I don't quite follow, namely whether the annotated function versions are hand-written (needed only for making explicit use of the annotation parameter) or generated (convenient for passing on the annotation parameter). here are the use cases I can see arising:

case 1: I want to have explicit access to the call site location info
    - I write both head and headError, and the pragma
    - calls to head will be translated into calls to headError <srcLoc>, where I can process that information however I need to.

    {-# SRC_LOC_ANNOTATE head #-}
    head (h:_) = h

    headError srcLoc (h:_) = h
    headError srcLoc _ = error ("head of empty list: "++show srcLoc)

case 2: I have a function that calls annotated functions

    2a: I want to have explicit access to the location info
        - I write both the function and its variant, and the pragma
        - calls to the function will be translated into annotated calls to its variant, I have to handle the information myself

    2b: I just want to build up call stack information
        - I write only the function, and its pragma
        - a variant is generated, affecting calls to annotated functions as described on the web page

case 0: I have a function with non-exhaustive left-hand sides, and I want the default branch to refer to the call site
    - I write only head, and its pragma
    - a definition of headError is generated, differing from head only in the extra parameter, and in the error branch which refers to that parameter (see case 1 for code)
    - calls to head will be translated into calls to headError <srcLoc>.

    {-# SRC_LOC_ANNOTATE head #-}
    head (h:_) = h

[instead of generating variants as separate functions, one might simply transform the originals; the default branch should also be augmented in 2b, as in 0]

cases 1&2 are as on the wiki page, where I think 2b is suggested, as a convenient default. case 2a would arise naturally if we annotate a function that also happens to call other annotated functions. and case 0 is the one that started this discussion, so it would be nice not to have to write out the boilerplate variants by hand.

does this make sense?

claus
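Case 0 can be spelled out by hand to see what would have to be generated. In this sketch the pragma appears only as a comment, since it is a proposal and not something GHC accepts; firstItem and the C.hs location are invented for the example, and the location string stands in for what the compiler would fill in at the call site.

```haskell
module Main (main) where

import Prelude hiding (head)
import Control.Exception (ErrorCall (..), evaluate, try)

-- What the programmer writes under case 0 (pragma shown as a comment only):
--   {-# SRC_LOC_ANNOTATE head #-}
head :: [a] -> a
head (h:_) = h

-- What would be generated: the same clauses plus a location parameter and a
-- default branch that reports it.
headError :: String -> [a] -> a
headError _      (h:_) = h
headError srcLoc _     = error ("head of empty list: " ++ show srcLoc)

-- A call written as `head xs` would be translated into this form, with the
-- location supplied by the compiler (made up here).
firstItem :: [a] -> a
firstItem xs = headError "C.hs:21:16" xs

main :: IO ()
main = do
  print (firstItem "abc")
  -- Force the failing call and show only the message text.
  r <- try (evaluate (firstItem ([] :: [Int])))
  case r of
    Left (ErrorCall msg) -> putStrLn msg
    Right x              -> print x
```

The original head is left in place and unused here, which mirrors the bracketed remark above that one might simply transform the originals instead of generating separate variants.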
participants (3): Claus Reinke, dons@cse.unsw.edu.au, Simon Peyton-Jones