
Ghc a wherever possible. This would make things nicer with the upcoming split-up of more fine-grained control over the executed
Hello librarians, as some may know, I am currently working on improvements to the GHC API. Many of the exported functions of the GHC module have a type of the form Session -> ... -> IO (Maybe X) where 'Session' is mutable and 'X' is the actual result type of the function. In order to enforce (the implicitly assumed) single- threaded use of a session and to provide richer error information I am restructuring all exported functions of the GHC API to return a computation in the 'Ghc' monad. This currently looks like this(*) newtype Ghc a = Ghc { unGhc :: Session -> IO (Either GhcError a } newtype Session = Session (IORef HscEnv) Functions that modify a session are now in this monad but behave mostly the same. A more difficult decision is how to deal with errors. The GhcError type currently looks like this: -- | An error annotated with the phase it happened in. data GhcError = GhcError HscPhase Messages -- ^ A "normal" compilation error. | ApiError HscPhase String -- ^ An error that violated some pre-condition/invariant of the API. | GhcIOException Exception -- some IO exception -- XXX: would (forall e. Typeable e => e) be better? Note that we have to wrap IO exceptions and propagate them separately in the Ghc monad.(**) -- | Lift an 'IO' action into the 'Ghc' monad. IO exceptions are wrapped and -- can be queried by matching on the 'GhcIOException' constructor in -- 'ghcCatch'. XXX: is there a cleaner way? io :: IO a -> Ghc a io action = Ghc $ \_ -> Exception.handle (return . Left . GhcIOException) $ do a <- action return (Right a) Now, what is the proper way to translate a function of type ... -> IO (Maybe a) into this monad? My suggestion would be to prefer ... - phases. We could write parsed_file <- parse file tc_rn_file <- typecheckRename parsed_file simpl_file <- simplify tc_rn_file ... `onCompileError` $ \GhcError phase msgs -> somehowHandleErrors phase msgs instead of mb_parsed_file <- parse file case mb_parsed_file of Nothing -> ... Just parsed_file -> do mb_tc_rn_file <- typecheckRename parsed_file case mb_tr_rn_file of ... etc ... where 'onCompileError' would only catch errors with the constructor 'GhcError', not 'ApiError' or 'GhcIOException'. For the latter It is, however, not always clear what is a compile error and what is not. Consider, for example, the following function: -- | Takes a 'ModuleName' and possibly a 'PackageId', and consults the -- filesystem and package database to find the corresponding 'Module', -- using the algorithm that is used for an @import@ declaration. findModule :: ModuleName -> Maybe PackageId -> Ghc Module findModule mod_name maybe_pkg = withSession $ \hsc_env -> let dflags = hsc_dflags hsc_env hpt = hsc_HPT hsc_env this_pkg = thisPackage dflags in case lookupUFM hpt mod_name of Just mod_info -> return (mi_module (hm_iface mod_info)) _not_a_home_module -> do -- XXX: should we really throw IO exceptions here? res <- io $ findImportedModule hsc_env mod_name maybe_pkg case res of Found _ m | modulePackageId m /= this_pkg -> return m | otherwise -> throwDyn (CmdLineError (showSDoc $ text "module" <+> pprModule m <+> text "is not loaded")) err -> let msg = cannotFindModule dflags mod_name err in throwDyn (CmdLineError (showSDoc msg)) This either returns a Module or throw an exception by looking it up in the home package table. Otherwise, it throws a CmdLineError exception, which is intended for reporting failure inside GHCi. I guess the proper way for this function would be to throw an 'ApiError', i.e., we expect the looked-up module to be existing in the home package table and to be loaded. My questions thus are: - Does this sound like a reasonable strategy? - Is such a monad a good or a bad idea? - Does anyone have an idea of a classification of errors, or guidelines/principles for one? - Any other comments? Thanks, / Thomas (*) an alternative implementation that would probably be a bit more efficient in case of errors could use continuation-passing style: newtype Ghc a = Ghc { unGhc :: forall ans. Session -> (GhcError -> IO ans) -- failure continuation -> (a -> IO ans) -- success continuation -> IO ans (**) instances for mtl classes will be provided in a separate package to avoid adding mtl as a build dependency. -- I was wrong. / This changes everything.

On Mon, 14 Jul 2008, Thomas Schilling wrote:
Hello librarians,
as some may know, I am currently working on improvements to the GHC API. Many of the exported functions of the GHC module have a type of the form
Session -> ... -> IO (Maybe X)
where 'Session' is mutable and 'X' is the actual result type of the function. In order to enforce (the implicitly assumed) single-threaded use of a session and to provide richer error information I am restructuring all exported functions of the GHC API to return a computation in the 'Ghc' monad. This currently looks like this(*)
newtype Ghc a = Ghc { unGhc :: Session -> IO (Either GhcError a }
newtype Session = Session (IORef HscEnv)
Good idea.
Functions that modify a session are now in this monad but behave mostly the same. A more difficult decision is how to deal with errors. The GhcError type currently looks like this:
-- | An error annotated with the phase it happened in. data GhcError = GhcError HscPhase Messages -- ^ A "normal" compilation error. | ApiError HscPhase String -- ^ An error that violated some pre-condition/invariant of the API. | GhcIOException Exception -- some IO exception -- XXX: would (forall e. Typeable e => e) be better?
Another instance of mixing errors (ApiError) and exceptions (GhcError, GhcIOException)? How should I handle errors that I made myself by calling the GHC API the wrong way? Of course, I must correct those calls to GHC instead. The clean design would be to drop the ApiError constructor and indicate wrong uses of the GHC library with simple 'error'.
Note that we have to wrap IO exceptions and propagate them separately in the Ghc monad.(**)
That's a good thing. All (IO) exceptions should be handled this way. (See the extensible exception thread on this list.)
For the latter It is, however, not always clear what is a compile error and what is not. Consider, for example, the following function:
-- | Takes a 'ModuleName' and possibly a 'PackageId', and consults the -- filesystem and package database to find the corresponding 'Module',-- using the algorithm that is used for an @import@ declaration. findModule :: ModuleName -> Maybe PackageId -> Ghc Module findModule mod_name maybe_pkg = withSession $ \hsc_env -> let dflags = hsc_dflags hsc_env hpt = hsc_HPT hsc_env this_pkg = thisPackage dflags in case lookupUFM hpt mod_name of Just mod_info -> return (mi_module (hm_iface mod_info)) _not_a_home_module -> do -- XXX: should we really throw IO exceptions here? res <- io $ findImportedModule hsc_env mod_name maybe_pkg case res of Found _ m | modulePackageId m /= this_pkg -> return m | otherwise -> throwDyn (CmdLineError (showSDoc $ text "module" <+> pprModule m <+> text "is not loaded")) err -> let msg = cannotFindModule dflags mod_name err in throwDyn (CmdLineError (showSDoc msg))
This either returns a Module or throw an exception by looking it up in the home package table. Otherwise, it throws a CmdLineError exception, which is intended for reporting failure inside GHCi. I guess the proper way for this function would be to throw an 'ApiError', i.e., we expect the looked-up module to be existing in the home package table and to be loaded.
Can you please explain, what you are doing here? If a module cannot be found on disk, this is certainly an (IO) exception, since you cannot enforce the existence of a file. If a module cannot be found in an internal table, although it should be there, this is an 'error'. However one should try to minimize such situations, may by organizing the lookup in another way.
My questions thus are:
- Does this sound like a reasonable strategy? - Is such a monad a good or a bad idea?
Good idea!
- Does anyone have an idea of a classification of errors, or guidelines/principles for one?
Yes! Read my articles on the Wiki http://www.haskell.org/haskellwiki/Exception http://www.haskell.org/haskellwiki/Error and my posts to the "extensible exception" thread: http://www.haskell.org/pipermail/libraries/2008-July/010120.html and share my opinion! :-)
- Any other comments?
Thank you for the practical example of exceptions!
(*) an alternative implementation that would probably be a bit more efficient in case of errors could use continuation-passing style:
newtype Ghc a = Ghc { unGhc :: forall ans. Session -> (GhcError -> IO ans) -- failure continuation -> (a -> IO ans) -- success continuation -> IO ans
Why do you think it would be more efficient?

On 14 Jul 2008, at 16:58, Henning Thielemann wrote:
Another instance of mixing errors (ApiError) and exceptions (GhcError, GhcIOException)? How should I handle errors that I made myself by calling the GHC API the wrong way? Of course, I must correct those calls to GHC instead. The clean design would be to drop the ApiError constructor and indicate wrong uses of the GHC library with simple 'error'.
Ok, I read your articles. While I must say the terminology is a bit confusing, I agree with the overall distinction. However, for an API, using 'error' should be done very seldom, preferably never. Thus I probably should reconsider the use case and think about how to restructure the API in a way that this error never occurs. Oleg and Chung-chieh's "Lightweight Static Capabilites"[1] show some interesting and simple ideas how one could do that. [1]: http://okmij.org/ftp/papers/lightweight-static-capabilities.pdf
Note that we have to wrap IO exceptions and propagate them separately in the Ghc monad.(**)
That's a good thing. All (IO) exceptions should be handled this way. (See the extensible exception thread on this list.)
For the latter It is, however, not always clear what is a compile error and what is not. Consider, for example, the following function:
-- | Takes a 'ModuleName' and possibly a 'PackageId', and consults the -- filesystem and package database to find the corresponding 'Module',-- using the algorithm that is used for an @import@ declaration. findModule :: ModuleName -> Maybe PackageId -> Ghc Module findModule mod_name maybe_pkg = withSession $ \hsc_env -> let dflags = hsc_dflags hsc_env hpt = hsc_HPT hsc_env this_pkg = thisPackage dflags in case lookupUFM hpt mod_name of Just mod_info -> return (mi_module (hm_iface mod_info)) _not_a_home_module -> do -- XXX: should we really throw IO exceptions here? res <- io $ findImportedModule hsc_env mod_name maybe_pkg case res of Found _ m | modulePackageId m /= this_pkg -> return m | otherwise -> throwDyn (CmdLineError (showSDoc $ text "module" <+> pprModule m <+> text "is not loaded")) err -> let msg = cannotFindModule dflags mod_name err in throwDyn (CmdLineError (showSDoc msg))
This either returns a Module or throw an exception by looking it up in the home package table. Otherwise, it throws a CmdLineError exception, which is intended for reporting failure inside GHCi. I guess the proper way for this function would be to throw an 'ApiError', i.e., we expect the looked-up module to be existing in the home package table and to be loaded.
Can you please explain, what you are doing here?
If a module cannot be found on disk, this is certainly an (IO) exception, since you cannot enforce the existence of a file. If a module cannot be found in an internal table, although it should be there, this is an 'error'. However one should try to minimize such situations, may by organizing the lookup in another way.
Ah sorry. This is an existing function, that tries to find a module in the set of installed packages (or only within a specified package). I think it can reasonably be expected to fail. For example, the package could be hidden or the module is not part of an existing package. Well, actually, the original type was findModule :: Session -> ModuleName -> Maybe PackageId -> IO Module and throwing an exception was used as error the reporting mechanism. I think Simon's Extensible Exception paper actually mentions that they are sometimes used as error reporting mechanism "for convenience". Of course, we can and should normalise this behaviour at the API-level, so I will study your list of articles and the mailing list thread (I was hoping I could avoid reading the whole thread ;) )
(*) an alternative implementation that would probably be a bit more efficient in case of errors could use continuation-passing style:
newtype Ghc a = Ghc { unGhc :: forall ans. Session -> (GhcError -> IO ans) -- failure continuation -> (a -> IO ans) -- success continuation -> IO ans
Why do you think it would be more efficient?
It performs less matching on constructors. Consider the monad instance of the Either variant: instance Monad Ghc where return x = Ghc $ \_ -> return (Right x) m >>= k = Ghc $ \s -> do rslt <- runGhc m s case rslt of Left err -> return (Left err) Right a -> runGhc (k a) s Here, every >>= immediately deconstructs the value constructed by the monad. If an error is thrown, all >>= calls will merely deconstruct the value of the previous call, and reconstruct it immediately. The CPS variant looks like this: instance Monad Ghc where return x = Ghc $ \s fk k -> k x m >>= f = Ghc $ \s fk k -> runGhc' m s fk (\a -> runGhc' (f a) s fk k) This simply adjusts the continuation and the failure continuation is just passed through and is called directly in case of an error throw err = Ghc $ \_ fk _ -> fk err i.e., the CPS immediately jumps to the error handler and aborts the current continuation. / Thomas -- Once upon a time is now.
participants (2)
-
Henning Thielemann
-
Thomas Schilling