
The latest updates on HaRe with GHC API project seem to be posted on the google+ community page: https://plus.google.com/communities/116266567145785623821 On Monday, April 29, 2013 5:09:56 AM UTC-5, Simon Hengel wrote:
Hi Niklas, I haven't read the whole proposal as I'm short of time. But Alan Zimmerman is doing a lot of work on integrating HaRe with the GHC API [1]. He is alanz on freenode and a regular in #hspec.
I haven't looked at the code, but maybe it's of interest to you.
Cheers, Simon
[1] https://github.com/alanz/HaRe/tree/ghc-api
I would like to propose the development of source code refactoring tool that operates on Haskell source code ASTs and lets you formulate rewrite rules written in Haskell.
Objective ---------
The goal is to make refactorings easier and allow global code changes that might be incredibly tedious to do in a non-automated way. By making these transformations convenient, we can make it easier to maintain clean code, add new features or clean up leftovers faster, and reduce the fear and effort to upgrade to newer versions of packages and APIs.
Transformations ---------------
First, here are a few operations you would use this tool for. Some of them are common operations you would also do in other programming languages, some are more specific to Haskell.
* Changing all occurrences of "import Prelude hiding (catch)" to "import qualified Control.Exception as E"
* Replacing all uses of a function with that function being imported qualified or the other way around
* Adding a field to data constructor a record, setting user-supplied defaults for construction and destruction:
-- Suppose you want to change one of these data User = User { name :: String, age :: Int } data User = User String Int
-- into one of these data User = User { name :: String, age :: Int, active :: Bool } data User = User String Int Bool
-- the refactoring tool could perform, in all relevant locations: show (User name age) = ... show (User name age _) = ...
-- and also this transformation: ... u { name = "deleted" } ... ... u { name = "deleted", active = False } ...
-- or equivalently with records.
-- Special cases could be taken care of as specified, such as -- "whenever an object of [this User type] has -- of its records passed into some function 'email', do this -- now only if the user is active, so modify all relevant code -- email (name u) -- to -- if (active u) then email (name u) else return ()
-- Other examples include adding a position counter to attoparsec.
* Adding a type parameter to a type
-- This happens a lot on monad transformer stacks, e.g. newtype MyMonad a b c = MyMonad (ReaderT a (WriterT b ...
-- and as you would probably agree on, this is not the most -- comfortable change to make; in big project this can mean -- hour-long grinding.
-- It has also recently happened in the basic underlying types -- of packages like conduit and pipes.
* Adding a new transformer around a monad
* Addressing problems like mentioned in http://blog.ezyang.com/2012/01/modelling-io/: "There is one last problem with this approach: once the primitives have been selected, huge swaths of the standard library have to be redefined by “copy pasting” their definitions ..."
* Extracting a value into a let or where clause
* Renaming a variable, and all its occurrences that are semantically same variable (based on its scope)
* Changing the way things are done, such as:
* Replacing uses of fmap with <$>, also taking care of the corresponding import, and such cases were partial application is involved
* Replacing uses of "when (isJust)" to "forM_"
* Making imports clearer by adding all functions used to the file to the import list of the module that gets them in scope
* Finding all places where an exported function does not have all its arguments haddock-documented.
* Performing whole-project refactorings instead of operating on single files only, allowing operations like
"Find me all functions of this type, e.g. Maybe a -> (a -> m a) -> m a in the project and extract them into this new module, with the name 'onJust'."
Some of the problems above can be tried to address using regex-based search and replace, but this already fails in the simplest case of "import Prelude hiding (catch)" in case there is more than that imported from Prelude or newlines involved in the import list.
Transformation on the AST are much more powerful, and can guarantee that the result is, at least syntactically, valid. No text base tool can do
On Mon, Apr 29, 2013 at 02:00:23PM +0800, Niklas Hambüchen wrote: that.
Other uses ----------
In addition to being able to perform transformations as mentioned above, the refactoring tool as a library can be leveraged to:
* Support or be the base of code formatting tools such as haskell-stylish, linters, style/convention checkers, static analyzers, test coverage tools etc.
* Implement automatic API upgrades.
Imagine the author of a library you use deprecates some functions, introduces replacements, adds type parameters. In these cases, it is very clear and often well-documented which code has to be replaced by what. The library author could, along with the new release, publish "upgrade transformations" that you can apply to your code base to save most of the manual work.
These upgrade transformations could be either parts of the packages themselves or be separately maintained and refined on the feedback of users applying them to their code bases. This could allow the Haskell community to keep up their fast pace while making API breakage a non-problem.
Automation is what makes us deal well with test suites, source control, error checking. Taking the pain out of this will increase people's incentive to upgrade their code an to keep it well-maintained by easy refactorings.
This concept of automatic code upgrades would, to my knowledge, be quite unique in the programming language world, and the possibility to have this is, as many things, the result of Haskell's excellent type
system.
In comparison to dynamic and "unsafe" languages, we actually have a lot of information around that we can use to write powerful and expressive tools.
Splitting into GSoC tasks -------------------------
I think that the project is too large for one summer and should be split into two parts:
1. Implementation of a "full-source" transformation-enabling AST 2. The transformation engine and API / DSL
About 1: Creating a full-source AST
Existing parsers for Haskell tend to throw away information that is not necessary for compiling the code. Of course this is not good for a code-to-code transformation tool, stripping comments or whitespace the programmer cares about is not an option.
haskell-src-exts has gained support for comments around version 1.1 (as another GSoC project?), but still comments are treated as foreigners, given to you in a list detached from the AST; their positions are specified with integers, which means that when you modify the AST, you have to take care to adjust the comments as well.
The problem seems to be that haskell-source-exts is made for parsing, its AST is made for transformations. I have yet to see a tool that actually *modifies* haskell-src-exts's AST - most of them use it for parsing, and then apply some form of pretty-printing to get back to code.
We need an AST that contains *all* information about the original source code, which means render . parse = id; this is why I called it "full source" AST. Also, this AST should have all elements available in a way that encourages modification and does not discriminate against operations like re-indentation or alignment.
I like to think of this as something that could eventually be part of ghc, in a pipeline like
GHC's parser -> full-source AST -> reduced AST that strips parts irrelevant for compilation -> usual compilation pipeline
I am thinking of this being in a GHC environment because it would guarantee that it is up-to-date with the currently supported language extensions etc. However, I also see that this might slow down development and make things harder, so at least for the beginning it might be better to do it as as a normal library; I would love to get feedback on this.
It might be a good idea to start with GHC's or haskell-source-exts parser and modify them to obtain the full-source AST. I believe though that it would be beneficial if the AST would be to a certain extent be decoupled from the parser, such that other people could write other parsers (e.g. using other parsing libraries like Parsec, attoparsec, uu-parsinglib, trifecta) that produce the same compatible AST type.
I believe that creating this AST makes a GSoC project for itself, and that it would be beneficial for all efforts processing Haskell as a source language.
About 2: Transformation engine and how to write transformations
Transformations should be Haskell code that operates on the AST.
They would be similar to how you write TemplateHaskell in a way, yet much easier and more intuitive to use in most cases. I suggest the creation of a monadic DSL that allows you to select and match those parts of the code you are interested in, perform some case analysis and computation to determine how you want to transform it, and then give you convenient ways to express your changes in that DSL.
Functions in this DSL would roughly belong to one of the kinds * finding/matching * transforming/rewriting.
I have not thought about how this DSL would exactly look like, but I could imagine myself writing a high-level transformation like this:
is <- getImports let matching = filter (importsFunction "catch") . filter (hasModuleName "Prelude") mapM_ (rewrite . removeImportFunction "catch") matching
One of the priority goals of this GSoC project is making this API convenient; transformations should read more naturally and be much shorter than equivalent TemplateHaskell.
This extensive, convenience oriented monadic DSL should aim to have a reasonably large set of functions, and "do things for you" even if you could build them yourself with three or four combinators. It should base itself on a minimal set of core transformations. I think that the Haskell community has a lot of expertise, especially from parsing libraries, in creating this core API + convenience combinators duo, that could aid the GSoC student to get this right.
Along with this way to specify transformations, a transformation engine has to be constructed which can perform them somewhat efficiently over large code-bases.
While speed should should not be a main target for this project and functionality and expressiveness are the main goals, the transformation engine should be designed at least with speed in mind, which can be worked on at a later stage outside of the GSoC project.
This second part of the proposal should result in a library to perform transformations and an executable that can apply them to a folder of source files.
Possible extensions and follow-up projects ------------------------------------------
* Interactive refactoring tool
The refactoring tool could be driven by an interactive application that lets you write some transformations, apply them to your code base, view the diffs and build the project; on build failures or otherwise not-complete transformations it would allow to refine your transformations, or write custom special cases for selected files, line ranges or functions. This way you could quickly upgrade your code base in a fast write-check-repeat cycle.
* An API upgrade infrastructure
As mentioned above, transformations could be published along with new software packages that allow their dependants to easily upgrade their code to the latest API version with minimal manual effort. An infrastructure or standard way of way of doing this could be established. For now, I think that package maintainers would publish their upgrade transformations in their projects' or separate code repositories, and users would apply them with the refactoring tool as needed. I can also imagine a more dedicated infrastructure though, where API upgrades are stored on Hackage or a similar database.
Summary -------
This proposal contains two GSoC projects that I believe to be reasonably sized for one summer each.
I think that they make good projects according to GSoC standards as they are focused, feasible, and aimed at creating real-world code that bring a clearly visible benefit.
The entry barrier is not too high since knowledge of compiler or runtime internals is not required. However, applicants should probably have a fairly good understanding of parsers and how they are dealt with in Haskell, and be somewhat familiar with the Haskell community in order to find sources of good feedback and for being able to judge whether the tools being created would be convenient for the community to use.
Discussion ----------
Now I would be glad to get some responses on this proposal; I have written a bit of text, but it is still a very rough idea and I would love to hear your thoughts about it.
_______________________________________________ Haskell-Cafe mailing list Haskel...@haskell.org javascript: http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskel...@haskell.org javascript: http://www.haskell.org/mailman/listinfo/haskell-cafe