
7 December 2020 14:25 "Johannes Waldmann"
I often wanted a tool that finds (nearly) duplicate AST sub-trees in a large code base, and suggests refactorings.
Of course, in an IDE, it could alert me on-the-fly that I'm typing some code that's already present elsewhere.
How might one go about implementing this? Actual (approximate) sub-tree matching seems the easy part, but I have no clear idea about whether this should just use syntax or needs types as well (my guess is: yes), what libraries are there to provide the (annotated) ASTs, etc.
- J.W.
Hi,

Last year I developed a small tool for redundancy detection in OCaml called asak [1], which can be easily adapted for Haskell. The idea is pretty simple: use an intermediate language of the compilation pipeline to normalize the code and remove sugar (for Haskell, one can take Core), inline everything, and hash the tree bottom-up (abstracting constants), keeping the hashes of intermediate trees. Then you just compare the hashes of trees against pre-computed hashes of, let's say, the whole Hackage ecosystem.

The technique is pretty efficient and scalable. I ran asak against the whole OPAM repository and got some results (like the detection of `map_opt` 140 times under 32 different names).

There are some drawbacks: the code needs to be compiled down to Core, and one has to maintain a database of available hashes. The first seems legitimate, and the second is inherent to any such tool.

I started developing an editor plugin for Emacs, but never got further due to a lack of time.

Anyway, the approach is language-agnostic and could be easily adapted to Haskell. Moreover, the "inline everything" step, which is not possible in OCaml (due to effects), would be highly valuable in Haskell.

Best,
-- Alexandre Moine

[1]: https://github.com/nobrakal/asak
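
To make the hashing idea above concrete, here is a minimal sketch in Haskell. It is not asak's actual implementation and does not use GHC Core: the `Expr` type is a hypothetical toy language standing in for a desugared intermediate representation, and `combine`, `hashSubtrees`, `buildIndex` and `findDuplicates` are names invented for this illustration. Variables are assumed to be de Bruijn indices so that binder names do not affect the hash, and the hash function is a simple polynomial one chosen only for brevity.

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- Hypothetical toy IR, standing in for something like Core.
data Expr
  = Lit Int        -- constant; its value is ignored when hashing
  | Var Int        -- de Bruijn index (assumed non-negative)
  | Lam Expr
  | App Expr Expr
  deriving (Show, Eq)

type Hash = Int

-- Fold child hashes into a node tag (simple polynomial hash, assumes 64-bit Int).
combine :: Int -> [Hash] -> Hash
combine = foldl' (\h x -> (h * 16777619 + x) `mod` 2147483647)

-- Node tag; every literal gets the same tag, which abstracts constants.
tag :: Expr -> Int
tag (Lit _)   = 1
tag (Lam _)   = 2
tag (App _ _) = 3
tag (Var i)   = 4 + i

children :: Expr -> [Expr]
children (Lam b)   = [b]
children (App f x) = [f, x]
children _         = []

-- Hash bottom-up, returning the root hash and the hashes of all sub-trees.
hashSubtrees :: Expr -> (Hash, [(Hash, Expr)])
hashSubtrees e = (h, (h, e) : concatMap snd kids)
  where
    kids = map hashSubtrees (children e)
    h    = combine (tag e) (map fst kids)

-- Pre-computed database: root hash of each known definition -> its names.
buildIndex :: [(String, Expr)] -> M.Map Hash [String]
buildIndex defs =
  M.fromListWith (++) [ (fst (hashSubtrees e), [n]) | (n, e) <- defs ]

-- Names of known definitions that match some sub-tree of the query.
findDuplicates :: M.Map Hash [String] -> Expr -> [String]
findDuplicates idx q =
  concat [ M.findWithDefault [] h idx | (h, _) <- hashSubtrees q ]

-- Example data (hypothetical).
exampleIndex :: M.Map Hash [String]
exampleIndex = buildIndex [("constFive", Lam (Lit 5)), ("identity", Lam (Var 0))]

exampleQuery :: Expr
exampleQuery = App (Lam (Lit 7)) (Lit 1)
```

With these definitions, `findDuplicates exampleIndex exampleQuery` evaluates to `["constFive"]`: the sub-tree `Lam (Lit 7)` hashes like `Lam (Lit 5)` because constants are abstracted, while `identity` does not match. A real tool would use a stronger hash and would work on Core after inlining, as described above.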