
summary: speed up Syb with Uniplate-inspired techniques == Performance issues in Syb == As it stands, classic Syb, while well-supported in GHC isn't exactly the fastest generic programming option. In fact, it routinely seems to come out last in performance comparisons, sometimes by a substantial factor. That isn't always a problem in practice, because the gains in expressiveness/conciseness/maintainability outweigh the loss in performance, because traversal performance is not the bottleneck, or because practical use often involves hand-tuned traversal schemes instead of the example schemes typically used in benchmarks. Be that as it may, performance is a consideration, negative results have been published, and it seems that performance of experimental code has led to Syb being abandoned in at least one project. While it would be unrealistic to expect Syb traversals to compete with hand-written ones with current compilers, there are several obvious areas where Syb traversal performance could be subjected to improvements: (a) traversing irrelevant parts of the structure (because everything is treated generically) (b) combining results in simple, but inefficient ways ((++) nesting with the structure) (c) repeated runtime type checks to determine which function in a generically extended transformation/query to apply While runtime type checks are inherent in SYB (and alternatives have been proposed that claim to avoid them), all of a/b/c can be addressed to some extent in defining tuned traversals using the basic library. There tends to be a trade-off for a vs c, as avoiding senseless generic traversals of specific substructures implies additional runtime type checks to identify those substructures == Avoiding irrelevant traversals with substructure type maps == This message is concerned with addressing (a), in a generic way that does not require hand-tuning to the types in question, and without adding to the burden of (c). The basic idea comes from the Uniplate library, particularly the version implemented on top of Data (Data.Generics.PlateData). The idea is to compute the set of substructure types for a type under traversal and was described in last year's Uniplate paper ([1], or chapter 3 of Neil's recent thesis [2], section 3.6.2, "Optimising PlateData"), though its generality is somewhat obscured by the claim that: The next optimisation relies on the extra information present in the Uniplate operations - namely the target type. A boilerplate operation walks over a data structure, looking for target values to process. In SYB, the target values may be of any type. For Uniplate the target is a single uniform type. If a value is reached which is not a container for the target type, no further exploration is required of the values children. A Data-based 'contains' operation is defined that returns the list of immediate substructure types, represented by existentially boxed 'undefined's of the appropriate types, for all constructors in a given type. This operation can then be iterated to given all substructure types. One interesting point to note is that -though Syb is data-based- the computation of substructure types is based on reflecting a type-based view back into data. For sum types, every concrete data value only presents a subset of substructure types, but Syb's reflection features make it possible to talk about all possible constructors (*). The performance benefits can be substantial - the draft thesis claims In the benchmarks we improve on SYB by between 30% and 225%, with an average of 145% faster. So it seemed worthwhile to generalise this technique for use in Syb, which turned out to be possible after all, leading to the promised Data.Generics with GPS (Generic Positioning System;-). == Data.Generics.GPS == GPS employs Maps, to avoid getting lost in Data: - for each traversed type, build a Map TypeKey TypeSet, mapping all substructure types of the given type to their substructure types - traversals are short-circuited when the domain types of their queries or transformations cannot be found in the current substructure types - domains of queries and transformations are computed on construction GPS is inspired by Uniplate's PlateData direction finder (contains and DataBox are copied from the Uniplate paper), generalised to tackle SYB's more general queries and transformations (instead of oracles telling whether to stop, follow, or find in a search for type b in type a, we build IntSets of TypeRep keys, both for the domains of traversals and for substructure types; then several short-circuiting decisions can be based on fast intersection tests with the same IntSet). Data.Generics.GPS reexports Data.Generics, modifying everything, everywhere, mkQ, extQ, mkT, extT in such a way that building and extending transformations/queries also computes and records their domains and default transformations/values: data GenericDomainQ r = GenericDomainQ { queryDomain :: TypeSet, defaultValue :: r, genericQuery :: GenericQ r } data GenericDomainT = GenericDomainT { transDomain :: TypeSet, defaultTrans :: GenericT, genericTrans :: GenericT } while traversals accept those refined transformations/queries and add substructure type maps for short-circuiting, eg: everywhere :: forall a . Data a => GenericDomainT -> a -> a everywhere = everywhereWithMap (getSubs subMap) where subMap = fromRoot (undefined::a) Map.empty everywhereWithMap :: forall a . Data a => (forall a . Data a => a -> TypeSet) -> GenericDomainT -> a -> a everywhereWithMap getSubs gdt@(GenericDomainT{transDomain=domain, defaultTrans=dt, genericTrans=t}) x | not $ IS.null $ domain `intersection` getSubs x = t (gmapT (everywhereWithMap getSubs gdt) x) | otherwise = dt x == Performance testing in Paradise == To test the performance, I used the example in which Syb performed worst in the Uniplate paper: computing the bill in the Paradise benchmark. To get easily measurable timings, I added 100000 redundant copies of the departments in genCom. Switching from Data.Generics to Data.Generics.GPS is as easy as changing imports (and possibly types): import qualified Data.Generics as DG import Data.Generics.GPS bill :: Data a => a -> Integer bill = DG.everything (+) (0 `DG.mkQ` billS) where billS (S s) = s bill' :: Data a => a -> Integer bill' = everything (+) (0 `mkQ` billS) where billS (S s) = s Performance improves drastically, as expected (close to hand-tuned versions of everything and everywhere that explicitly skip the Strings in the Paradise types), even though substructure type maps are not yet as widely shared as in PlateData (as trace-instrumented type map construction shows). Unfortunately, a serious performance problem in Uniplate (of which Neil is already aware) completely masks the PlateData optimisations, so it is not yet visible whether the gap between PlateData and Syb has disappeared completely: -- ghc --make -O2 Main.hs; GHC version 6.9.20080514 $ ./Main.exe > dump "Data.Generics increase: 5 secs" 5550200000 "Data.Generics bill: 12 secs" "Data.Generics.GPS increase: 2 secs" 5550200000 "Data.Generics.GPS bill: 3 secs" "Data.Generics.PlateData increase: 1 min, 59 secs" 5550200000 "Data.Generics.PlateData bill: 1 min, 54 secs" == General implications of the techniques used == TypeRep keys combined with IntSets or IntMaps should be of general interest to reader of this list, as they can be used to speed up other generic programming problems as well, including typecase and extensible records libraries. Two examples should make this connection obvious: - generic queries and transformations generalise record selection and update (particularly plain to see in Uniplates universeBi/transformBi) - the core of Smash is to replace Syb's runtime type checks with compile-time type checks, but Smash's static typecase is based on HList's extensible record selection, encoded in type-class programs which current compilers do not yet partially evaluate to entirely static, constant-time selection; so, while Smash conceptually replaces runtime type selection with compile-time type selection, its compile-time type selection is compiled into runtime type selection Now, think about combining conceptually nice type-class-based type-case and record operations with pragmatically efficient IntMap-based runtime representations and selection/modification operations!-) Feedback, comments, bug-reports welcome, as usual. Claus [1] http://www-users.cs.york.ac.uk/~ndm/uniplate/ [2] http://www-users.cs.york.ac.uk/~ndm/thesis/ (*) this wouldn't work so well on non-regular types, because of the potentially infinite set of substructure types