
So I have a large CAF which is expensive to generate. Theoretically it should be possible to evaluate it entirely at compile time, resulting in a bunch of constructor calls, ideally totally de-thunked and in the read-only segment of the binary.

In the absence of an "eval at compile time" pragma, it seemed like TH should be able to do this, and searching haskell-cafe I turned up an old post by Wren where he is doing basically that, and eventually discovered the Lift class in
http://hackage.haskell.org/package/template-haskell-2.8.0.0/docs/Language-Ha...

However, if I understand Lift correctly (and I don't really understand much of TH), you need to create instances for every type you wish to generate, which seems like it would be a pain. Even if they can be automatically derived, it would spread TH dependency gunk throughout the whole program. Is this true? Is there a library that does the equivalent of an "eval at compile time" pragma? (Wren's proposed QAF library seems to have never made it to hackage, but maybe given Lift and the proper instances it turns out to be trivial.) Would it be possible or desirable to have an actual pragma that wouldn't introduce a TH dependency?

Also, I assume it would produce a giant set of constructor applications which ghc would then optimize as well as it can... but it seems like that might not include full strictness, since even 'x = (4, undefined)' is obliged not to diverge as long as you don't look at the snd field, so even a large literal expression is actually unevaluated code if there are some non-strict data types in there.

And... is it actually possible for ghc to do clever optimization with constant values, i.e. lay them out fully evaluated in read-only memory? I know that something like 'x = "abc" ++ "def"' will wind up as 'unpackCString# "abcdef"', but I'm curious what happens to more complicated data structures. Strictness seems to make a difference, e.g. with non-strict fields the core has separate bindings for the contained values, while with strict ones the values get inlined directly into the consumer of the data type and the constructor is nowhere to be seen. But if the type is recursive (data X = X Int (Maybe X)), then we wind up with CAFs applying it, though they have lots of provocative flags that indicate ghc knows it's dealing with constructors:

    Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True, ConLike=True, WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 10 30}]

I assume it still can't hoist the undefined to the entire expression though, because of non-strictness. I would think that if all the data types are strict, then it could transform 'caf = X 42 (StrictJust (X 24 undefined))' to 'caf = undefined', but that doesn't seem to happen either.

Tangentially, I've noticed that the 'unpackCString# "abcdef"' optimization is limited to String; replacing it with Text produces "abc" plus a giant wodge of code that is presumably appending "def" at runtime. I'm sure I've seen some discussions around here about wanting to optimize string literals to 'Text 0 len (giant chunk of binary data)', but I don't think they talked about possible compile time evaluation... presumably it could also solve that problem?
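(To make that concrete, here is roughly the helper I'm imagining one could build on top of Lift; a sketch I haven't tried, and it assumes the type has both NFData and Lift instances, which is exactly the instance burden I'm asking about:

    import Control.DeepSeq (NFData, force)
    import Language.Haskell.TH (Exp, Q)
    import Language.Haskell.TH.Syntax (Lift, lift)

    -- Fully evaluate a value while the compiler is running, then convert it
    -- to a TH expression, i.e. plain nested constructor applications.
    evalAtCompileTime :: (NFData a, Lift a) => a -> Q Exp
    evalAtCompileTime = lift . force

It would be used at a splice site, where the stage restriction means the expensive expression can only refer to things imported from other modules:

    expensiveCaf :: Thing
    expensiveCaf = $(evalAtCompileTime (generateThing args))

Here Thing, generateThing, and args just stand in for whatever expensive computation builds the CAF.)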

Evan, could you share a minimal example of the code that illustrates your problem? It may be that there's
a) an alternative way to write it that gives the perf characteristics you want,
b) it could be a good example for future ghc optimization efforts, or
c) something else.
-Carter

On Sat, Jan 18, 2014 at 4:25 PM, Carter Schonwald wrote:
> Evan, could you share a minimal example of the code that illustrates your
> problem? It may be that there's a) an alternative way to write it that gives
> the perf characteristics you want, b) it could be a good example for future
> ghc optimization efforts, or c) something else.
Sure. As you might guess, there are lots of dependencies, but you don't have to care about them. A Patch has a bunch of fields, but the key part is Score.Attributes, which is a newtype over Set Text. All the attrs_* functions are just the obvious wrappers around set operations. 'strip_attr' tries to remove redundant attributes, but can only do so if that doesn't cause it to collide with an existing attribute set (which means it wasn't redundant after all).

You'll notice it's naively implemented, since it does a linear search through all the other attributes. Given 41 instruments, 12 attrs to strip, and a typical instrument having 285 attrs, that winds up being something like 41 * 12 * 285^2, and takes about 0.39 CPU seconds to force with NFData. I appended a less naive version that replaces the linear search with a Set and it's faster (0.19s; presumably the 285^2 becomes 285 * log 285), but it is uglier. So I did find an alternative way, but it's still fairly expensive, and it would be nice to be able to write the slow but pretty version and pay the cost at compile time.

All the attributes data is coming from another module which is basically 1855 lines of CAFs. I could apply the attribute stripping by hand to that, but it would be error-prone and ugly and lots of work... that's the machine's job!

    patches :: [MidiInst.Patch]
    patches =
        [add_code hmap (make_patch inst category) | ((inst, hmap), category) <- instruments]
        where
        add_code hmap patch = (patch, code)
            where code = MidiInst.note_calls (note_calls hmap patch)

    make_patch :: VslInst.Instrument -> Text -> Instrument.Patch
    make_patch inst category =
        instrument_patch category (second strip (make_instrument inst))
        where strip = uncurry zip . first strip_attrs . unzip

    strip_attrs :: [Score.Attributes] -> [Score.Attributes]
    strip_attrs attrs = foldr strip_attr attrs strip
        where
        strip = reverse
            [ VslInst.sus, VslInst.vib, VslInst.perf, VslInst.fast, VslInst.fa
            , VslInst.norm, VslInst.na, VslInst.legato, VslInst.v1, VslInst.art
            , VslInst.med, VslInst.short
            ]

    -- | Strip the given attr, but only if it wouldn't cause clashes.
    strip_attr :: Score.Attributes -> [Score.Attributes] -> [Score.Attributes]
    strip_attr attr all_attrs = map (strip_redundant attr) all_attrs
        where
        strip_redundant attr attrs
            | stripped `elem` all_attrs = attrs
            | otherwise = stripped
            where stripped = Score.attrs_diff attrs attr

    -- optimized version, applied via mapAccumL to thread the Set through each call:
    strip_attr :: Score.Attributes
        -> (Set.Set Score.Attributes, [Score.Attributes])
        -> (Set.Set Score.Attributes, [Score.Attributes])
    strip_attr attr (all_attrs_set, all_attrs)
        | any (`Score.attrs_contain` attr) all_attrs =
            List.mapAccumL strip_redundant all_attrs_set all_attrs
        | otherwise = (all_attrs_set, all_attrs)
        where
        strip_redundant attrs_set attrs
            | Set.member stripped attrs_set = (attrs_set, attrs)
            | otherwise = (Set.insert stripped attrs_set, stripped)
            where stripped = Score.attrs_diff attrs attr
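(For completeness, the driver that threads the Set through the optimized strip_attr isn't shown above. It would look something like the following, though whether the Set starts out containing the existing attrs or starts empty is part of what I elided:

    strip_attrs :: [Score.Attributes] -> [Score.Attributes]
    strip_attrs attrs = snd $ foldr strip_attr (Set.fromList attrs, attrs) strip
        where strip = reverse [VslInst.sus, VslInst.vib, ...]  -- same list as above

which keeps the same fold structure as the naive version.)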

On Sat, Jan 18, 2014 at 7:14 PM, Evan Laforge wrote:
> However, if I understand Lift correctly (and I don't really understand much
> of TH), you need to create instances for every type you wish to generate,
> which seems like it would be a pain. Even if they can be automatically
> derived, it would spread TH dependency gunk throughout the whole program.
> Is this true? Is there a library that does the equivalent of an "eval at
> compile time" pragma? (Wren's proposed QAF library seems to have never made
> it to hackage, but maybe given Lift and the proper instances it turns out to
> be trivial.) Would it be possible or desirable to have an actual pragma that
> wouldn't introduce a TH dependency?
Hi Evan,

Check out https://hackage.haskell.org/package/th-lift. Also, there is a fork of zeroTH here: https://github.com/mgsloan/zeroth, which works with haskell-src-exts < 1.14.

I'm not sure what benefit you'd get from a new mechanism (besides TH) to calculate things at compile time. Won't it have to solve the same problems which are solved by TH already? How can those problems (generating haskell code, the stage restriction) be solved without ending up with the same kind of complexity ("TH dependency gunk")?

Regards,
Adam
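For example, with th-lift the instances can be derived rather than written by hand. Something like this, where the type names are just placeholders for whatever is inside your structure:

    {-# LANGUAGE TemplateHaskell #-}
    import Language.Haskell.TH.Lift (deriveLift)

    -- one splice per type that occurs inside the value you want to lift
    $(deriveLift ''Patch)
    $(deriveLift ''Attributes)

You would still need Lift instances for leaf types like Text or Set that don't come with one, either hand-written or from another package.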

On Sat, Jan 18, 2014 at 4:56 PM, adam vogt wrote:
> Check out https://hackage.haskell.org/package/th-lift. Also, there is a fork
> of zeroTH here: https://github.com/mgsloan/zeroth, which works with
> haskell-src-exts < 1.14.

Thanks, I'll take a look. Though since I have my faster-but-uglier solution, at this point I'm mostly only theoretically interested, and hoping to learn something about compilers and optimization :)

> I'm not sure what benefit you'd get from a new mechanism (besides TH) to
> calculate things at compile time. Won't it have to solve the same problems
> which are solved by TH already? How can those problems (generating haskell
> code, the stage restriction) be solved without ending up with the same kind
> of complexity ("TH dependency gunk")?

Well, TH is much more powerful in that it can generate any expression at compile time. But in exchange, it slows down compilation a lot, introduces an order dependency in the source file, and causes complications for the build system (I don't remember exactly, but it came down to needing to find the .o files at compile time).

I would think, in the handwaviest kind of way, that the compiler could compile a CAF, and then just evaluate it on the spot by following all the thunk pointers (similar to a deepseq), and then emit the raw data structure that comes out. Of course that assumes there is such a thing as "raw" data, which is why I got all sidetracked wondering about compile time optimization in general. I expect it's not like C, where you would wind up with a nested bunch of structs you could write directly to the .TEXT section of the binary and then mmap into place when the binary is run. Even in C you'd need to go fix up pointers. At which point it sounds like a dynamic loader :)

You're asking for something that ghc doesn't have yet, but perhaps could have at some point (if I'm reading you right). Currently ghc doesn't have a way of doing what you want! E.g., I don't think there's even really support yet for that sort of notion in the context of just boxed/unboxed/storable arrays.

There are definitely example pieces of code where it'd be nice to express a read-only lookup array that's fixed before run time, for various bit-fiddling etc. algorithms.
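To illustrate with a sketch (names made up): today the closest approximation is to lift the list with TH, but the array itself still gets rebuilt from that list when the program starts, which is exactly the gap I mean:

    {-# LANGUAGE TemplateHaskell #-}
    import Data.Array.Unboxed (UArray, listArray)
    import Data.Bits (popCount)
    import Language.Haskell.TH.Syntax (lift)

    -- a byte-wise popcount table one might want laid out read-only in the
    -- binary; the list literal is computed at compile time, the UArray is not
    popCountTable :: UArray Int Int
    popCountTable = listArray (0, 255) $(lift [popCount (i :: Int) | i <- [0 .. 255]])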

Point being, I think you're pointing to an idea other people are (also) interested in exploring for ghc, and that there are some interesting subtleties to it.

On Sat, Jan 18, 2014 at 6:57 PM, Carter Schonwald wrote:
> Point being, I think you're pointing to an idea other people are (also)
> interested in exploring for ghc, and that there are some interesting
> subtleties to it.

I was actually kind of hoping someone would point out some of those subtleties. I figure there must be some, since I don't know of any compiler other than the common lisp ones that implements something like this.

Another thing it could be useful for: when declaring data it's sometimes convenient to declare data structures separately and then stitch them together, e.g.

    xs = makeXs [("name1", a), ("name2", b), ...]          -- makeXs uses list order to infer things
    ys = makeYs [('x', ..., "name2"), ('y', ..., "name1")] -- same for makeYs

    things = do
        (c, ..., name) <- ys
        let x = fromMaybe (error "ack") $ lookup name xs
        return $ Thing name c x
    ...

Not only would compile time evaluation eliminate some startup overhead, it could enforce at compile time that the names match up, along with other invariants in literal data, such as uniqueness, or that your keymap doesn't have any collisions, or whatever.

On Sat, Jan 18, 2014 at 7:14 PM, Evan Laforge wrote:
> (Wren's proposed QAF library seems to have never made it to hackage, but
> maybe given Lift and the proper instances it turns out to be trivial.)

Yep, the Lift class does essentially the same thing, which is why I never published QAF.

Live well,
~wren
participants (4):
- adam vogt
- Carter Schonwald
- Evan Laforge
- wren ng thornton