evaluating CAFs at compile time

19 Jan 2014

      So I have a large CAF which is expensive to generate.  Theoretically
it should be possible to totally evaluate at compile time, resulting
in a bunch of constructor calls, ideally totally de-thunked and in the
read-only segment of the binary.

In the absence of a "eval at compile time" pragma, it seemed like TH
should be able to do this, and searching haskell-cafe I turned up an
old post by Wren where he is doing basically that, and eventually
discovered the Lift class in
http://hackage.haskell.org/package/template-haskell-2.8.0.0/docs/Language-Ha...

However, if I understand Lift correctly (and not really understanding
much of TH), you need to create instances for every type you wish to
generate, which seems like it would be a pain.  Even if they can be
automatically derived, it would spread TH dependency gunk throughout
the whole program.  Is this true?  Is there a library that does the
equivalent of a "eval at compile time" pragma? (Wren's proposed QAF
library seems to have never made it to hackage, but maybe given Lift
and the proper instances it turns out to be trivial.)  Would it be
possible or desirable for an actual pragma that wouldn't introduce a
TH dependency?

Also, I assume it would produce a giant set of constructor
applications which ghc would then optimize as well as it can...  but
it seems like that might not include full strictness, since even 'x =
(4, undefined)' is obliged to not diverge as long as you don't look at
the snd field, so even a large literal expression is actually
unevaluated code if there are some non-strict data types in there.

And... is it actually possible for ghc to do clever optimization with
constant values, i.e. lay them out fully evaluated in read-only
memory?  I know that something like 'x = "abc" ++ "def"' will wind up
as 'unpackCString# "abcdef"', but I'm curious what happens to more
complicated data structures.  Strictness seems to make a difference,
e.g. with nonstrict fields the core has separate bindings for the
contained values, while with strict ones the values get inlined
directly into the consumer of the data type and the constructor is
nowhere to be seen.  But if the type is recursive (data X = X Int
(Maybe X)), then we wind up with CAFs applying it, though they have
lots of provocative flags that indicate ghc knows it's dealing with
constructors:

 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, WorkFree=False, Expandable=True,
         Guidance=IF_ARGS [] 10 30}]

I assume it still can't hoist the undefined to the entire expression
though, because of non-strictness.  I would think that if all data
types are strict, then it could transform 'caf = X 42 (StrictJust (X
24 undefined))' to 'caf = undefined', but that doesn't seem to happen
either.

Tangentially, I've noticed that the 'unpackCString# "abcdef"'
optimization is limited to String, replacing it with Text produces
"abc" + giant wodge of code that is presumably appending "def" at
runtime.  I'm sure I've seen some discussions around here about
wanting to optimize string literals to 'Text 0 len (giant chunk of
binary data)', but I don't think they talked about possible compile
time evaluation... presumably it could also solve that problem?

Evan Laforge

Carter Schonwald

Evan Laforge

adam vogt

Evan Laforge

Carter Schonwald

Carter Schonwald

Evan Laforge

wren ng thornton

tags

participants (4)