Re: Better calling conventions for strict functions (bang patterns)?

25 Oct 2015

Doesn't modern hardware have pretty good branch prediction? If so, the
order of the branches may not matter much unless it's a long chain of calls,
as opposed to, say, an inner loop that hasn't been inlined?

Either way, I'd love to stay in the loop on this topic. For work, I'm
building a strongly normalizing language that supports both strict and
call-by-need evaluation strategies.

On Friday, October 23, 2015, Ryan Newton <rrnewton@gmail.com> wrote:
...
...
1. Small tweaks: The CMM code above seems to be *betting* that the
   thunk is unevaluated, because it does the stack check and stack write
   *before* the predicate test that checks if the thunk is evaluated (if
   (R1 & 7 != 0) goto c3aO; else goto c3aP;).  With a bang-pattern
   function, couldn't it make the opposite bet?  That is, branch on whether
   the thunk is evaluated first, and then the wasted computation is only a
   single correctly predicted branch (and a read of a tag that we need to read
   anyway).
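The difference between the two bets can be sketched in C under a toy model of pointer tagging. This is an illustration, not GHC's runtime: it assumes 8-byte-aligned objects whose pointers carry a constructor tag in the low 3 bits, and the names `Obj`, `stack_check`, `enter`, `force_bet_unevaluated`, `force_bet_evaluated`, and `stack_checks` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of pointer tagging (hypothetical, not GHC's runtime): objects are
   8-byte aligned, so the low 3 bits of a pointer are free. Tag 0 means
   "possibly unevaluated"; a nonzero tag means "evaluated constructor". */
#define TAG_MASK ((uintptr_t)7)

typedef struct {
    int is_thunk;        /* 1 for an unevaluated thunk */
    uintptr_t result;    /* for thunks: tagged pointer to the evaluated value */
} Obj;

long stack_checks = 0;   /* number of simulated stack checks performed */

/* Stands in for "if ((Sp + -8) < SpLim) ..." plus the frame write. */
static void stack_check(void) { stack_checks++; }

/* Stands in for the call that enters the thunk. Simplification: in this
   model an untagged pointer always points at a thunk. */
static uintptr_t enter(uintptr_t p) {
    const Obj *o = (const Obj *)p;
    assert(o->is_thunk);
    return o->result;
}

/* Order in the generated code: pay for the stack check, then test the tag. */
uintptr_t force_bet_unevaluated(uintptr_t p) {
    stack_check();
    if ((p & TAG_MASK) != 0)
        return p;
    return enter(p);
}

/* Proposed order: test the tag first. An evaluated argument costs only
   this one branch; only an untagged pointer pays for the stack check. */
uintptr_t force_bet_evaluated(uintptr_t p) {
    if ((p & TAG_MASK) != 0)
        return p;
    stack_check();
    return enter(p);
}
```

With an already-evaluated argument, the first ordering performs one stack check per call and the second performs none; for a thunk, both pay for it. The tag test itself is a single branch in either ordering, so the saving comes from the work that would otherwise sit in front of it.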
Oh, a small further addition would be needed for this tweak. In the
generated code above, "Sp = Sp + 8;" happens *late*, but I think it could
happen right after the call to the thunk returns. In general, does it seem
feasible to separate the slow path from the fast path as in the following
tweak of the example CMM?
  // Skip to the chase if it's already evaluated:
  start:
      R1 = R2;    // fastpath tests R1, so load the argument into it first
      if (R1 & 7 != 0) goto fastpath; else goto slowpath;
  slowpath:   // Formerly c3aY
      if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;
  c3aZ:
      // nop
      R1 = PicBaseReg + foo_closure;
      call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;
  c3b0:
      I64[Sp - 8] = PicBaseReg + block_c3aO_info;
      R1 = R2;
      Sp = Sp - 8;
      call (I64[R1])(R1) returns to c3aO, args: 8, res: 8, upd: 8;
  c3aO:   // return point of the call above (a call ends its block)
      // Sp bump moved to here so it's separate from "fastpath"
      Sp = Sp + 8;
      goto fastpath;
  fastpath: // Formerly the rest of c3aO
      if (R1 & 7 >= 2) goto c3aW; else goto c3aX;
  c3aW:
      R1 = P64[R1 + 6] & (-8);
      call (I64[R1])(R1) args: 8, res: 0, upd: 8;
  c3aX:
      R1 = PicBaseReg + lvl_r39S_closure;
      call (I64[R1])(R1) args: 8, res: 0, upd: 8;
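The field access in the fast path, `P64[R1 + 6] & (-8)`, reads through a pointer that still carries its tag. A minimal C sketch of that step, assuming a 64-bit target, an object laid out as one header word followed by payload words, and a two-constructor type (so `tag >= 2` means the second constructor); `fast_path` and `default_closure` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit words, so objects are 8-byte aligned and 3 tag bits
   are available in every object pointer. */
_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes a 64-bit target");

#define TAG_MASK ((uintptr_t)7)

/* Models the fastpath: a tag >= 2 selects the second constructor, whose
   first field is the word right after the header (untagged base + 8).
   With tag 2 still in the pointer, that word sits at r1 + 6. The loaded
   field is then untagged ("& (-8)") before it would be entered. */
uintptr_t fast_path(uintptr_t r1, uintptr_t default_closure) {
    assert((r1 & TAG_MASK) != 0);  /* only evaluated, tagged pointers get here */
    if ((r1 & TAG_MASK) >= 2) {
        uintptr_t field = *(const uintptr_t *)(r1 + 6);  /* base + 8 */
        return field & ~TAG_MASK;  /* c3aW: closure that would be entered */
    }
    return default_closure;        /* c3aX: static closure that would be entered */
}
```

Because the pointer's tag is known to be 2 on this branch, adding 6 instead of 8 reaches the first payload word directly, with no separate untagging instruction before the load.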

Carter Schonwald