Performance: Faster to define a function writing out all arguments?

Hi there,

I am trying to write my first serious Haskell program, but I have problems understanding 'strange' performance results. It seems to be a GHC-specific question, so I am asking here.

In a Happy parser I have this code 1):

%monad { Parsed } { thenP } { returnP }

type Parsed = State Env.Env

returnP :: a -> Parsed a
returnP a = return a

thenP :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP x k = x >>= k

An alternative was this 2) (yes, redundant in Happy):

returnP :: a -> Parsed a
returnP = return

thenP :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP = (>>=)

And strangely enough, on my machine 1) is faster than 2) by a few percent, using GHC 6.8.2 with -O2.

A few percent might seem unimportant, but I am currently developing my Haskell style, and I want to write efficient code by default, as long as that doesn't lead to obfuscation. Now, first, I dislike using option 1) because it seems pointlessly verbose. Second, I have no clue why option 1) should be faster.

Compiling with ghc -O2 -C I get two code blocks for 1) (appended at the end of this mail), whereas I get nothing for 2). Unfortunately, that doesn't help me, as I have no clue what the generated code means. There are also various strictness annotations throughout the code to make parsing less memory hungry (and getting an intuition about what is appropriate here drives me crazy). Random guess: perhaps GHC's strictness analysis somehow benefits from the code in 1)? Or does GHC create specialized versions of return and >>= for State instantiated to Parsed, i.e. State Env.Env?

Anyhow, I am puzzled and grateful for any explanations and recommendations about what is going on and when to avoid style 2).

Thanks,
Alexander

C-- code blocks:

-------------------------------------------------------
EI_(DarwinziTptpziParser_a106_info);
StgWord DarwinziTptpziParser_a106_closure[] = { (W_)&DarwinziTptpziParser_a106_info };
static StgWord s8Ey_info[] = { 0x42U, 0x22U };
IF_(s8Ey_ret) {
  W_ _c8EL;
  FB_
  Sp[2] = *((P_)(R1.w+7));
  _c8EL = *((P_)(R1.w+3));
  R1.w = Sp[1];
  Sp[1] = _c8EL;
  Sp=Sp+1;
  JMP_((W_)&stg_ap_pp_fast);
  FE_
}
StgWord DarwinziTptpziParser_a106_info[] = { 0x30014U, 0x0, 0xfU };
EI_(DarwinziTptpziParser_a106_closure);
II_(s8Ey_info);
FN_(DarwinziTptpziParser_a106_entry) {
  FB_
  if ((W_)(((W_)Sp - 0x4U) < (W_)SpLim)) goto _c8EO;
  R1.w = *Sp;
  Sp[-1] = Sp[2];
  *Sp = (W_)&s8Ey_info;
  Sp=Sp-1;
  JMP_((W_)&stg_ap_p_fast);
_c8EO:
  R1.w = (W_)&DarwinziTptpziParser_a106_closure;
  JMP_(stg_gc_fun);
  FE_
}
-------------------------------------------------------

-------------------------------------------------------
II_(r6AT_info);
static StgWord r6AT_closure[] = { (W_)&r6AT_info, 0x0, 0x0, 0x0 };
static char c97W_str[] = "Internal Happy error\012";
static StgWord r6AT_info[] = { 0x0, 0x16U };
EI_(base_GHCziBase_unpackCStringzh_info);
IF_(r6AT_entry) {
  FB_
  if ((W_)(((W_)Sp - 0xcU) < (W_)SpLim)) goto _c97Z;
  Hp=Hp+2;
  if ((W_)((W_)Hp > (W_)HpLim)) goto _c97Z;
  Hp[-1] = (W_)&stg_CAF_BLACKHOLE_info;
  newCAF((void *)R1.w);
  R1.p[1] = (W_)Hp-4;
  *R1.p = (W_)&stg_IND_STATIC_info;
  Sp[-2] = (W_)&stg_upd_frame_info;
  Sp[-1] = (W_)Hp-4;
  Sp[-3] = (W_)&c97W_str;
  Sp=Sp-3;
  JMP_((W_)&base_GHCziBase_unpackCStringzh_info);
_c97Z:
  HpAlloc = 0x8U;
  JMP_(stg_gc_enter_1);
  FE_
}
EI_(base_GHCziErr_error_closure);
II_(r6AT_closure);
StgWord DarwinziTptpziParser_notHappyAtAll_srt[] = { (W_)&base_GHCziErr_error_closure, (W_)&r6AT_closure };
EI_(DarwinziTptpziParser_notHappyAtAll_info);
StgWord DarwinziTptpziParser_notHappyAtAll_closure[] = { (W_)&DarwinziTptpziParser_notHappyAtAll_info, 0x0, 0x0, 0x0 };
StgWord DarwinziTptpziParser_notHappyAtAll_info[] = { ((W_)&DarwinziTptpziParser_notHappyAtAll_srt+0), 0x0, 0x30016U };
EI_(base_GHCziErr_error_closure);
II_(r6AT_closure);
FN_(DarwinziTptpziParser_notHappyAtAll_entry) {
  FB_
  if ((W_)(((W_)Sp - 0xcU) < (W_)SpLim)) goto _c989;
  Hp=Hp+2;
  if ((W_)((W_)Hp > (W_)HpLim)) goto _c989;
  Hp[-1] = (W_)&stg_CAF_BLACKHOLE_info;
  newCAF((void *)R1.w);
  R1.p[1] = (W_)Hp-4;
  *R1.p = (W_)&stg_IND_STATIC_info;
  Sp[-2] = (W_)&stg_upd_frame_info;
  Sp[-1] = (W_)Hp-4;
  R1.w = (W_)&base_GHCziErr_error_closure;
  Sp[-3] = (W_)&r6AT_closure;
  Sp=Sp-3;
  JMP_((W_)&stg_ap_p_fast);
_c989:
  HpAlloc = 0x8U;
  JMP_(stg_gc_enter_1);
  FE_
}
-------------------------------------------------------
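(For experimenting with the question outside the generated parser, a minimal self-contained sketch of the two styles might look like the module below. Env here is a placeholder, not the real Env.Env from the Darwin sources; compiling the module with ghc -O2 -ddump-simpl lets you compare the Core GHC produces for each variant.)

-- A minimal sketch of the two definition styles, independent of the
-- Happy-generated parser.  Env is a stand-in for the real Env.Env.
module ParsedDemo where

import Control.Monad.State (State)

type Env = [(String, Int)]   -- placeholder environment
type Parsed = State Env

-- Style 1): arguments written out
returnP1 :: a -> Parsed a
returnP1 a = return a

thenP1 :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP1 x k = x >>= k

-- Style 2): point-free
returnP2 :: a -> Parsed a
returnP2 = return

thenP2 :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP2 = (>>=)

Comparing the -ddump-simpl output of the two variants shows whether GHC eta-expands the point-free definitions by itself or compiles them differently.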

Hi
And strangely enough on my machine 1) is faster by a few percent than 2).
Consider a few percent to be "noise". It may not really be a faster result, and it may not have anything to do with what you wrote.
A few percent might seem unimportant, but I am currently developing my Haskell style. And I want to write efficient code by default, if that doesn't lead to obfuscation.
Write clear code by default - GHC can usually make the leap between clear and efficient. You may want to read http://haskell.org/haskellwiki/Performance - but if you are still developing your Haskell style I'd not bother yet.

Thanks,
Neil

alexander.fuchs:
Hi there,
I am trying to write my first serious Haskell program, but have problems understanding 'strange' performance results. It seems to be a ghc specific question, so I am asking here.
In a happy parser I have this code 1):
%monad { Parsed } { thenP } { returnP }
type Parsed = State Env.Env
returnP :: a -> Parsed a
returnP a = return a

thenP :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP x k = x >>= k
An alternative was this 2) (yes, redundant in happy):
returnP :: a -> Parsed a
returnP = return

thenP :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP = (>>=)
Could you submit a minimal, complete grammar file, so we can look at the generated code? -- Don

Hi Don.

Don Stewart wrote:
alexander.fuchs:
In a happy parser I have this code 1):
%monad { Parsed } { thenP } { returnP }
type Parsed = State Env.Env
returnP :: a -> Parsed a
returnP a = return a

thenP :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP x k = x >>= k
An alternative was this 2) (yes, redundant in happy):
returnP :: a -> Parsed a
returnP = return

thenP :: Parsed a -> (a -> Parsed b) -> Parsed b
thenP = (>>=)
Could you submit a minimal, complete grammar file, so we can look at the generated code?
After reducing and simplifying the grammar file to the minimum for my sample input I can't see any difference in performance anymore. Due to other code changes the difference was almost non-existent even before the simplification. I blame laziness for my lack of intuition about what is going on ;-)

Anyway, as I am still wondering why GHC creates different code for

returnP a = return a
returnP = return

I put the grammar file online, in case you still want to have a look at it:

http://cs.uiowa.edu/~fuchs/DarwinCS/DarwinCS/Darwin/Tptp/Parser.y

The complete source (compiles with make opt) is here:

http://cs.uiowa.edu/~fuchs/DarwinCS/DarwinCS.tar.gz

And I use it on this input:

http://cs.uiowa.edu/~fuchs/DarwinCS/SYN854-1.p

with the command line:

time Darwin/Main +RTS -sstderr -RTS 600 SYN854-1.p

Sorry about the noise,
Alexander

alexander.fuchs:
Hi Don.
Could you submit a minimal, complete grammar file, so we can look at the generated code?
After reducing and simplifying the grammar file to the minimum for my sample input I can't see any difference in performance anymore. Due to other code changes the difference was almost non-existent even before the simplification. I blame laziness for my lack of intuition about what is going on ;-)
Anyway, as I am still wondering why GHC creates different code for

returnP a = return a
returnP = return
Ah, now I remember this coming up before.

Simon, is this a CAF issue? Or did it trigger the -fno-state-hack case? I've definitely run into the odd other case where point-freeing a bit of code messes with the complexity.

-- Don

| > Anyway, as I am still wondering why ghc creates different code for
| >
| > returnP a = return a
| > returnP = return
|
| Ah, now I remember this coming up before.
|
| Simon, is this a CAF issue? Or did it trigger the -fno-state-hack case?

I'm not sure. A small example would be good.

| I've definitely run into the odd other case where point-freeing
| a bit of code messes with the complexity.

That should not happen -- except for the state-hack. Which is this. Consider

f1 :: Char -> IO ()
f1 = \c. let i = ord c in \s. print i s

Here s :: State World. Is this equivalent to

f2 = \c. \s. print (ord c) s

The latter is very much more efficient than 'f1', because it doesn't build an intermediate lambda. This matters a lot in I/O-intensive programs. But if 'f' is called thus:

forever (f 'w')

then (ord 'w') is computed once for each call of f2, but is shared among all calls to f1. And that can make a big difference too.

I have no idea whether this is playing a role in the example you are discussing -- my guess is not, and there may be another performance bug lurking. So eliciting a small demo would be great.

Simon
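(Simon's f1/f2 example can be turned into a small stand-alone sketch; the module below is only illustrative and is not the demo from the Darwin parser. Debug.Trace makes the evaluation of the shared computation visible, but how often "computing" is printed depends on the optimisation level and on whether the state hack fires, so it is meant for experimenting with -O2 and -fno-state-hack rather than having one fixed expected output.)

-- Sketch of the sharing difference described above.  'trace' marks each
-- evaluation of the ord call so it can be counted in the output.
module StateHackDemo where

import Control.Monad (replicateM_)
import Data.Char (ord)
import Debug.Trace (trace)

-- f1 binds the result of 'ord' outside the IO action, so the thunk can be
-- shared between repeated runs of the same action ...
f1 :: Char -> IO ()
f1 c = let i = trace "computing" (ord c) in print i

-- ... whereas in f2 the computation sits inside the action.  With the
-- state hack GHC may rewrite f1 into this shape, trading sharing for a
-- cheaper closure.
f2 :: Char -> IO ()
f2 c = print (trace "computing" (ord c))

main :: IO ()
main = do
  putStrLn "f1:"
  replicateM_ 3 (f1 'w')
  putStrLn "f2:"
  replicateM_ 3 (f2 'w')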

Hi.

Simon Peyton-Jones wrote:
I've definitely run into the odd other case where point-freeing a bit of code messes with the complexity.
That should not happen -- except for the state-hack. Which is this:
Consider

f1 :: Char -> IO ()
f1 = \c. let i = ord c in \s. print i s
Here s :: State World. Is this equivalent to

f2 = \c. \s. print (ord c) s
The latter is very much more efficient than 'f1', because it doesn't build an intermediate lambda. This matters a lot in I/O-intensive programs. But if 'f' is called thus:
forever (f 'w')
then (ord 'w') is computed once for each call of f2, but is shared among all calls to f1. And that can make a big difference too.
Thanks for the explanation. State World here really means the IO and ST monads, not the State monad, right?

Compiling with -fno-state-hack does actually lead to both versions (the point-free and the explicit one) being equally fast (both take 14.4s, instead of 15.2s and 14.0s respectively), though I am not sure why. Running in profiled mode makes the difference vanish entirely.

I tested this on an Intel(R) Pentium(R) 4 CPU 3.00GHz and got the reported speed difference there. On an AMD Athlon(tm) 64 Processor 3800+ I got no difference even without -fno-state-hack.
I have no idea whether this is playing a role in the example you are discussing -- my guess is not, and there may be another performance bug lurking. So eliciting a small demo would be great.
I am sorry, I completely failed to do that. The parser builds up a data structure from several data types using memoization. Any significant simplification of the code base reduced the performance gap until it quickly disappeared.

Alexander
participants (4):
- Alexander Fuchs
- Don Stewart
- Neil Mitchell
- Simon Peyton-Jones