First call to putStr takes far more time than subsequent calls

older
As reading, what are your favorite...

Ben Gunton

9 Jul 2015 9 Jul '15

8:17 p.m.

This is a simplified version of an issue I'm running into when timing some functions. In this example, the first putStr inside the replicateM_ takes far longer than subsequent putStrs. If I add in a putStr at the top (the commented line), each iteration takes the same. (Though I presume I just shifted the long execution to the putStr on top). The same thing happens on the first iterations of a function when I need to evaluate some state, or deepseq some value. putStr was just a simpler version that still showed the same issue. Relevant code: http://lpaste.net/136250 What's going on here? Thanks! import Control.Monadimport Data.Time.Clock main :: IO ()main = do -- putStr "" replicateM_ 5 $ do t1 <- getCurrentTime putStr "" t2 <- getCurrentTime print . round $ 1000 * 1000 * diffUTCTime t2 t1 {- Sample output: 49 1 1 1 2 -}

Attachments:

attachment.html (text/html — 2.1 KB)

Show replies by date

Brandon Allbery

9 Jul 9 Jul

8:24 p.m.

New subject: First call to putStr takes far more time than subsequent calls

On Thu, Jul 9, 2015 at 4:17 PM, Ben Gunton wrote:

...

In this example, the first putStr inside the replicateM_ takes far longer than subsequent putStrs. If I add in a putStr at the top (the commented line), each iteration takes the same. (Though I presume I just shifted the long execution to the putStr on top).

If I had to guess, delayed setup of the Handle associated with stdout. This will include setting up buffering (including querying the OS to find out whether output is a tty-like device or not, so as to enable or disable block buffering; note that this may be especially expensive on Windows, where there is IIRC no simple way to determine whether running in e.g. a cmd.exe window or not, as the OS has no concept of pseudo-ttys); also, setting up output encoding and such. Probably also other details. -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Ben Gunton

8:26 p.m.

New subject: First call to putStr takes far more time than subsequent calls

The thing is, if I switch the "putStr """ in the replicateM to "evaluate $ force t1", the first iteration still takes way longer. Are there things to setup even with that? Is there a way to pre-setup all the things that need to be setup? This is also on Linux, with GHC 7.10.1, and no optimizations. On Thu, Jul 9, 2015 at 2:24 PM, Brandon Allbery wrote:

...

On Thu, Jul 9, 2015 at 4:17 PM, Ben Gunton wrote:

...
In this example, the first putStr inside the replicateM_ takes far longer than subsequent putStrs. If I add in a putStr at the top (the commented line), each iteration takes the same. (Though I presume I just shifted the long execution to the putStr on top).

If I had to guess, delayed setup of the Handle associated with stdout. This will include setting up buffering (including querying the OS to find out whether output is a tty-like device or not, so as to enable or disable block buffering; note that this may be especially expensive on Windows, where there is IIRC no simple way to determine whether running in e.g. a cmd.exe window or not, as the OS has no concept of pseudo-ttys); also, setting up output encoding and such. Probably also other details.

-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Ben Gunton

9:07 p.m.

New subject: First call to putStr takes far more time than subsequent calls

Here is the same thing with no console usage, just forcing the time value. The first-iteration-delay seems to happen at the first IO-ish thing being evaluated. http://lpaste.net/136254 On Thu, Jul 9, 2015 at 2:26 PM, Ben Gunton wrote:

...

The thing is, if I switch the "putStr """ in the replicateM to "evaluate $ force t1", the first iteration still takes way longer. Are there things to setup even with that? Is there a way to pre-setup all the things that need to be setup?

This is also on Linux, with GHC 7.10.1, and no optimizations.

On Thu, Jul 9, 2015 at 2:24 PM, Brandon Allbery wrote:

...
On Thu, Jul 9, 2015 at 4:17 PM, Ben Gunton wrote:

...
In this example, the first putStr inside the replicateM_ takes far longer than subsequent putStrs. If I add in a putStr at the top (the commented line), each iteration takes the same. (Though I presume I just shifted the long execution to the putStr on top).

If I had to guess, delayed setup of the Handle associated with stdout. This will include setting up buffering (including querying the OS to find out whether output is a tty-like device or not, so as to enable or disable block buffering; note that this may be especially expensive on Windows, where there is IIRC no simple way to determine whether running in e.g. a cmd.exe window or not, as the OS has no concept of pseudo-ttys); also, setting up output encoding and such. Probably also other details.

-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

Roman Cheplyaka

11:45 p.m.

New subject: First call to putStr takes far more time than subsequent calls

This is a nice puzzle. However, I don't think it's related to IO. So far I've traced this to Data.Fixed methods. If you add evaluate $ floor (2/1 :: Fixed E12) before the loop, you'll find that the delay decreases significantly (although doesn't vanish entirely). How is this related to forcing a time value? Well, UTCTime represents the time of day as Fixed E12, and when you're evaluating the result of getCurrentTime, you're calling those methods, (/) and floor. Using other methods of Fixed E12 before the loop, or replacing E12 with E9, kills or at least hurts the effect. There might be some thunks referenced directly or indirectly by those methods, possibly involving specialization, but I don't see it so far. I'll let someone else to take it from here. On 10/07/15 00:07, Ben Gunton wrote:

...

Here is the same thing with no console usage, just forcing the time value. The first-iteration-delay seems to happen at the first IO-ish thing being evaluated.

http://lpaste.net/136254

Ben Gunton

10 Jul 10 Jul

1:34 a.m.

New subject: First call to putStr takes far more time than subsequent calls

When a module is imported, or used for the first time, are there things that are not fully evaluated? I see the same thing you do with the Data.Fixed methods (nice work by the way!), but I also see similar (albeit less drastic) behaviour with other datatypes, including some custom ones. It just always seems like the first call to a modules method is always slower than subsequent calls. But I can't for the life of me figure out why... On Thu, Jul 9, 2015 at 5:45 PM, Roman Cheplyaka wrote:

...

This is a nice puzzle. However, I don't think it's related to IO.

So far I've traced this to Data.Fixed methods. If you add

evaluate $ floor (2/1 :: Fixed E12)

before the loop, you'll find that the delay decreases significantly (although doesn't vanish entirely).

How is this related to forcing a time value? Well, UTCTime represents the time of day as Fixed E12, and when you're evaluating the result of getCurrentTime, you're calling those methods, (/) and floor.

Using other methods of Fixed E12 before the loop, or replacing E12 with E9, kills or at least hurts the effect.

There might be some thunks referenced directly or indirectly by those methods, possibly involving specialization, but I don't see it so far.

I'll let someone else to take it from here.

On 10/07/15 00:07, Ben Gunton wrote:

...
Here is the same thing with no console usage, just forcing the time value. The first-iteration-delay seems to happen at the first IO-ish thing being evaluated.

http://lpaste.net/136254

Roman Cheplyaka

6:23 a.m.

New subject: First call to putStr takes far more time than subsequent calls

On 10/07/15 04:34, Ben Gunton wrote:

...

When a module is imported, or used for the first time, are there things that are not fully evaluated?

Evaluation isn't tied to modules in any way. Modules are namespaces; they are strictly compile-time entities. Imagine that in your module you define at the top level fac10 :: Integer fac10 = product [1..10] This is known as CAF, a constant applicative form. If you never demand the value of fac10 in your program, it will never get evaluated. If you do demand the value of fac10, it will be evaluated the first time, and then fac10 will be modified to point to the actual value instead of a thunk, so all the subsequent calls will be much faster. Now, I suspect that there may be something like fac10 in Data.Fixed or perhaps one of its dependencies. It doesn't even have to be defined in the code: during optimization, a lot of things become CAFs when ghc finds something that looks like a constant and floats it out to the top level. Roman

Jonas Scholl

7:17 a.m.

New subject: First call to putStr takes far more time than subsequent calls

If you look at the Fixed data type, it is implemented using the Integer type, which is again implemented using libgmp on my system. The first call to any function involving an Integer type thus needs to initialize libgmp first. Using ltrace it seems like libgmp first has to determine the processor it is running on and maybe set up other bookkeeping information. It could also be the case that libgmp itself has to be loaded at this point. Using strace it looks like it is mmapped into the memory, maybe linux is lazy and only reads the file on the disk at the moment it is accessed the first time (i.e. at the first call)? At least linux uses a similar approach for requested memory by a program. On 07/10/2015 01:45 AM, Roman Cheplyaka wrote:

...

This is a nice puzzle. However, I don't think it's related to IO.

So far I've traced this to Data.Fixed methods. If you add

evaluate $ floor (2/1 :: Fixed E12)

before the loop, you'll find that the delay decreases significantly (although doesn't vanish entirely).

How is this related to forcing a time value? Well, UTCTime represents the time of day as Fixed E12, and when you're evaluating the result of getCurrentTime, you're calling those methods, (/) and floor.

Using other methods of Fixed E12 before the loop, or replacing E12 with E9, kills or at least hurts the effect.

There might be some thunks referenced directly or indirectly by those methods, possibly involving specialization, but I don't see it so far.

I'll let someone else to take it from here.

On 10/07/15 00:07, Ben Gunton wrote:

...
Here is the same thing with no console usage, just forcing the time value. The first-iteration-delay seems to happen at the first IO-ish thing being evaluated.

http://lpaste.net/136254

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Roman Cheplyaka

7:31 a.m.

New subject: First call to putStr takes far more time than subsequent calls

I considered that, but it doesn't seem to hold up. If you replace calling the Fixed functions with some operations on Integers, you won't reproduce the effect. On 10/07/15 10:17, Jonas Scholl wrote:

...

If you look at the Fixed data type, it is implemented using the Integer type, which is again implemented using libgmp on my system. The first call to any function involving an Integer type thus needs to initialize libgmp first. Using ltrace it seems like libgmp first has to determine the processor it is running on and maybe set up other bookkeeping information.

It could also be the case that libgmp itself has to be loaded at this point. Using strace it looks like it is mmapped into the memory, maybe linux is lazy and only reads the file on the disk at the moment it is accessed the first time (i.e. at the first call)? At least linux uses a similar approach for requested memory by a program.

On 07/10/2015 01:45 AM, Roman Cheplyaka wrote:

...
This is a nice puzzle. However, I don't think it's related to IO.

So far I've traced this to Data.Fixed methods. If you add

evaluate $ floor (2/1 :: Fixed E12)

before the loop, you'll find that the delay decreases significantly (although doesn't vanish entirely).

How is this related to forcing a time value? Well, UTCTime represents the time of day as Fixed E12, and when you're evaluating the result of getCurrentTime, you're calling those methods, (/) and floor.

Using other methods of Fixed E12 before the loop, or replacing E12 with E9, kills or at least hurts the effect.

There might be some thunks referenced directly or indirectly by those methods, possibly involving specialization, but I don't see it so far.

I'll let someone else to take it from here.

On 10/07/15 00:07, Ben Gunton wrote:

...
Here is the same thing with no console usage, just forcing the time value. The first-iteration-delay seems to happen at the first IO-ish thing being evaluated.

http://lpaste.net/136254

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Alberto G. Corona

8:37 a.m.

New subject: First call to putStr takes far more time than subsequent calls

the C allocation functions (malloc etc) spend significantly more time in the first invocation. That may be related with this behaviour in libgmp, IO etc 2015-07-10 9:31 GMT+02:00 Roman Cheplyaka :

...

I considered that, but it doesn't seem to hold up. If you replace calling the Fixed functions with some operations on Integers, you won't reproduce the effect.

On 10/07/15 10:17, Jonas Scholl wrote:

...
If you look at the Fixed data type, it is implemented using the Integer type, which is again implemented using libgmp on my system. The first call to any function involving an Integer type thus needs to initialize libgmp first. Using ltrace it seems like libgmp first has to determine the processor it is running on and maybe set up other bookkeeping information.

It could also be the case that libgmp itself has to be loaded at this point. Using strace it looks like it is mmapped into the memory, maybe linux is lazy and only reads the file on the disk at the moment it is accessed the first time (i.e. at the first call)? At least linux uses a similar approach for requested memory by a program.

On 07/10/2015 01:45 AM, Roman Cheplyaka wrote:

...
This is a nice puzzle. However, I don't think it's related to IO.

So far I've traced this to Data.Fixed methods. If you add

evaluate $ floor (2/1 :: Fixed E12)

before the loop, you'll find that the delay decreases significantly (although doesn't vanish entirely).

How is this related to forcing a time value? Well, UTCTime represents the time of day as Fixed E12, and when you're evaluating the result of getCurrentTime, you're calling those methods, (/) and floor.

Using other methods of Fixed E12 before the loop, or replacing E12 with E9, kills or at least hurts the effect.

There might be some thunks referenced directly or indirectly by those methods, possibly involving specialization, but I don't see it so far.

I'll let someone else to take it from here.

On 10/07/15 00:07, Ben Gunton wrote:

...
Here is the same thing with no console usage, just forcing the time value. The first-iteration-delay seems to happen at the first IO-ish thing being evaluated.

http://lpaste.net/136254

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

-- Alberto.

Alexey Shmalko

11 Jul 11 Jul

7:02 a.m.

New subject: First call to putStr takes far more time than subsequent calls

On Fri, Jul 10, 2015 at 10:17 AM, Jonas Scholl wrote:

...

It could also be the case that libgmp itself has to be loaded at this point. Using strace it looks like it is mmapped into the memory, maybe linux is lazy and only reads the file on the disk at the moment it is accessed the first time (i.e. at the first call)? At least linux uses a similar approach for requested memory by a program.

I can confirm Linux is lazy in loading program's code. It maps it in memory space, but doesn't load yet. That's also true for mmap.

3657

Age (days ago)

3659

Last active (days ago)

List overview

Download

10 comments

6 participants

participants (6)

Alberto G. Corona
Alexey Shmalko
Ben Gunton
Brandon Allbery
Jonas Scholl
Roman Cheplyaka