Re: [web-devel] Caching the System time

newer
[Persistent] PersistBackend and...

Aristid Breitkreuz

8 Aug 2011 8 Aug '11

10:04 a.m.

Did you measure how long it takes to generate that datetime string? Shouldn't that be possible in a hundred nanoseconds or so? Am 08.08.2011 15:01 schrieb "Gregory Collins" :

Attachments:

attachment.html (text/html — 326 bytes)

Show replies by date

Greg Weber

8 Aug 8 Aug

10:16 a.m.

New subject: Caching the System time

Kazu profiled his code and found that using the current haskell date/time libraries was extremely expensive. He solved the issues by writing optimized date code. It is unclear how much benefit there would be to caching after switching to his optimized date code. He is planning to release his logging code as a separate package. Here is a draft of an article he is writing: http://www.mew.org/~kazu/mighttpd.pdf On Mon, Aug 8, 2011 at 7:04 AM, Aristid Breitkreuz wrote:

...

Did you measure how long it takes to generate that datetime string? Shouldn't that be possible in a hundred nanoseconds or so? Am 08.08.2011 15:01 schrieb "Gregory Collins" :

Gregory Collins

2:29 p.m.

New subject: Caching the System time

On Mon, Aug 8, 2011 at 4:16 PM, Greg Weber wrote:

...

Kazu profiled his code and found that using the current haskell date/time libraries was extremely expensive. He solved the issues by writing optimized date code. It is unclear how much benefit there would be to caching after switching to his optimized date code. He is planning to release his logging code as a separate package.

In Snap we call out to C. It could be that Kazu's code is faster (it can probably be inlined to avoid a function call), but we don't call the time code enough to make it worth checking. G -- Gregory Collins

Kazu Yamamoto

9 Aug 9 Aug

11:17 p.m.

New subject: Caching the System time

Hello web-devel! Sorry for the delay. I was on vacation.

...

Kazu profiled his code and found that using the current haskell date/time libraries was extremely expensive. He solved the issues by writing optimized date code. It is unclear how much benefit there would be to caching after switching to his optimized date code. He is planning to release his logging code as a separate package.

Let me explain what I have done in Mighttpd. 1) The http-date library. This is for HTTP date-related fields (including Last-Modified:), not *logging*. To my profile, Data.Time is really slow. This library is based on the fact that HTTP date-related fields is *GMT only* and the field value is exactly 29 bytes. 2) Logging For the Apache style logging, we need *zoned* time. I did not want to implement it from scratch. So, Mighttpd issues getZonedTime every second, formats it with Data.Time, and caches ByteString representation. Yes, both getZonedTime and Data.Time are really slow, but Mighttpd uses them just once for every second. The caches is stored in IORef with atomicModifyIORef which is faster than MVar. Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message. I will try to strip this code from Mighttpd and make it as a WAI application(?) today. --Kazu

Max Cantor

10 Aug 10 Aug

12:20 a.m.

New subject: Caching the System time

Just curious, but might it be faster to: replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle? this way, logging a message doesn't require taking an mvar I have no idea if this is better, just thought i'd throw it out. On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:

...

Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.

Michael Snoyman

12:30 a.m.

New subject: Caching the System time

Did you mean IORef (Data.Sequence.Seq String)? I'm wondering if a Chan might be even better than that. On Wed, Aug 10, 2011 at 7:20 AM, Max Cantor wrote:

...

Just curious, but might it be faster to:

replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?

this way, logging a message doesn't require taking an mvar

I have no idea if this is better, just thought i'd throw it out.

On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:

...
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.

_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

Kazu Yamamoto

1:08 a.m.

New subject: Caching the System time

...

Did you mean IORef (Data.Sequence.Seq String)? I'm wondering if a Chan might be even better than that.

Chan is based on MVar which is slow... --Kazu

Max Cantor

1:12 p.m.

New subject: Caching the System time

thats exactly what I meant On Aug 10, 2011, at 12:30 PM, Michael Snoyman wrote:

...

Did you mean IORef (Data.Sequence.Seq String)? I'm wondering if a Chan might be even better than that.

On Wed, Aug 10, 2011 at 7:20 AM, Max Cantor wrote:

...
Just curious, but might it be faster to:

replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?

this way, logging a message doesn't require taking an mvar

I have no idea if this is better, just thought i'd throw it out.

On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:

...
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.

_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

Kazu Yamamoto

1:07 a.m.

New subject: Caching the System time

Hello,

...

replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?

this way, logging a message doesn't require taking an mvar

It would be faster. But we need to implement a special flush system for process termination. I will try to implement if time allows. --Kazu

Gregory Collins

2:16 a.m.

New subject: Caching the System time

On Wed, Aug 10, 2011 at 6:20 AM, Max Cantor wrote:

...

Just curious, but might it be faster to:

replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?

this way, logging a message doesn't require taking an mvar

I have no idea if this is better, just thought i'd throw it out.

That's exactly what snap-server does in its logging code. G -- Gregory Collins

Simon Meier

18 Aug 18 Aug

8:38 a.m.

New subject: Caching the System time

Hi Max, if you're going for such a solution, then why not use difference lists or even better bytestring builders as your representation of the not-yet-flushed logging journal? Bytestring builders (from the blaze-builder library) support a fast append and fast serialization from a number of formats to bytestring. best regards, Simon 2011/8/10 Max Cantor :

...

Just curious, but might it be faster to:

replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?

this way, logging a message doesn't require taking an mvar

I have no idea if this is better, just thought i'd throw it out.

On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:

...
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.

_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

Kazu Yamamoto

9:38 p.m.

New subject: Caching the System time

Hello Simon,

...

if you're going for such a solution, then why not use difference lists or even better bytestring builders as your representation of the not-yet-flushed logging journal? Bytestring builders (from the blaze-builder library) support a fast append and fast serialization from a number of formats to bytestring.

Difference lists are not necessary at this moment because a list is generated at once in the apacheFormat: http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne... If my understanding is correct, blaze-builder does not help hPutLogStr. What I want to do is directly copy ByteString or List to *Handle's buffer*. http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne... --Kazu

Simon Meier

19 Aug 19 Aug

8:13 a.m.

New subject: Caching the System time

Hi Kazu, 2011/8/19 Kazu Yamamoto :

...

Hello Simon,

...
if you're going for such a solution, then why not use difference lists or even better bytestring builders as your representation of the not-yet-flushed logging journal? Bytestring builders (from the blaze-builder library) support a fast append and fast serialization from a number of formats to bytestring.

Difference lists are not necessary at this moment because a list is generated at once in the apacheFormat:

http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne...

If my understanding is correct, blaze-builder does not help hPutLogStr. What I want to do is directly copy ByteString or List to *Handle's buffer*.

http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne...

I see. Currently, builders cannot be executed directly on a Handle's buffer. This is functionality I wanted to have for a long time, but have not gotten around to implement it. Using bytestring builders you could avoid creating the intermediate [LogStr] list. You should get a performance benefit, when describing your log-message directly as a mapping to a builder and executing this builder on the handle's buffer, as this avoids the indirections from the list- and the LogStr-cells. Copying the byteStrings directly also works for builders using the 'copyByteString' function. You would get a further performance benefit, if you could avoid creating intermediate String values. For example, the new builder in the bytestring library provides functions for the decimal encoding of numbers directly into the output buffer using a fast C-implementation. The development repository of the new bytestring builders is available here [1]. Its API is finished, benchmarks look good, and a documentation draft exists. Hence, it would be cheap to give it a go and see how fast you could produce the log-messages using the new bytestring builders. I'd use criterion to compare mapM_ yourLogMessageWriter logMessageList against whnf (L.length . toLazyByteString . mconcat . map builderLogMessageWriter) logMessageList where logMessageList :: [(ZonedDate, Request, Status, Maybe Integer)] logMessageList = replicate 10000 ( your-message-params) This should be a fair comparison, as both implementations work on similarly large buffers. If that shows that builders are beneficial, then we can think about implementing output on a Handle's buffer directly. best regards, Simon PS: The new bytestring builder will very likely be released with the next GHC in November. [1] https://github.com/meiersi/bytestring

Kazu Yamamoto

22 Aug 22 Aug

2:44 a.m.

New subject: Caching the System time

Simon, Thank you for your explanation.

...

For example, the new builder in the bytestring library provides functions for the decimal encoding of numbers directly into the output buffer using a fast C-implementation.

Do you refer to hPutBuilder in Data.ByteString.Lazy.Builder? If so, I have a question. For logging systems, a line of a log message must be guaranteed. Log messages must not be mangled in a log file. It seems to me hPutBuilder can grantee lines if Handle's buffering is the line buffering. However, it seems to me that it cannot grantee lines if Handle's buffering is the block buffering. Is my understanding correct? To implement *multi-line* buffering, we need to know the length of a log message to be written in advance. If it is larger than the size of the free space of Handle's buffer, the buffer must be flushed first. I'm not sure that we can obtain the length of Builder without building(copying). --Kazu

Piyush P Kurur

3:12 a.m.

New subject: Caching the System time

On Mon, Aug 22, 2011 at 03:44:02PM +0900, Kazu Yamamoto wrote:

...

Simon,

Thank you for your explanation.

...

I'm not sure that we can obtain the length of Builder without building(copying).

I too am interested in knowing whether the length of a builder (exact length) can be computed without forcing computation. The context that I need this is different (an asn1 library). The way I solved is to keep track of the length explicitly. But if builder can do that for me that is very good. Regards ppk

Simon Meier

7 a.m.

New subject: Caching the System time

Hi Kazu, Piyush, the internal interface of builders allows you to check the free buffer space and flush early, if required. This works as long as you have some external means to compute an upper bound on the number of bytes that might be written. In fact, the 'Write' type in blaze-builder (in the new bytestring builder better described as bounded-length encodings [2]) abstracts exactly this situation. For log-files, you could create the log-line as a 'Write' value and execute it afterwards using 'fromWrite'. This guarantees the line-flush behaviour even for block-buffering. If you cannot compute an a-priori bound on the number of bytes written, there is no way around forcing the value. If you use the public interface of builders, this results in a number of lazy bytestrings, which have to be concatenated afterwards with the appropriate flushes in between. Depending on your application you may be able to do it all in one pass using a builder-transformer that intercepts the internal protocol between the buildsteps and the buildstep-driver that provides the free buffers. For example, the HTTP chunked-transfer-encoding is implemented as a builder-transformer [2]. Before running the inner buildstep, the builder-transformer reserves enough space to write the maximal length of the output. Then, it starts the inner buildstep. Once the inner buildstep returns, the correct size is written at the reserved space, padding the hex-representation with zeroes, if necessary. Would someting like this work in your use-case @Piyush? best regards, Simon [1] https://github.com/meiersi/bytestring/blob/master/Data/ByteString/Lazy/Build... [2] https://github.com/meiersi/blaze-builder/blob/master/Blaze/ByteString/Builde... 2011/8/22 Piyush P Kurur :

...

On Mon, Aug 22, 2011 at 03:44:02PM +0900, Kazu Yamamoto wrote:

...
Simon,

Thank you for your explanation.

...
I'm not sure that we can obtain the length of Builder without building(copying).

I too am interested in knowing whether the length of a builder (exact length) can be computed without forcing computation. The context that I need this is different (an asn1 library). The way I solved is to keep track of the length explicitly. But if builder can do that for me that is very good.

Regards

ppk

_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

5020

Age (days ago)

5034

Last active (days ago)

List overview

Download

15 comments

8 participants

participants (8)

Aristid Breitkreuz
Greg Weber
Gregory Collins
Kazu Yamamoto
Max Cantor
Michael Snoyman
Piyush P Kurur
Simon Meier

Re: [web-devel] Caching the System time

tags

participants (8)