
Did you measure how long it takes to generate that datetime string?
Shouldn't that be possible in a hundred nanoseconds or so?
Am 08.08.2011 15:01 schrieb "Gregory Collins"

Kazu profiled his code and found that using the current haskell date/time
libraries was extremely expensive. He solved the issues by writing optimized
date code. It is unclear how much benefit there would be to caching after
switching to his optimized date code. He is planning to release his logging
code as a separate package.
Here is a draft of an article he is writing:
http://www.mew.org/~kazu/mighttpd.pdf
On Mon, Aug 8, 2011 at 7:04 AM, Aristid Breitkreuz
Did you measure how long it takes to generate that datetime string? Shouldn't that be possible in a hundred nanoseconds or so? Am 08.08.2011 15:01 schrieb "Gregory Collins"
:

On Mon, Aug 8, 2011 at 4:16 PM, Greg Weber
Kazu profiled his code and found that using the current haskell date/time libraries was extremely expensive. He solved the issues by writing optimized date code. It is unclear how much benefit there would be to caching after switching to his optimized date code. He is planning to release his logging code as a separate package.
In Snap we call out to C. It could be that Kazu's code is faster (it can
probably be inlined to avoid a function call), but we don't call the time
code enough to make it worth checking.
G
--
Gregory Collins

Hello web-devel! Sorry for the delay. I was on vacation.
Kazu profiled his code and found that using the current haskell date/time libraries was extremely expensive. He solved the issues by writing optimized date code. It is unclear how much benefit there would be to caching after switching to his optimized date code. He is planning to release his logging code as a separate package.
Let me explain what I have done in Mighttpd. 1) The http-date library. This is for HTTP date-related fields (including Last-Modified:), not *logging*. To my profile, Data.Time is really slow. This library is based on the fact that HTTP date-related fields is *GMT only* and the field value is exactly 29 bytes. 2) Logging For the Apache style logging, we need *zoned* time. I did not want to implement it from scratch. So, Mighttpd issues getZonedTime every second, formats it with Data.Time, and caches ByteString representation. Yes, both getZonedTime and Data.Time are really slow, but Mighttpd uses them just once for every second. The caches is stored in IORef with atomicModifyIORef which is faster than MVar. Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message. I will try to strip this code from Mighttpd and make it as a WAI application(?) today. --Kazu

Just curious, but might it be faster to: replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle? this way, logging a message doesn't require taking an mvar I have no idea if this is better, just thought i'd throw it out. On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.

Did you mean IORef (Data.Sequence.Seq String)? I'm wondering if a Chan
might be even better than that.
On Wed, Aug 10, 2011 at 7:20 AM, Max Cantor
Just curious, but might it be faster to:
replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?
this way, logging a message doesn't require taking an mvar
I have no idea if this is better, just thought i'd throw it out.
On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.
_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

thats exactly what I meant On Aug 10, 2011, at 12:30 PM, Michael Snoyman wrote:
Did you mean IORef (Data.Sequence.Seq String)? I'm wondering if a Chan might be even better than that.
On Wed, Aug 10, 2011 at 7:20 AM, Max Cantor
wrote: Just curious, but might it be faster to:
replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?
this way, logging a message doesn't require taking an mvar
I have no idea if this is better, just thought i'd throw it out.
On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.
_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

Hello,
replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?
this way, logging a message doesn't require taking an mvar
It would be faster. But we need to implement a special flush system for process termination. I will try to implement if time allows. --Kazu

On Wed, Aug 10, 2011 at 6:20 AM, Max Cantor
Just curious, but might it be faster to:
replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?
this way, logging a message doesn't require taking an mvar
I have no idea if this is better, just thought i'd throw it out.
That's exactly what snap-server does in its logging code.
G
--
Gregory Collins

Hi Max,
if you're going for such a solution, then why not use difference lists
or even better bytestring builders as your representation of the
not-yet-flushed logging journal? Bytestring builders (from the
blaze-builder library) support a fast append and fast serialization
from a number of formats to bytestring.
best regards,
Simon
2011/8/10 Max Cantor
Just curious, but might it be faster to:
replace the MVar handle with an IORef [Data.Sequence.Seq String] logs are appended with atomicModifyIORef then every n seconds, a worker thread uses atomicModifyIORef to clear the sequence and write the log messages to a handle?
this way, logging a message doesn't require taking an mvar
I have no idea if this is better, just thought i'd throw it out.
On Aug 10, 2011, at 11:17 AM, Kazu Yamamoto (山本和彦) wrote:
Every user-thread of Mighttpd records a log message with Handle. Since Handle is protected with MVar, serialization is done. Multiple log messages are buffered in the buffer of Handle to reduce the number of the write system call. Moreover, I re-implemented hPut to reduce unnecessary intermediate data to compose a log message.
_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

Hello Simon,
if you're going for such a solution, then why not use difference lists or even better bytestring builders as your representation of the not-yet-flushed logging journal? Bytestring builders (from the blaze-builder library) support a fast append and fast serialization from a number of formats to bytestring.
Difference lists are not necessary at this moment because a list is generated at once in the apacheFormat: http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne... If my understanding is correct, blaze-builder does not help hPutLogStr. What I want to do is directly copy ByteString or List to *Handle's buffer*. http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne... --Kazu

Hi Kazu,
2011/8/19 Kazu Yamamoto
Hello Simon,
if you're going for such a solution, then why not use difference lists or even better bytestring builders as your representation of the not-yet-flushed logging journal? Bytestring builders (from the blaze-builder library) support a fast append and fast serialization from a number of formats to bytestring.
Difference lists are not necessary at this moment because a list is generated at once in the apacheFormat:
http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne...
If my understanding is correct, blaze-builder does not help hPutLogStr. What I want to do is directly copy ByteString or List to *Handle's buffer*.
http://hackage.haskell.org/packages/archive/wai-logger/0.0.0/doc/html/src/Ne...
I see. Currently, builders cannot be executed directly on a Handle's buffer. This is functionality I wanted to have for a long time, but have not gotten around to implement it. Using bytestring builders you could avoid creating the intermediate [LogStr] list. You should get a performance benefit, when describing your log-message directly as a mapping to a builder and executing this builder on the handle's buffer, as this avoids the indirections from the list- and the LogStr-cells. Copying the byteStrings directly also works for builders using the 'copyByteString' function. You would get a further performance benefit, if you could avoid creating intermediate String values. For example, the new builder in the bytestring library provides functions for the decimal encoding of numbers directly into the output buffer using a fast C-implementation. The development repository of the new bytestring builders is available here [1]. Its API is finished, benchmarks look good, and a documentation draft exists. Hence, it would be cheap to give it a go and see how fast you could produce the log-messages using the new bytestring builders. I'd use criterion to compare mapM_ yourLogMessageWriter logMessageList against whnf (L.length . toLazyByteString . mconcat . map builderLogMessageWriter) logMessageList where logMessageList :: [(ZonedDate, Request, Status, Maybe Integer)] logMessageList = replicate 10000 ( your-message-params) This should be a fair comparison, as both implementations work on similarly large buffers. If that shows that builders are beneficial, then we can think about implementing output on a Handle's buffer directly. best regards, Simon PS: The new bytestring builder will very likely be released with the next GHC in November. [1] https://github.com/meiersi/bytestring

Simon, Thank you for your explanation.
For example, the new builder in the bytestring library provides functions for the decimal encoding of numbers directly into the output buffer using a fast C-implementation.
Do you refer to hPutBuilder in Data.ByteString.Lazy.Builder? If so, I have a question. For logging systems, a line of a log message must be guaranteed. Log messages must not be mangled in a log file. It seems to me hPutBuilder can grantee lines if Handle's buffering is the line buffering. However, it seems to me that it cannot grantee lines if Handle's buffering is the block buffering. Is my understanding correct? To implement *multi-line* buffering, we need to know the length of a log message to be written in advance. If it is larger than the size of the free space of Handle's buffer, the buffer must be flushed first. I'm not sure that we can obtain the length of Builder without building(copying). --Kazu

On Mon, Aug 22, 2011 at 03:44:02PM +0900, Kazu Yamamoto wrote:
Simon,
Thank you for your explanation.
I'm not sure that we can obtain the length of Builder without building(copying).
I too am interested in knowing whether the length of a builder (exact length) can be computed without forcing computation. The context that I need this is different (an asn1 library). The way I solved is to keep track of the length explicitly. But if builder can do that for me that is very good. Regards ppk

Hi Kazu, Piyush,
the internal interface of builders allows you to check the free buffer
space and flush early, if required. This works as long as you have
some external means to compute an upper bound on the number of bytes
that might be written. In fact, the 'Write' type in blaze-builder (in
the new bytestring builder better described as bounded-length
encodings [2]) abstracts exactly this situation. For log-files, you
could create the log-line as a 'Write' value and execute it afterwards
using 'fromWrite'. This guarantees the line-flush behaviour even for
block-buffering.
If you cannot compute an a-priori bound on the number of bytes
written, there is no way around forcing the value. If you use the
public interface of builders, this results in a number of lazy
bytestrings, which have to be concatenated afterwards with the
appropriate flushes in between. Depending on your application you may
be able to do it all in one pass using a builder-transformer that
intercepts the internal protocol between the buildsteps and the
buildstep-driver that provides the free buffers.
For example, the HTTP chunked-transfer-encoding is implemented as a
builder-transformer [2]. Before running the inner buildstep, the
builder-transformer reserves enough space to write the maximal length
of the output. Then, it starts the inner buildstep. Once the inner
buildstep returns, the correct size is written at the reserved space,
padding the hex-representation with zeroes, if necessary. Would
someting like this work in your use-case @Piyush?
best regards,
Simon
[1] https://github.com/meiersi/bytestring/blob/master/Data/ByteString/Lazy/Build...
[2] https://github.com/meiersi/blaze-builder/blob/master/Blaze/ByteString/Builde...
2011/8/22 Piyush P Kurur
On Mon, Aug 22, 2011 at 03:44:02PM +0900, Kazu Yamamoto wrote:
Simon,
Thank you for your explanation.
I'm not sure that we can obtain the length of Builder without building(copying).
I too am interested in knowing whether the length of a builder (exact length) can be computed without forcing computation. The context that I need this is different (an asn1 library). The way I solved is to keep track of the length explicitly. But if builder can do that for me that is very good.
Regards
ppk
_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel
participants (8)
-
Aristid Breitkreuz
-
Greg Weber
-
Gregory Collins
-
Kazu Yamamoto
-
Max Cantor
-
Michael Snoyman
-
Piyush P Kurur
-
Simon Meier