Text.Regex.Posix regcomp

Hello, I have a server application that I am building using GHC 6.4 (yes, an update to 6.4.2 is on the horizon, but not in the immediate future - unless it fixes this problem :) ) under MSYS/MinGW. Things work pretty well, but under stress testing the app eventually throws a Windows memory access violation (c0000005) exception out of the exact same location in the regcomp function every time. When it occurs is not deterministic - it could be after 20 requests or 2000 (usually higher, though). Various Windows debuggers shows it as a memory read error. regcomp is called several times with each incoming request. I'm pretty sure it's not a multi-threading issue, as there is only one client sending requests over and over again. Can anyone suggest some ideas or debugging tips? TIA, Rich

Rich Fought wrote:
Hello,
I have a server application that I am building using GHC 6.4 (yes, an update to 6.4.2 is on the horizon, but not in the immediate future - unless it fixes this problem :) ) under MSYS/MinGW. Things work pretty well, but under stress testing the app eventually throws a Windows memory access violation (c0000005) exception out of the exact same location in the regcomp function every time. When it occurs is not deterministic - it could be after 20 requests or 2000 (usually higher, though). Various Windows debuggers shows it as a memory read error.
regcomp is called several times with each incoming request. I'm pretty sure it's not a multi-threading issue, as there is only one client sending requests over and over again.
Can anyone suggest some ideas or debugging tips?
TIA, Rich
Your code is using the provided Text.Regex code. To avoid the regcomp call, you could use darcs to get my replacement, Text.Regex.Lazy: darcs get http://evenmere.org/~chrisk/trl/stable/ or darcs get http://evenmere.org/~chrisk/trl/head/ (They have the same code right now. The tarball at sourceforge, http://sourceforge.net/projects/lazy-regex ,is missing a file but the darcs repository should be complete). If you compile and install and "import Text.Regex.Lazy.Compat" then you should get a drop in replacement for Text.Regex that may even perform faster. This uses pure Haskell (especially Parsec) to emulate the behavior of Text.Regex. Any reported difference in behavior will be treated as a bug. The code also has different modules for doing fancier regular expressions (T.R.L.Full) and for using a very fast DFA engine (T.R.L.CompatDFA). The library should work but the design is changing/improving except for T.R.L.Compat which has the same interface as Text.Regex. If your stress test still dies, then it will not be a problem with regcomp. -- Chris Kuklewicz

Hello Rich, Tuesday, July 18, 2006, 11:47:58 PM, you wrote:
I have a server application that I am building using GHC 6.4 (yes, an update to 6.4.2 is on the horizon, but not in the immediate future - unless it fixes this problem :) ) under
6.4.2 (or better a current stable build) fixes a lot of bugs and 100 backward compatible. if you are developing rather large program, it will be much better to upgrade, in order to simplify your debugging work -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

I'm trying to use heap profiling with +RTS -hc -i1 options and running my program for about 30 seconds. However, I only get around 7 samples with seemingly bogus timetags (i.e. 0.00, 3.69, 3.73, 3.10, 4.05, 4.12). What's going on? I'm running GHC 6.4.2 on Windows (MSYS/MinGW). Thanks, Rich

Rich Fought wrote:
I'm trying to use heap profiling with +RTS -hc -i1 options and running my program for about 30 seconds. However, I only get around 7 samples with seemingly bogus timetags (i.e. 0.00, 3.69, 3.73, 3.10, 4.05, 4.12). What's going on?
I'm running GHC 6.4.2 on Windows (MSYS/MinGW).
IIRC the timestamps ignore time spent in GC and time spent sampling the heap, so they measure runtime of the program only. Still, it does look a little strange. Maybe you could send us the output of +RTS -S? Cheers, Simon

Simon Marlow wrote:
IIRC the timestamps ignore time spent in GC and time spent sampling the heap, so they measure runtime of the program only.
So if I have a server that is idle most of the time waiting for requests, the timestamps recorded in the heap profile will not be "real time" but program execution time?
Still, it does look a little strange. Maybe you could send us the output of +RTS -S? Alloc Collect Live GC GC TOT TOT Page Flts bytes bytes bytes user elap user elap 264424 262144 20592 0.00 0.00 0.03 0.01 0 0 (Gen: 1) 311540 270336 63732 0.00 0.00 3.84 5.25 0 0 (Gen: 0) 270304 290816 102352 0.00 0.00 3.85 5.26 0 0 (Gen: 0) 273472 270336 133644 0.00 0.00 3.85 5.26 0 0 (Gen: 0) 314540 294912 167000 0.00 0.00 3.90 5.31 0 0 (Gen: 0) 266356 294912 212212 0.00 0.00 3.90 5.31 0 0 (Gen: 0) 262128 331776 214120 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 270336 216456 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 266240 218800 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 266240 221140 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 266240 225844 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 266240 231052 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 266240 236516 0.00 0.00 3.92 5.33 0 0 (Gen: 0) 262128 266240 242344 0.00 0.00 3.92 5.34 0 0 (Gen: 0) 262144 266240 248576 0.00 0.00 3.92 5.34 0 0 (Gen: 0) 262128 266240 255212 0.00 0.00 3.92 5.34 0 0 (Gen: 0) 262128 266240 262328 0.00 0.00 3.92 5.34 0 0 (Gen: 0) 262128 266240 270016 0.00 0.00 3.92 5.34 0 0 (Gen: 0) 262128 266240 278620 0.00 0.00 3.92 5.34 0 0 (Gen: 0) 262128 266240 288648 0.01 0.01 3.94 5.35 0 0 (Gen: 0) 262128 266240 300372 0.00 0.00 3.94 5.35 0 0 (Gen: 0) 262128 266240 317000 0.00 0.00 3.94 5.35 0 0 (Gen: 0) 262128 266240 346080 0.00 0.00 3.94 5.35 0 0 (Gen: 0) 262128 270336 348424 0.00 0.00 3.94 5.35 0 0 (Gen: 0) 262128 266240 350760 0.00 0.00 3.94 5.35 0 0 (Gen: 0) 262128 266240 353364 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 358512 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 363748 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 369324 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 375232 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 381528 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 388320 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 395496 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 403424 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 412268 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 422584 0.00 0.00 3.94 5.36 0 0 (Gen: 0) 262128 266240 435088 0.00 0.00 3.96 5.37 0 0 (Gen: 0) 262128 266240 454476 0.00 0.00 3.96 5.37 0 0 (Gen: 0) 262224 266240 477316 0.00 0.00 3.96 5.37 0 0 (Gen: 0) 262128 270336 479676 0.00 0.00 3.96 5.37 0 0 (Gen: 0) 262128 266240 482012 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 485236 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 490364 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262124 266240 495700 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 501292 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 507344 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 513676 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 520632 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 527924 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 536016 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 545180 0.00 0.00 3.96 5.38 0 0 (Gen: 0) 262128 266240 555808 0.01 0.01 3.98 5.39 0 0 (Gen: 0) 262128 266240 569148 0.00 0.00 3.98 5.39 0 0 (Gen: 0) 262128 266240 593492 0.00 0.00 3.98 5.39 0 0 (Gen: 0) 262152 266240 683760 0.00 0.00 3.98 5.39 0 0 (Gen: 0) 262128 266240 779176 0.00 0.00 3.98 5.39 0 0 (Gen: 0) 262128 266240 874588 0.00 0.00 3.98 5.39 0 0 (Gen: 0) 262128 266240 968636 0.00 0.00 3.98 5.39 0 0 (Gen: 0) 262128 266240 1060608 0.00 0.00 3.98 5.40 0 0 (Gen: 0) 262128 1290240 194316 0.00 0.00 3.98 5.40 0 0 (Gen: 1) 263232 266240 201176 0.00 0.00 3.98 5.40 0 0 (Gen: 0) 262128 266240 207568 0.00 0.00 3.98 5.40 0 0 (Gen: 0) 262128 266240 214472 0.00 0.00 3.98 5.40 0 0 (Gen: 0) 262128 266240 221944 0.00 0.00 3.98 5.40 0 0 (Gen: 0) 262128 266240 230156 0.00 0.00 3.98 5.40 0 0 (Gen: 0) 262128 266240 239628 0.00 0.00 4.00 5.41 0 0 (Gen: 0) 262128 266240 251036 0.00 0.00 4.00 5.41 0 0 (Gen: 0) 262128 266240 266660 0.00 0.00 4.00 5.41 0 0 (Gen: 0) 262160 266240 289928 0.00 0.00 4.00 5.41 0 0 (Gen: 0) 267908 266240 363816 0.00 0.00 4.03 5.45 0 0 (Gen: 0) 269488 339968 379412 0.00 0.00 4.11 5.53 0 0 (Gen: 0) 271056 311296 407972 0.00 0.00 4.11 5.53 0 0 (Gen: 0) 267520 270336 416420 0.00 0.00 4.12 5.54 0 0 (Gen: 0) 264136 274432 464492 0.00 0.00 4.12 5.54 0 0 (Gen: 0) 304612 319488 460256 0.00 0.00 4.17 5.60 0 0 (Gen: 0) 268912 286720 470944 0.00 0.00 4.18 5.61 0 0 (Gen: 0) 304308 282624 492792 0.00 0.00 4.21 5.64 0 0 (Gen: 0) 268136 286720 499576 0.01 0.01 4.22 5.65 0 0 (Gen: 0) 181052 0.00 0.00
20,883,096 bytes allocated in the heap 1,971,016 bytes copied during GC 194,316 bytes maximum residency (2 sample(s)) 78 collections in generation 0 ( 0.03s) 2 collections in generation 1 ( 0.00s) 2 Mb total memory in use INIT time 0.02s ( 0.00s elapsed) MUT time 4.17s ( 5.62s elapsed) GC time 0.03s ( 0.03s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 0.00s ( 0.00s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 4.22s ( 5.65s elapsed) %GC time 0.7% (0.5% elapsed) Alloc rate 4,988,795 bytes per MUT second Productivity 98.8% of total user, 73.8% of total elapsed Rich

Rich Fought wrote:
IIRC the timestamps ignore time spent in GC and time spent sampling the heap, so they measure runtime of the program only.
So if I have a server that is idle most of the time waiting for requests, the timestamps recorded in the heap profile will not be "real time" but program execution time?
That's right - the general idea is to make the profile insensitive to other loading on the machine. I can see there might be an argument for making this tweakable, though.
Still, it does look a little strange. Maybe you could send us the output of +RTS -S?
Alloc Collect Live GC GC TOT TOT Page Flts bytes bytes bytes user elap user elap 264424 262144 20592 0.00 0.00 0.03 0.01 0 0 (Gen: 1) 311540 270336 63732 0.00 0.00 3.84 5.25 0 0 (Gen: 0)
So between these first two GCs, your program spent nearly 4 seconds doing something, but only allocated 300k. What was it doing? Cheers, Simon

Simon Marlow wrote:
That's right - the general idea is to make the profile insensitive to other loading on the machine. I can see there might be an argument for making this tweakable, though.
This would be nice to have if one is trying to correlate behavior with specific I/O events in time (i.e. messages coming into a server).
Alloc Collect Live GC GC TOT TOT Page Flts bytes bytes bytes user elap user elap 264424 262144 20592 0.00 0.00 0.03 0.01 0 0 (Gen: 1) 311540 270336 63732 0.00 0.00 3.84 5.25 0 0 (Gen: 0)
So between these first two GCs, your program spent nearly 4 seconds doing something, but only allocated 300k. What was it doing? If I'm not mistaken, it's negotiating a TLS session.
I think I tracked down the bizarre timetag issue, it is already in Trac: http://hackage.haskell.org/trac/ghc/ticket/769 So the out of order timetag of 3.100 in my original e-mail should really be 4.0. Rich

On a separate issue, I can't seem to get the -hcname or -hmname options to work. For instance, profiling with just -hm yields an entry for module Network.HTTP, yet when I try to profile again with -hmNetwork.HTTP, I do not get a heap dump. Is there some special formatting I am missing? Thanks, Rich Rich Fought wrote:
I'm trying to use heap profiling with +RTS -hc -i1 options and running my program for about 30 seconds. However, I only get around 7 samples with seemingly bogus timetags (i.e. 0.00, 3.69, 3.73, 3.10, 4.05, 4.12). What's going on?
I'm running GHC 6.4.2 on Windows (MSYS/MinGW).
Thanks, Rich
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
participants (4)
-
Bulat Ziganshin
-
Chris Kuklewicz
-
Rich Fought
-
Simon Marlow