Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

Thomas Hartman

14 Mar 2009

11:01 p.m.

So, I tweaked Text.Regex to have the behavior I need.

http://patch-tag.com/repo/haskell-learning/browse/regexStuff/pcreReplace.hs

FWIW, the problem I was trying to solve was deleting single newlines but not strings of newlines in a document. Dead simple for pcre-regex with lookaround. But, I think, impossible with posix regex.

-- replace single newlines, but not strings of newlines (requires pcre look-around (lookaround, lookahead, lookbehind, for googlebot))

http://perldoc.perl.org/perlre.html

testPcre = ( subRegex (mkRegex "(?:
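The lookaround pattern is cut off in the archive, so it is not reconstructed here. For reference, this is a hedged, regex-free Haskell sketch of the transformation being described: delete a newline unless it belongs to a run of two or more. The name `joinSingleNewlines` is made up for illustration. It should behave like a lookaround substitution along the lines of `(?<!\n)\n(?!\n)` replaced by nothing, but that pattern is an assumption about the intent, not the original text.

```haskell
import Data.List (group)

-- Hypothetical helper: remove every newline that is not part of a run of
-- two or more newlines. Intended to match a lookaround substitution such as
-- s/(?<!\n)\n(?!\n)//g (assumed pattern, not taken from the thread).
joinSingleNewlines :: String -> String
joinSingleNewlines = concatMap keep . group
  where
    -- 'group' splits the text into runs of identical characters
    keep :: String -> String
    keep "\n" = ""   -- a run of exactly one newline: drop it
    keep run = run   -- anything else, including "\n\n...", is kept as-is
```

For example, `joinSingleNewlines "one\ntwo\n\nthree"` gives `"onetwo\n\nthree"`. To reflow text into spaces instead of deleting the newline, `keep "\n"` could return `" "`.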

...

Right, I'm just saying that a "subRegex" that worked on pcre regex matches would be great for people used to perl regexen and unused to posix -- even if it only allowed a string replacement, and didn't have all the bells and whistles of =~ s/../../ in perl.

2009/3/12 ChrisK

...
Thomas Hartman wrote:

...
Is there something like subRegex... something like =~ s/.../.../ in perl... for haskell pcre Regexen?

I mean, subRegex from Text.Regex of course: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-compat

Thanks for any advice,

thomas.

Short answer: No.

This is a FAQ. The usual answer to your follow up "Why not?" is that the design space is rather huge. Rather than justify this statement, I will point at the complicated module:

http://hackage.haskell.org/packages/archive/split/0.1.1/doc/html/Data-List-S...

The above module is "a wide range of strategies for splitting lists", which is a much simpler problem than your subRegex request, and only works on lists. A subRegex library should also work on bytestrings (and Seq).

At the cost of writing your own routine you get exactly what you want in a screen or less of code, see http://hackage.haskell.org/packages/archive/regex-compat/0.92/doc/html/src/T... for "subRegex" which is 30 lines of code.

Cheers, Chris
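To illustrate how small the core of such a routine can be, here is a hedged, base-only sketch of the splice step. It is not regex-compat's actual subRegex source, and `spliceMatches` is a hypothetical name. It assumes the caller already has the whole-match spans as non-overlapping (offset, length) pairs in ascending order. With regex-base these could presumably be taken from element 0 of each MatchArray returned by matchAll, but that wiring is left out so the sketch runs without any regex package.

```haskell
-- Hypothetical helper, not part of any regex package: rebuild the input with
-- each matched span replaced by (f matchedText). The spans are assumed to be
-- non-overlapping (offset, length) pairs in ascending order of offset, as a
-- left-to-right "match all" would report them for the whole match.
spliceMatches :: (String -> String) -> [(Int, Int)] -> String -> String
spliceMatches f spans input = go 0 spans input
  where
    -- pos: offset in the original input of the first character of 'rest'
    go :: Int -> [(Int, Int)] -> String -> String
    go _ [] rest = rest
    go pos ((off, len) : more) rest =
      let (before, fromMatch) = splitAt (off - pos) rest
          (matched, after) = splitAt len fromMatch
      in before ++ f matched ++ go (off + len) more after
```

For the "string replacement only" case, `f` is simply `const replacement`, e.g. `spliceMatches (const "X") [(1, 2), (5, 1)] "abcdefg"` gives `"aXdeXg"`. Supporting `\1`-style references would mean passing the capture groups to `f` as well.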


Don Stewart

14 Mar

11:12 p.m.


Also, consider stealing the regex subst code from: http://shootout.alioth.debian.org/u64q/benchmark.php?test=regexdna&lang=ghc&id=4

tphyahoo:

...

So, I tweaked Text.Regex to have the behavior I need.

http://patch-tag.com/repo/haskell-learning/browse/regexStuff/pcreReplace.hs

FWIW, the problem I was trying to solve was deleting single newlines but not strings of newlines in a document. Dead simple for pcre-regex with lookaround. But, I think, impossible with posix regex.

-- replace single newlines, but not strings of newlines (requires pcre look-around (lookaround, lookahead, lookbehind, for googlebot))

http://perldoc.perl.org/perlre.html

testPcre = ( subRegex (mkRegex "(?
Can I lobby for this to make its way into the Regex distribution? Really, I would argue that every regex flavor should have all the functions that Text.Regex gets, not just posix. (subRegex is just the most important, to my mind.)

Otherwise I'll make my own RegexHelpers hackage package or something.

Hard for me to see how to do this in an elegant way since the pcre packages are so polymorphic-manic. I'm sure there is a way though.

Or if you point me to the darcs head of regex I'll patch that directly.

2009/3/14 Thomas Hartman :

...

Brandon S. Allbery KF8NH

11:39 p.m.


On 2009 Mar 14, at 19:01, Thomas Hartman wrote:

...

FWIW, the problem I was trying to solve was deleting single newlines but not strings of newlines in a document. Dead simple for pcre-regex with lookaround. But, I think, impossible with posix regex.

s/(^|[^\n])\n($|[^\n])/\1\2/g;

POSIX regexen may be ugly, but they're capable.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
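Brandon's substitution can be traced by hand. The sketch below is a hypothetical, regex-free simulation (the name `substLoneNewlines` is made up; it is not regex-posix or Text.Regex) of one left-to-right, non-overlapping /g pass of s/(^|[^\n])\n($|[^\n])/\1\2/g, with a Bool standing in for REG_NEWLINE-style line anchors. The anchors follow POSIX semantics, so with the flag off $ matches only at the very end of the text (Perl's $ also matches before a trailing newline). It assumes the engine never reuses characters already consumed by an earlier match. Under that assumption two caveats show up: with line anchors on, ^ and $ also match beside other newlines, so a blank-line paragraph break is removed as well; and even with them off, "a\nb\nc" comes out as "ab\nc", because the b that would serve as left context for the second newline was already consumed by the first match. A zero-width lookaround pattern does not consume its context, so it avoids the second caveat.

```haskell
-- Hypothetical simulation (not a real regex engine) of one global pass of
--   s/(^|[^\n])\n($|[^\n])/\1\2/g
-- scanning left to right and never reusing consumed characters, as a
-- non-overlapping "match all" does. lineAnchors models REG_NEWLINE: when
-- True, ^ also matches after any newline and $ before any newline.
substLoneNewlines :: Bool -> String -> String
substLoneNewlines lineAnchors = go True
  where
    -- atBol: whether ^ can match at the current position
    go :: Bool -> String -> String
    go _ [] = []
    go atBol (c : rest)
      -- left context is ^ (empty); c is the newline being removed
      | c == '\n', atBol, Just (kept, rest', bol') <- rightCtx rest =
          kept ++ go bol' rest'
      -- left context is one non-newline character c, restored via \1
      | c /= '\n', ('\n' : rest2) <- rest, Just (kept, rest', bol') <- rightCtx rest2 =
          c : kept ++ go bol' rest'
      -- no match starts here: copy c and advance by one character
      | otherwise = c : go (lineAnchors && c == '\n') rest

    -- right context: $ (empty) or one non-newline character restored via \2.
    -- Returns (text kept, remaining input, can ^ match where we resume?).
    rightCtx :: String -> Maybe (String, String, Bool)
    rightCtx [] = Just ("", [], False)            -- $ at end of text
    rightCtx (d : more)
      | d /= '\n' = Just ([d], more, False)       -- [^\n]
      | lineAnchors = Just ("", d : more, True)   -- $ before a newline
      | otherwise = Nothing
```

Running the pass again on "ab\nc" removes the remaining newline, so a second pass (or the lookaround form) is one way around the overlap.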

Thomas Hartman

16 Mar

midnight


Except that there is nothing like =~ s in haskell, as far as I can tell.

I was mulling this over and thinking that the nicest solution -- through the lens of perl evangelism, anyway -- would be some way of accessing the perl6 =~ s mechanism in pugs. That would get us everything in perl 5 =~, and also all the cool grammar stuff that comes with perl6, which seems 90% of the way to parsec in terms of power, but with a thought-out, Huffman-optimized syntax.

Accordingly I am trying to load pugs in ghci, about which more at http://perlmonks.org/?node_id=750768

2009/3/14 Brandon S. Allbery KF8NH :

...


ChrisK

12:50 p.m.


Thomas Hartman wrote:

...

testPcre = ( subRegex (mkRegex "(?

quoting from the man page for regcomp:

...

REG_NEWLINE Compile for newline-sensitive matching. By default, newline is a completely ordinary character with no special meaning in either REs or strings. With this flag, `[^' bracket expressions and `.' never match newline, a `^' anchor matches the null string after any newline in the string in addition to its normal function, and the `$' anchor matches the null string before any newline in the string in addition to its normal function.

This is carried over to Text.Regex with

...

mkRegexWithOpts
  :: String  -- The regular expression to compile
  -> Bool    -- True <=> '^' and '$' match the beginning and end of individual lines respectively, and '.' does not match the newline character.
  -> Bool    -- True <=> matching is case-sensitive
  -> Regex   -- Returns: the compiled regular expression

Makes a regular expression, where the multi-line and case-sensitive options can be changed from the default settings.

Or with regex-posix directly the flag is "compNewline": http://hackage.haskell.org/packages/archive/regex-posix/0.94.1/doc/html/Text...

...

The defaultCompOpt is (compExtended .|. compNewline).

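You want to match a \n that is not next to any other \n. So you want to turn off REG_NEWLINE; otherwise ^ and $ would also match next to every newline, and the pattern below could match one newline out of a run of them.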

...

import Text.Regex.Compat

r :: Regex
r = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True -- False is important here

The ^ and $ take care of matching a lone newline at the start or end of the whole text. In the middle of the text the pattern is equivalent to [^\n]\n[^\n]. When substituting you can use the \1 and \2 captures to restore the matched non-newline character if one was present.
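The \1 and \2 references are expanded from the replacement string. According to regex-compat's documentation for subRegex, "\0" refers to the entire match, "\1" to the first captured substring, and "\\" inserts a literal backslash. The sketch below is a hypothetical, base-only illustration of that expansion step: the name `expandTemplate` is made up, only single-digit references are handled, and it is not the library's code.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical helper (not a library function): expand a replacement
-- template in the style documented for Text.Regex.subRegex.
--   groups !! 0 is the whole match, groups !! 1 the first capture, ...
--   "\N" (one digit N) inserts group N, or nothing if there is no group N
--   "\\" inserts a single backslash; every other character is copied as-is
expandTemplate :: [String] -> String -> String
expandTemplate groups = go
  where
    go :: String -> String
    go ('\\' : '\\' : rest) = '\\' : go rest
    go ('\\' : d : rest)
      | isDigit d = groupAt (digitToInt d) ++ go rest
    go (c : rest) = c : go rest
    go [] = []

    -- look up group n, defaulting to the empty string
    groupAt :: Int -> String
    groupAt n = case drop n groups of
      (g : _) -> g
      [] -> ""
```

For example, the match "a\nb" of the pattern above has captures "a" and "b", so `expandTemplate ["a\nb", "a", "b"] "\\1\\2"` gives `"ab"`: the newline is dropped and both neighbours are restored.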


Thomas Hartman

7:24 p.m.


Thanks, that was extremely helpful. My bad for reading the documentation so sloppily -- I somehow glossed over the bit that backreferences worked as one would expect.

To atone for this, http://patch-tag.com/repo/haskell-learning/browse/regexStuff/pcreReplace.hs shows successful =~ s/../../-like behavior for a pcre regex and a posix-like (but compatible with the pcre engine) regex in the same example, which is based on pcre regex. (See testPcre, testPosix.)

FWIW, I still think that there should be a library subRegex function for all regex flavors, and not just Posix. If there are gotchas about how capture references work in different flavors I might backpedal on this, but I'm not aware of any.

2009/3/16 ChrisK :

...


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

participants (4)

Brandon S. Allbery KF8NH
ChrisK
Don Stewart
Thomas Hartman