Re: Policy change for regex libraries

13 Jan 2009

      mail@justinbogner.com wrote:
...
Policy 2 seems the most intuitive to me, I can't imagine a situation
where you would want the empty match at the end of a non-empty match. On
the other hand, I don't think I've ever used a regular expression that
matched an empty string, so it's not particularly important to me.
The authors of sed are in agreement with your intuition.  But I think policy 2 
as a recursive definition it is unusual.  And I see policy 2 as very hard to 
summarize.

The single-match is always easy to summarize: the leftmost longest match.

Policy 1 for multiple matches can be summarized as:
...
All leftmost longest non-overlapping matches, where overlapping means sharing
the same matching characters.
Policy 2 for multiple matches can be summarized as:
...
All leftmost longest non-overlapping matches, where overlapping means sharing
the same matching characters, excluding zero-length matches that coincide with
the end of a non-zero-length match.
Policy 0 has been:
...
All leftmost longest non-overlapping matches, where overlapping means sharing
the same matching characters, stopping with the first zero-length match.
Policy 5 would be even simpler in this language:
...
The longest match at each position where a match occurs.
Policy 5 can be filtered to get policy 1.
Policy 1 can be filtered to get policy 0 or to get policy 2.

The best practices for a regexp search with multiple matches is to not use a 
regexp that can match zero-length bits of text (though it may be handy for 
finding newlines there are simpler ways).

So let me also put for "Policy 6"
...
All leftmost longest non-overlapping matches, where overlapping means sharing
the same matching characters, excluding all zero-length matches.
Note that the only zero-length POSIX extended regular expression matches are () 
which always matches or ^ or $ or ^$ (which is the same as $^).  But the policy 
I choose here will likely be applied to other regular expression backends that 
can do fancier testing.

Thanks for the comments!
   Chris

Re: Policy change for regex libraries

Chris Kuklewicz