
Before I go and add or change anything in the Haskell regex-posix library, I wanted to get some feedback.

regex-posix provides Text.Regex.Posix and is built on the regex-base package. It is also used as the backend for Text.Regex (the regex-compat package does this). I do not intend to change the behavior of the old Text.Regex API.

The main issue is the behavior when returning a list of all matches of a Regex against a target text. I no longer think the current behavior is the right choice when it comes to zero-length matches.

The current behavior is to return non-overlapping matches, with the caveat that the search ends after the first zero-length match. Note that the zero-length match may occur at the end position of a previous non-zero-length match.

Notably, no one has complained about this policy. But I no longer like it. So here are a few of my ideas of what to change it to:

0) No change; not worth the effort.

1) Return the zero-length match, skip forward 1 character, and continue searching. If the consumer wants the old policy they can truncate the list. This could also be filtered to resemble option 2 below.

2) Mimic "sed". It seems "sed" has a policy where a zero-length match is forbidden to occur at the end position of a non-zero-length match. "sed" does not stop with the first zero-length match.

3) Implement additional execution options, so the user can choose a policy. The default policy is left as the current behavior.

4) Implement additional execution options, so the user can choose a policy. The default policy is set to the behavior in (1).

5) Return valid matches starting from all positions, including overlapping matches. This I really do not like, and one can already get this information by running the search again starting one character after the start of the last match.

As an example, consider matching "0123" and replacing all matches with themselves wrapped in angle brackets. The policies of 0, 1, and 2 above lead to (computed partly by hand):

regex of "[0123]?"
  0): "<0><1><2><3><>"
  1): "<0><1><2><3><>"
  2): "<0><1><2><3>"

regex of "[012]?"
  0): "<0><1><2>3<>"
  1): "<0><1><2>3<>"
  2): "<0><1><2>3<>"

regex of "[013]?"
  0): "<0><1><>23"
  1): "<0><1><>2<3><>"
  2): "<0><1>2<3>"

regex of "[023]?"
  0): "<0><>123"
  1): "<0><>1<2><3><>"
  2): "<0>1<2><3>"

regex of "[123]?"
  0): "<>0123"
  1): "<>0<1><2><3><>"
  2): "<>0<1><2><3>"

regex of "[03]?"
  0): "<0><>123"
  1): "<0><>1<>2<3><>"
  2): "<0>1<>2<3>"

regex of "[12]?"
  0): "<>0123"
  1): "<>0<1><2><>3<>"
  2): "<>0<1><2>3<>"

I am leaning toward simply changing it from policy 0 to policy 1.

Are there any objections? Perhaps I should set a deadline? Now where is that library policy...

--
Chris
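
P.S. To make policies 0, 1, and 2 concrete, here is a minimal sketch of how each one walks the target text. It is only an illustration and not the regex-posix implementation: it does not call Text.Regex.Posix at all. The names optClass, Policy, and markAll are made up for this example, and optClass is a hand-written stand-in that handles only regexes of the form "[set]?", which always match at the current position with length 1 or 0.

```haskell
-- Hypothetical stand-in for a regex of the form "[set]?": at the current
-- position it matches one character from the set, or else the empty string.
-- Returns the match length. This is NOT part of the regex-posix API.
optClass :: String -> String -> Int
optClass set (c:_) | c `elem` set = 1
optClass _ _ = 0

-- The three policies discussed above (names are illustrative only).
data Policy
  = StopAtEmpty  -- policy 0: end the search after the first zero-length match
  | SkipOneChar  -- policy 1: report it, skip one character, keep searching
  | SedLike      -- policy 2: forbid a zero-length match right after a non-empty one
  deriving (Eq, Show)

-- Wrap every match in angle brackets and copy unmatched characters through.
markAll :: Policy -> String -> String -> String
markAll policy set = go False
  where
    -- prevNonEmpty: a non-zero-length match ended exactly at this position
    go prevNonEmpty rest =
      case optClass set rest of
        0 | policy == SedLike && prevNonEmpty -> step rest  -- empty match not allowed here
          | policy == StopAtEmpty -> "<>" ++ rest           -- report it and stop searching
          | otherwise -> "<>" ++ step rest                  -- report it, then skip one char
        n -> let (m, rest') = splitAt n rest
             in "<" ++ m ++ ">" ++ go True rest'
    -- copy one unmatched character (if any) and resume the search after it
    step [] = []
    step (c:cs) = c : go False cs
```

For instance, under these assumptions markAll StopAtEmpty "013" "0123" gives "<0><1><>23", markAll SkipOneChar "013" "0123" gives "<0><1><>2<3><>", and markAll SedLike "013" "0123" gives "<0><1>2<3>". A real implementation would have to search forward for the leftmost match instead of assuming one exists at every position.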