On Wed, Sep 8, 2010 at 7:18 AM, Ian Lynagh <igloo@earth.li> wrote:

> Hmm, wouldn't you want to be able to break on
>    either
>        <a-with-umlaut>
>    or
>        <a> <umlaut combining character>
> in that case?

No. For cases like that, you'd normalize and perhaps case-fold the text and pattern first, then break on a specific string. (Normalization is handled via text-icu.)
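
A rough sketch of what I mean, in case it helps. It assumes the current names in the two packages: `normalize` and `NFC` from `Data.Text.ICU` (text-icu), and `toCaseFold` and `breakOn` from `Data.Text`. The helpers `canonical` and `breakOnCanonical` are made up for this example and are not part of either library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import Data.Text.ICU (NormalizationMode (NFC), normalize)

-- | Hypothetical helper: bring text into one canonical, caseless form.
-- Case folding can produce text that is no longer normalized, so we
-- normalize once more afterwards.
canonical :: T.Text -> T.Text
canonical = normalize NFC . T.toCaseFold . normalize NFC

-- | Hypothetical helper: break the haystack at the first occurrence of
-- the needle, comparing canonical forms. The pieces returned are in
-- canonical form, not in the original spelling. Like 'T.breakOn', this
-- must not be called with an empty needle.
breakOnCanonical :: T.Text -> T.Text -> (T.Text, T.Text)
breakOnCanonical needle haystack =
  T.breakOn (canonical needle) (canonical haystack)

main :: IO ()
main = do
  -- Needle uses the precomposed capital A-with-umlaut (U+00C4).
  let needle = "\x00C4r"
  -- Haystack spells the same letter as 'a' plus combining diaeresis (U+0308).
      haystack = "Ma\x0308rchen"
  -- After normalization and case folding the two spellings coincide,
  -- so the break point is found even though the raw code points differ.
  print (breakOnCanonical needle haystack)
```

The point of doing it this way is that the break function itself stays a plain code-point match; which strings count as "the same" is decided once, up front, by whichever normalization form and folding you pick.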