
On 20.04 10:52, Bulat Ziganshin wrote:
> this lib should be slower than your on small strings due to ForeignPtr inefficiency
Actually having O(1) substrings is very nice and can improve performance quite a lot. Another feature is the easy integration with low-level libraries that want a Ptr for input/output.
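To make the two points above concrete, here is a minimal sketch. It assumes the modern bytestring package (the descendant of fps) rather than the original Data.FastPackedString module, and the names slice and sumBytesViaPtr are hypothetical helpers invented for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekElemOff)

-- Hypothetical helper: an O(1) substring. take and drop only adjust
-- the offset and length; the underlying buffer is shared, not copied.
slice :: Int -> Int -> B.ByteString -> B.ByteString
slice off len = B.take len . B.drop off

-- Hypothetical helper: hand the buffer to code that wants a raw Ptr.
-- Here we merely read the bytes back through the pointer and sum them;
-- a real caller would pass the pointer to a C function instead.
sumBytesViaPtr :: B.ByteString -> IO Int
sumBytesViaPtr bs =
  unsafeUseAsCStringLen bs $ \(ptr, len) -> do
    let bytes = castPtr ptr :: Ptr Word8
    vals <- mapM (peekElemOff bytes) [0 .. len - 1]
    return (sum (map fromIntegral vals))
```

The pointer must not be written through or kept after the callback returns, since the buffer is shared with every substring taken from it.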
> and i think that Donald should mention in doc/announcement that his lib is latin-1 only. it's not good that each of us should scan his sources to rediscover this fact
Actually I am using fps with UTF8 and have had no problems. The trick is that I care about substrings rather than individual characters. Usually one ends up doing all the splitting etc. on ASCII characters, and the rest is handled as substrings where character boundaries are meaningless.

We can use the UTF8 strings on multiple levels:

1) just bytes + ASCII character matching
2) match physical unicode characters one by one
3) match unicode substrings

I would argue that in many cases either 1) or 3) is what is really wanted. Composite characters and combining marks make 2) problematic. FPS does 1) quite well, and it should be feasible to build separate modules providing 2) or 3) on top of it.

Haskell does not support the full range of unicode characters for meaningful operations. One cannot do IO with the standard libraries with Chars outside the Latin-1 range.

- Einar Karttunen
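Levels 1) and 3) from the list above can be sketched as follows. This assumes the modern bytestring package rather than the original fps module; sample, comma, splitOnComma and containsUtf8 are hypothetical names made up for the example. It relies on the UTF-8 property that every byte of a multi-byte sequence is 0x80 or above, so an ASCII delimiter never occurs inside an encoded character.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- UTF-8 bytes of "café,naïve", written out explicitly so the example
-- does not depend on any encoding library.
sample :: B.ByteString
sample = B.pack
  [ 0x63, 0x61, 0x66, 0xC3, 0xA9        -- "café"
  , 0x2C                                -- ","
  , 0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65  -- "naïve"
  ]

comma :: Word8
comma = 0x2C

-- Level 1: split on an ASCII byte. Multi-byte characters stay intact
-- because none of their bytes can equal an ASCII value.
splitOnComma :: B.ByteString -> [B.ByteString]
splitOnComma = B.split comma

-- Level 3: substring matching on the encoded bytes. A validly encoded
-- needle can only match at character boundaries of a valid haystack.
containsUtf8 :: B.ByteString -> B.ByteString -> Bool
containsUtf8 needle haystack = needle `B.isInfixOf` haystack
```

Note that this byte-level matching does no Unicode normalization, so a precomposed character and its decomposed form are treated as different substrings.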