
I hope you'll forgive me for re-advertising my FPS modifications. I've
started over from Don's sources (please don't use my old fps repo),
refactored, and reworked my changes into that.

The refactored repo (all functionality and performance identical to the
original):

  http://www.ii.uib.no/~ketil/src/fps-wrapped

Repo with added Latin1 and ASCII support:

  http://www.ii.uib.no/~ketil/src/fps-i18n

Latin1 behaves the same as Char8, except that packing chars > 255 gives
an error. ASCII does the same, but stores characters > 127 out of harm's
way.

Adding support for a new character set requires defining four functions
and three constants, and #include'ing a common file.

In addition, some nice properties hold, for instance:

  s1 > s2  =>  pack s1 > pack s2
  w2c . c2w == id    -- provided no error
  c2w . w2c == id    -- total function

Only the last one holds for Char8.

Latin1 has been tested with the Char8 QC tests, and all of them have
been run through the benchmark suite; results at

  http://www.ii.uib.no/~ketil/src/bench.txt

(This is using /usr/share/word/dict)

Packing and unpacking aren't part of the benchmark, but are expected to
be around 10% slower than for Char8. I have no explanation for why 'map'
and 'split' are faster.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
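
P.S. The properties listed above can be illustrated with a minimal sketch. This is not the code from the fps-i18n repo: the names c2w, w2c and pack follow the post, but their bodies here are assumptions, pack works on a plain list of Word8 rather than a packed string, and c2wTrunc, propRoundTripChar, propRoundTripWord and propOrder are hypothetical helpers added only for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed Latin1 conversion: fails loudly instead of truncating.
c2w :: Char -> Word8
c2w c
  | n <= 255  = fromIntegral n
  | otherwise = error ("c2w: not a Latin1 character: " ++ show c)
  where n = ord c

-- Every byte maps to a character, so this direction is total.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Char8-style conversion for comparison (hypothetical name):
-- silently keeps only the low 8 bits of the code point.
c2wTrunc :: Char -> Word8
c2wTrunc = fromIntegral . ord

-- Stand-in for the real pack: a list of bytes, not a packed string.
pack :: String -> [Word8]
pack = map c2w

-- w2c . c2w == id, provided no error (i.e. for chars <= 255).
propRoundTripChar :: Char -> Bool
propRoundTripChar c = ord c > 255 || w2c (c2w c) == c

-- c2w . w2c == id, for every byte.
propRoundTripWord :: Word8 -> Bool
propRoundTripWord w = c2w (w2c w) == w

-- s1 > s2 => pack s1 > pack s2; only meaningful for strings
-- whose characters are all <= 255, otherwise pack raises an error.
propOrder :: String -> String -> Bool
propOrder s1 s2 = not (s1 > s2) || pack s1 > pack s2
```

Under these assumptions, all propRoundTripWord [minBound .. maxBound] checks the total direction exhaustively, while w2c (c2wTrunc '\256') yields '\NUL' rather than the original character, which is why the first round trip cannot hold for a truncating conversion.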