
On Thu, Jun 11, 2015 at 03:53:41PM +1000, Thomas Koster wrote:
> My program needs to escape and unescape "special characters" in text (Data.Text.Text), using my own definition of "special character" (isSpecial :: Char -> Bool). I am looking for a library that provides functions that implement or help me implement this functionality. I don't really care exactly how the special characters are escaped, but my preference is to prefix them with backslashes.

Hi Thomas,

The answer to your question depends on whether your program needs additional functionality. If all you need is to take special characters and escape them with an escape character plus a substitute character, this can be done with very little code using functions from Data.Text:

    import Data.Text (Text)
    import qualified Data.Text as T

    -- Character used for escaping
    ec :: Char
    ec = '$'

    -- Replace a character to be escaped with its substitute
    escapeChar :: Char -> Char
    escapeChar = id

    -- Inverse of escapeChar
    unescapeChar :: Char -> Char
    unescapeChar = id

    -- True if given char needs to be escaped
    isSpecial :: Char -> Bool
    isSpecial = ('?' ==)

    -- Escape chars in a given text
    escape :: Text -> Text
    escape = T.concatMap handleChar
      where
        handleChar c
          | isSpecial c = T.pack [ec, escapeChar c]
          | otherwise   = T.singleton c

    -- Unescape chars in a given text
    unescape :: Text -> Text
    unescape t =
      case T.break (ec ==) t of
        (a, b)
          | T.null b  -> a
          | otherwise ->
              let b' = T.tail b
                  e  = unescapeChar $ T.head b'
              in T.append a $ T.cons e $ unescape (T.tail b')

This code was loaded into ghci and tested there, so it should compile (GHC 7.10). Example:

    escape $ T.pack "This?Is?A?Test??"

yields

    "This$?Is$?A$?Test$?$?"

'unescape' yields the original string.

Note that the implementation does not handle trailing escape characters: "This$?Is$?A$" will throw an exception, but this can be remedied with very little additional code.

You of course must provide the correct implementations of 'ec', 'escapeChar', and 'unescapeChar'. You need to implement these no matter what other library you use.

If, on the other hand, you want to escape special characters with blocks of text (instead of single characters as in my code), you probably also need a second character to mark the end of an escape. Even then, the code should not get much more involved than the example above.

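One way the trailing-escape case mentioned above could be closed is sketched below. This is only a sketch under a few assumptions that are not in the code above: the names 'UnescapeError', 'TrailingEscape' and 'unescapeE' are made up for illustration, 'escapeChar' and 'unescapeChar' are left out because they are 'id', and 'isSpecial' is widened to treat the escape character itself as special so that a literal '$' in the input survives a round trip.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Character used for escaping (same as in the code above)
ec :: Char
ec = '$'

-- Assumption: the escape character counts as special too,
-- otherwise a literal '$' would be swallowed when unescaping.
isSpecial :: Char -> Bool
isSpecial c = c == '?' || c == ec

-- Escape chars in a given text (escapeChar is 'id' and omitted here)
escape :: Text -> Text
escape = T.concatMap handleChar
  where
    handleChar c
      | isSpecial c = T.pack [ec, c]
      | otherwise   = T.singleton c

-- Hypothetical error type, named here for illustration only
data UnescapeError = TrailingEscape
  deriving (Eq, Show)

-- Total variant of 'unescape': uses T.uncons instead of T.head/T.tail,
-- so an escape character at the very end is reported, not thrown.
unescapeE :: Text -> Either UnescapeError Text
unescapeE t =
  case T.break (ec ==) t of
    (a, b) ->
      case T.uncons b of
        Nothing        -> Right a               -- no escape character left
        Just (_, rest) ->
          case T.uncons rest of
            Nothing         -> Left TrailingEscape
            Just (c, rest') ->
              fmap (\u -> T.append a (T.cons c u)) (unescapeE rest')
```

With these definitions, unescapeE (escape (T.pack "a$b?c")) gives Right "a$b?c", and unescapeE (T.pack "This$?Is$?A$") gives Left TrailingEscape instead of an exception.
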
Text validation and error handling before unescaping add some more bloat, but again should be straightforward to add using Either as a return type.

So, either this is all you need, or we need more information.

Cheers
Stefan