
Hi list,

My program needs to escape and unescape "special characters" in text (Data.Text.Text), using my own definition of "special character" (isSpecial :: Char -> Bool). I am looking for a library that provides functions that implement or help me implement this functionality. I don't really care exactly how the special characters are escaped, but my preference is to prefix them with backslashes.

While "attoparsec" does technically answer my question, it is as unimpressive an answer as "Prelude" unless the answer comes with a particularly clever and concise parser that blows my mind (and then kudos to the author). I am looking for a higher level library where I don't need to re-invent this wheel. That is, I don't want to write an unescaping parser if somebody has already published one on Hackage in a clean, well-tested library.

My searches on Hoogle have turned up only network-uri, which offers percent-encoding with the definition of "special character" accepted as an argument [1]. This is the sort of thing I am after, although to use network-uri I would have to round-trip via String, something that I feel I should avoid. Functions of text types that return lazy text builders would be ideal. Also, percent-encoding is not my favourite encoding scheme.

Thanks in advance.

[1] https://hackage.haskell.org/package/network-uri/docs/Network-URI.html#v:esca...

--
Thomas Koster

On Thu, Jun 11, 2015 at 03:53:41PM +1000, Thomas Koster wrote:
My program needs to escape and unescape "special characters" in text (Data.Text.Text), using my own definition of "special character" (isSpecial :: Char -> Bool). I am looking for a library that provides functions that implement or help me implement this functionality. I don't really care exactly how the special characters are escaped, but my preference is to prefix them with backslashes.
Hi Thomas

The answer to your question depends on whether your program needs additional functionality. If the only thing you need to do is to take special characters and escape them with an escape character plus a substitute character, this can be done with very little code using functions from Data.Text:

    import Data.Text (Text)
    import qualified Data.Text as T

    -- Character used for escaping
    ec :: Char
    ec = '$'

    -- Replace a character to be escaped with its substitute
    escapeChar :: Char -> Char
    escapeChar = id

    -- Inverse of escapeChar
    unescapeChar :: Char -> Char
    unescapeChar = id

    -- True if given char needs to be escaped
    isSpecial :: Char -> Bool
    isSpecial = ('?' ==)

    -- Escape chars in a given text
    escape :: Text -> Text
    escape = T.concatMap handleChar
      where handleChar c | isSpecial c = T.pack [ec, escapeChar c]
                         | otherwise   = T.singleton c

    -- Unescape chars in a given text
    unescape :: Text -> Text
    unescape t = case T.break (ec ==) t of
      (a,b) | T.null b  -> a
            | otherwise -> let b' = T.tail b
                               e  = unescapeChar $ T.head b'
                           in T.append a $ T.cons e $ unescape (T.tail b')

This code was loaded into ghci and tested there, so it should compile (GHC 7.10).

Example:

    escape $ T.pack "This?Is?A?Test??"

yields

    "This$?Is$?A$?Test$?$?"

'unescape' yields the original string.

Note that the implementation does not handle trailing escape characters: "This$?Is$?A$" will throw an exception, but this can be remedied with very little additional code.

You of course must provide the correct implementation for 'ec', 'escapeChar', and 'unescapeChar'. These you need to implement no matter what other library you use.

If on the other hand you want to escape special characters with blocks of text (instead of single characters as in my code) you probably also need a second character to mark the end of an escape. Even then, the code should not get much more involved than the example above.

Text validation and error handling before unescaping adds some more bloat, but again should be straightforward to add using Either as a return type.

So, either this is all you need, or we need more information.

Cheers
Stefan
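
The remedy for trailing escape characters and the Either-based error handling mentioned above could look roughly like the following. This is only a sketch, not code from the original message: the name unescapeSafe is made up for illustration, the backslash is assumed as the escape character (Thomas's stated preference), isSpecial is a placeholder predicate, and the escape character itself is also escaped so that input containing it round-trips.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Escape character (assumption: a backslash, as preferred in the question).
ec :: Char
ec = '\\'

-- Placeholder definition of "special"; substitute your own predicate.
isSpecial :: Char -> Bool
isSpecial = ('?' ==)

-- Prefix every special character, and the escape character itself, with ec.
escape :: Text -> Text
escape = T.concatMap handleChar
  where
    handleChar c
      | c == ec || isSpecial c = T.pack [ec, c]
      | otherwise              = T.singleton c

-- Total version of unescape (hypothetical name): returns Left on a
-- trailing escape character instead of throwing an exception.
unescapeSafe :: Text -> Either String Text
unescapeSafe = fmap T.concat . go
  where
    go t = case T.break (ec ==) t of
      (a, b)
        | T.null b  -> Right [a]
        | otherwise -> case T.uncons (T.tail b) of
            Nothing        -> Left "trailing escape character"
            Just (c, rest) -> ((a :) . (T.singleton c :)) <$> go rest
```

With these definitions, unescapeSafe (escape t) should give Right t for any t, while a text ending in a lone escape character is reported as a Left value rather than crashing.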

Stefan,

On Thu, Jun 11, 2015 at 03:53:41PM +1000, Thomas Koster wrote:
My program needs to escape and unescape "special characters" in text (Data.Text.Text), using my own definition of "special character" (isSpecial :: Char -> Bool). I am looking for a library that provides functions that implement or help me implement this functionality. I don't really care exactly how the special characters are escaped, but my preference is to prefix them with backslashes.
On 11 June 2015 at 16:55, Stefan Höck
The answer to your question depends on whether your program needs additional functionality. If the only thing you need to do is taking special characters and escaping them with an escape character plus a substitute character, this can be done with very little code using functions from Data.Text:
    import Data.Text (Text)
    import qualified Data.Text as T

    -- Character used for escaping
    ec :: Char
    ec = '$'

    -- Replace a character to be escaped with its substitute
    escapeChar :: Char -> Char
    escapeChar = id

    -- Inverse of escapeChar
    unescapeChar :: Char -> Char
    unescapeChar = id

    -- True if given char needs to be escaped
    isSpecial :: Char -> Bool
    isSpecial = ('?' ==)

    -- Escape chars in a given text
    escape :: Text -> Text
    escape = T.concatMap handleChar
      where handleChar c | isSpecial c = T.pack [ec, escapeChar c]
                         | otherwise   = T.singleton c

    -- Unescape chars in a given text
    unescape :: Text -> Text
    unescape t = case T.break (ec ==) t of
      (a,b) | T.null b  -> a
            | otherwise -> let b' = T.tail b
                               e  = unescapeChar $ T.head b'
                           in T.append a $ T.cons e $ unescape (T.tail b')
Thank you for your response. Yes, this is all I need to do. I had completed about two thirds of a similar implementation before becoming concerned that I was spending too much time reinventing this particular wheel and that there may be much easier and/or shorter ways to do this using a library written by a wheel surgeon. The only substantial difference is that my own version uses and returns a Data.Text.Lazy.Builder so that texts can be streamed and spliced into larger texts without copying (I am also using Chris Done's formatting library [1]), but only if I get it right, of course, which is another reason why I started to look around for libraries by Haskellers more experienced than I.
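
For readers following along, a Builder-returning escape of the kind described above might be sketched as below. This is not Thomas's actual code, which was not posted: the names escapeB and escapeText are invented for illustration, the backslash escape character is an assumption taken from the original question, and the sketch relies on (<>) being exported from the Prelude (GHC 8.4 or later).

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- Escape a strict Text into a lazy text Builder (hypothetical name).
-- Runs of ordinary characters are appended as whole chunks; each special
-- character, and the backslash itself, is prefixed with a backslash.
escapeB :: (Char -> Bool) -> Text -> B.Builder
escapeB isSpecial = go
  where
    go t = case T.break needsEscape t of
      (clean, rest) -> B.fromText clean <> case T.uncons rest of
        Nothing      -> mempty
        Just (c, t') -> B.singleton '\\' <> B.singleton c <> go t'
    needsEscape c = c == '\\' || isSpecial c

-- Convenience wrapper that runs the Builder back to a strict Text.
escapeText :: (Char -> Bool) -> Text -> Text
escapeText isSpecial = TL.toStrict . B.toLazyText . escapeB isSpecial
```

Because escapeB returns a Builder, its result can be combined with other builders using (<>) before anything is rendered, which is the splicing use case described above.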
If on the other hand you want to escape special characters with blocks of text (instead of single characters as in my code) you probably also need a second character to mark the end of an escape. Even then, the code should not get much more involved than the example above.
I don't need more functionality re the escaping itself; your implementation is a valid example that provides the essence of what I need.

[1] https://hackage.haskell.org/package/formatting

--
Thomas Koster
participants (2)
- Stefan Höck
- Thomas Koster