
Am Donnerstag 29 April 2010 23:13:21 schrieb Jean-Nicolas Jolivet:
-- assuming that only characters > 'i' (chr 105) are escaped (or the escape character itself, but that should be dropped regardless).
Sorry, this is my fault for not being clearer about it: I am decoding text that is already encoded with a Binary to ASCII encoding... the exact encoding process is this:
1. Fetch a character from the input stream. 2. Increment the character's ASCII value by 42, modulo 256 3. If the result is a critical character ('\NUL', '\r', '\n' or '='), write the escape character ('=') to the output stream and increment character's ASCII value by 64, modulo 256. 4. Output the character to the output stream.
Okay, so there are no two escape characters in succession.
I am writing a decoder here so obviously I am reversing the process.... (I remove 42 for regular characters, 106 for special characters...and if the result is < 0, I add 256 to it...)
Just adding (yet again) more context information here ;) Still trying to fully understand your last suggestions Daniel! :) Thanks again!
Also, I wouldn't want anyone to think I just want someone to write the algorithm for me! :) Like I said, I already have a fully working algorithm, but being new to Haskell, I'm looking for ways to optimize it (or, really just different ways that I could do it!)
Here's how I am doing it right now (I am using ByteString here,
Then the foldr is not the best option.
but it should be pretty straightforward anyway)... I'm warning you, this is pretty ugly :)
-- Decode an encoded ByteString -- the zip + tail method removes the first character so I am adding it afterward (ugly) decodeByteString :: L.ByteString -> L.ByteString decodeByteString str = do let str1 = mapMaybe decodepair (L.zip str(L.tail str)) let firstChar = decodechar (ord(L.head str) - 42) L.pack (firstChar:str1)
Is it really necessary to pack it? That's a relatively expensive operation, it may be better to have it return a String if you're not doing anything with it after decoding except writing it to a file or so. The zipping is also not the optimal choice, it takes a lot of checking for the chunk-boundaries (I assume you're using lazy ByteStrings, since you chose the prefix L), and you construct pairs only to deconstruct them immediately (the compiler *may* optimise them away, but I'm skeptical).
-- Decode a pair of character, returning either the -- decoded character or Nothing decodepair :: (Char, Char) -> Maybe Char decodepair cs
| snd(cs) == '=' = Nothing | fst(cs) == '=' = Just (decodechar(ord(snd cs) - 106)) | otherwise = Just (decodechar(ord(snd cs) - 42))
-- Reverse the modulo 256... decodechar :: Int -> Char decodechar i
| i < 0 = chr (i + 256) | otherwise = chr i
Since you're doing arithmetic modulo 256, that stuff can be done faster and simpler with Word8. ---------------------------------------------------------------------- {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy as L import qualified Data.ByteString as S import Data.ByteString.Unsafe (unsafeAt) escape :: Word8 -> Word8 escape = (+150) normal :: Word8 -> Word8 normal = (+214) decodeW :: L.ByteString -> [Word8] decodeW = dec False . L.toChunks where dec _ [] = [] dec esc (str:more) = go esc 0 where !len = S.length str {-# INLINE charAt #-} charAt :: Int -> Word8 charAt i = unsafeAt str i go !b !i | i == len = dec b more | b = escape (charAt i) : go False (i+1) | otherwise = case charAt i of 61 -> go True (i+1) c -> normal c : go False (i+1) word8ToChar :: Word8 -> Char word8ToChar = toEnum . fromIntegral decodeC :: L.ByteString -> String decodeC = map word8ToChar . decodeW decodeBS :: L.ByteString -> L.ByteString decodeBS = L.pack . decodeW ----------------------------------------------------------------------
Jean-Nicolas Jolivet
On 2010-04-29, at 4:50 PM, Daniel Fischer wrote:
Am Donnerstag 29 April 2010 21:37:15 schrieb Jean-Nicolas Jolivet:
First I would like to thank everyone for the very interesting replies and suggestions I got so far!...
I tried to implement (and at the very least understand) most of them!...
To add to the context here, what I am trying to do is:
-apply a "transformation" to a character (in my case, subtracting 42 to its ASCII value, which I obtain with chr(ord(c) - 42) -if the character is preceded by a specific character (that would be, an escape character, in this case '=') then subtract 106 to its value instead of 42... -if the character is the escape character itself, '=', then skip it altogether
Ah, that complicates matters a little. - What happens if ord c < 42 (ord c < 106, if c is preceded by the escape character?) - What about escaped escape characters?
However,
foo xs = catMaybes $ zipWith f (' ':xs) xs where f _ '=' = Nothing f '=' c = Just (chr $ ord c - 106) f _ c = Just (chr $ ord c - 42)
is still pretty simple, as is the direct recursion
foo = go ' ' where go _ ('=' :cs) = go '=' cs go '=' (c:cs) = chr (ord c - 106) : go c cs go _ (c:cs) = chr (ord c - 42) : go c cs go _ _ = []
-- assuming that only characters > 'i' (chr 105) are escaped (or the escape character itself, but that should be dropped regardless).
fooGen :: Char -> (Char -> Char) -> (Char -> Char) -> String -> String fooGen e esc norm str = catMaybes $ zipWith f (d:str) str where d = if e == maxBound then pred e else succ e f x y
| y == e = Nothing | x == e = Just (esc y) | otherwise = Just (norm y)
is an easy generalisation.
(keeping in mind that the next character needs to be escaped)...
I managed to do it, however I'm not totally satisfied in the way I did it... the problem was that... as I just explained, in some cases, the character that is being processed has to be "skipped" (and by that I mean, not added to the resulting string). This happens when the processed character IS the escape character...
What I did was to build a List of Maybe Char.... my function does the proper operation on the character and returns a "Just Char" when the character is processed, or Nothing when it is the escaped character... so basically I would end up with something like: [Just 'f', Just 'o', Just 'o', Nothing]... I am mapping this using mapMaybe to end up with a proper String...
Would there be any more efficient way of doing this?
That is already pretty efficient. The direct recursion is probably a bit more efficient, but I don't think the difference will be large.
Considering that the escape character should NOT be added to the resulting string, is there any way I can avoid using the Maybe monad?
Sure, apart from the direct recursion,
fooGen e esc norm str = tail $ foldr f [] (d:str) where d = if e == maxBound then pred e else succ e f x (y:zs)
| y == e = x:zs | x == e = x:esc y:zs | otherwise = x:norm y:zs
f x [] = [x]
catMaybes and zipWith is clearer, though, and I don't think the foldr will perform better.
Once again, thanks everyone for all the suggestions!
Jean-Nicolas Jolivet