
On 20/2/2023 at 1:43 p.m., Viktor Dukhovni wrote:
> On Mon, Feb 20, 2023 at 10:46:38AM -0400, Pedro B. wrote:
>> Thanks, Li-yao. As I mentioned in my answer to Viktor, I am now using the ByteString functions except when I want to parse Char8's, for example to parse an 'a' with Data.Attoparsec.ByteString.Char8.char 'a'.
>
> FWIW, you can often avoid the Char8 combinators. For example, to match a specific 8-bit (ASCII) character you can, at a modest loss of readability, just match its Word8 code point:
>
>     0x0a <--- '\n'
>     0x0d <--- '\r'
>     0x20 <--- ' '
>     0x30 <--- '0'
>     0x41 <--- 'A'
>     0x61 <--- 'a'
>     ...
>
> I am comfortable with the raw hex values of various "interesting" characters, but you can also define aliases:
>
>     import Data.Char (ord)
>     import Data.Word (Word8)
>
>     char_nl, char_cr, char_sp, char_0, char_A, char_a :: Word8
>     char_nl = fromIntegral $ ord '\n'
>     char_cr = fromIntegral $ ord '\r'
>     char_sp = fromIntegral $ ord ' '
>     ...
I am using the Data.Word8 module provided by the word8 package, which defines _lf, _tab, _cr, and so on, and even _a.._z, _0.._9, etc. For example, I may use (==_tab) as the argument for Data.Attoparsec.ByteString.takeTill.

You made me realize that I can use "word8 _a" instead of "char 'a'" and have almost no need for the Char8 combinators. I'll probably do that and only use "decimal" from Char8 to parse integers, which I need for line ranges such as "2,10".

I still have one question, though: given that I only match specific characters generated by diff, do I gain anything by not using Char8? Performance, perhaps?

Regards,
Pedro
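
For concreteness, here is a minimal sketch of what I have in mind for the line ranges, assuming the attoparsec and word8 packages are available. The LineRange type and the lineRange parser are hypothetical names made up for this illustration. The separator is matched with "word8 _comma", and "decimal" is the only thing taken from the Char8 module:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Attoparsec.ByteString       (Parser, endOfInput, parseOnly, word8)
import qualified Data.Attoparsec.ByteString.Char8 as C8 (decimal)
import           Data.Word8                       (_comma)

-- Hypothetical type for a diff line range such as "2,10".
data LineRange = LineRange Int Int
  deriving (Eq, Show)

-- Parse "<start>,<end>": the comma is matched as a raw Word8,
-- and only the integer parsing comes from the Char8 module.
lineRange :: Parser LineRange
lineRange = LineRange <$> C8.decimal <* word8 _comma <*> C8.decimal

main :: IO ()
main = print (parseOnly (lineRange <* endOfInput) "2,10")
```

Both modules work on the same strict ByteString input and share the same Parser type, so mixing them like this needs no conversion.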