
Hi All, I notice that Data.ByteString has span and spanEnd. Is there a known particular reason why dropWhile and takeWhile don't have corresponding *End functions? If not, what is the protocol for adding them? cheers, T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

drtomc:
Hi All,
I notice that Data.ByteString has span and spanEnd. Is there a known
and break/breakEnd.
particular reason why dropWhile and takeWhile don't have corresponding *End functions? If not, what is the protocol for adding them?
There's no reason -- we couldn't decide on whether to support 'end/-right' versions of most traversals. To add them you'd implement them, send the patch to Duncan and I, for inclusion in bytestring 1.0. Duncan -- did we ever sort out a policy on the left/right normal/-end versions of things? breakEnd I use all the time, but perhaps we should fix upon what api we are to provide. -- Don

Well, maybe I shoud be asking a higher level question then. I have a function tidy = reverse . dropWhile punk . reverse . dropWhile punk where punk = isPunctuation . chr . fromIntegral which is leading to a significant amount of allocation, and you can see why. The way I'd like to write it is tidy = dropWhile punk . dropWhileEnd punk where .... which has the obvious advantage of avoiding quite a bit of intermediate allocation. Is there a another way? I note that since I'm using a nice declarative language, the compiler CLEARLY should be transforming the first form into the second. :-) T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

drtomc:
Well, maybe I shoud be asking a higher level question then.
I have a function
tidy = reverse . dropWhile punk . reverse . dropWhile punk where punk = isPunctuation . chr . fromIntegral
which is leading to a significant amount of allocation, and you can see why.
The way I'd like to write it is
tidy = dropWhile punk . dropWhileEnd punk where ....
which has the obvious advantage of avoiding quite a bit of intermediate allocation.
Is there a another way?
I note that since I'm using a nice declarative language, the compiler CLEARLY should be transforming the first form into the second. :-)
I'd just manually write a 'tidy' loop (in the Data.ByteString style) (which would avoid all allocations), since it seems pretty useful. Something in this style: findIndexOrEnd :: (Word8 -> Bool) -> ByteString -> Int findIndexOrEnd k (PS x s l) = inlinePerformIO $ withForeignPtr x $ \f -> go (f `plusPtr` s) 0 where go !ptr !n | n >= l = return l | otherwise = do w <- peek ptr if k w then return n else go (ptr `plusPtr` 1) (n+1) If its costly, since that'll make it non-costly. -- Don

Donald Bruce Stewart wrote:
I'd just manually write a 'tidy' loop (in the Data.ByteString style) (which would avoid all allocations), since it seems pretty useful.
That would indeed be very useful to have as a library function. I've pined for Python's strip() string method (removes leading and trailing whitespace) for a while.

So the following isn't as clever as the line-noise Don posted, but should be in the ball-park. dropFromEnds p = dropWhile p . dropWhileEnd p dropWhileEnd p bs = take (findFromEndUntil (not p) bs) bs takeWhileEnd p bs = drop (findFromEndUntil p bs) bs {- findFromEndUntil is in ByteString.hs, but is not exported -} T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

drtomc:
So the following isn't as clever as the line-noise Don posted, but should be in the ball-park.
Low level loops are irksome, but guaranteed to be quick :P
dropFromEnds p = dropWhile p . dropWhileEnd p
dropWhileEnd p bs = take (findFromEndUntil (not p) bs) bs
takeWhileEnd p bs = drop (findFromEndUntil p bs) bs
{- findFromEndUntil is in ByteString.hs, but is not exported -}
Yep, looks reasonable. With a bit of inlining (check the core) and you'll get the same code anyway. Always good to roll a QuickCheck or two for this kind of stuff, since off-by-one errors are rather easy. This should get you into a testable state: import qualified Data.ByteString as S import Test.QuickCheck.Batch import Test.QuickCheck import Text.Show.Functions import System.Random instance Arbitrary Word8 where arbitrary = choose (97, 105) coarbitrary c = variant (fromIntegral ((fromIntegral c) `rem` 4)) instance Random Word8 where randomR = integralRandomR random = randomR (minBound,maxBound) integralRandomR :: (Integral a, RandomGen g) => (a,a) -> g -> (a,g) integralRandomR (a,b) g = case randomR (fromIntegral a :: Integer, fromIntegral b :: Integer) g of (x,g) -> (fromIntegral x, g) -- define a model in [Word8] tidy_model f = reverse . dropWhile f . reverse . dropWhile f -- and check it prop_tidy_ok f xs = tidy_model f xs == (S.unpack . tidy f . S.pack) xs -- Don

Thomas Conway wrote:
Well, maybe I shoud be asking a higher level question then.
I have a function
tidy = reverse . dropWhile punk . reverse . dropWhile punk where punk = isPunctuation . chr . fromIntegral
which is leading to a significant amount of allocation, and you can see why.
The way I'd like to write it is
tidy = dropWhile punk . dropWhileEnd punk where ....
which has the obvious advantage of avoiding quite a bit of intermediate allocation.
Is there a another way?
I note that since I'm using a nice declarative language, the compiler CLEARLY should be transforming the first form into the second. :-)
The NDP library will implement this kind of fusion at some point (hopefully this year). We have a fairly clear idea of how to do it but not enough time. Roman
participants (4)
-
Bryan O'Sullivan
-
dons@cse.unsw.edu.au
-
Roman Leshchinskiy
-
Thomas Conway