I did some more refining and I like this one much better -

import Text.Regex.Posix


start = "<bug>"
end = "</bug>"


{-
s = start encountered
e = end encountered
ns = new start encountered
ne = new end encountered
-}
process s e []     = []
process s e (x:xs) = if ns && not ne then
                        x : process ns ne xs
                     else
                        if ns && ne then   -- an end marker only counts after the start marker
                           [x]
                        else
                           process ns ne xs
        where ns = if s then
                      s
                   else
                      x =~ start
              ne = x =~ end


main = do
     str <- readFile "test.txt"
     let x = process False False (lines str)
     putStr (unlines x)   -- putStr writes the lines as-is; print would show an escaped, quoted string
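
For comparison, here is a minimal sketch of the same idea with the "started" state carried by two local functions instead of two Boolean flags. It assumes plain substring matching with isInfixOf in place of (=~), and the names startMarker, endMarker and extractBug are hypothetical. It ignores any end marker seen before the start marker.

```haskell
import Data.List (isInfixOf)

-- Hypothetical names; substring matching stands in for (=~).
startMarker, endMarker :: String
startMarker = "<bug>"
endMarker   = "</bug>"

-- Lines from the first start marker through the next end marker, inclusive.
extractBug :: [String] -> [String]
extractBug = outside
  where
    -- Before the start marker: skip lines, including stray end markers.
    outside []     = []
    outside (x:xs)
      | startMarker `isInfixOf` x =
          -- Both markers may sit on the same line; stop right away then.
          x : (if endMarker `isInfixOf` x then [] else inside xs)
      | otherwise = outside xs
    -- After the start marker: keep lines up to and including the end marker.
    inside []     = []
    inside (x:xs)
      | endMarker `isInfixOf` x = [x]
      | otherwise               = x : inside xs
```

In GHCi, putStr (unlines (extractBug (lines str))) would print the extracted section for a file already read into str.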



On Mon, Jul 16, 2012 at 10:27 PM, Carlos J. G. Duarte <carlos.j.g.duarte@gmail.com> wrote:
I see what you mean. I wasn't aware that ++ was slow.
So your solution is just fine, I take it? You've made a special takeWhile that also keeps the element where the predicate stops holding, and handled the input like a pipeline.
It reminds me of a Unix command line, in this case: awk '/<bug>/,NR==0'  | awk 'NR==1,/<\/bug>/'
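
The pipeline described above can be sketched roughly as follows. This is only an illustration under some assumptions: takeUntil and section are hypothetical names, and isInfixOf is used rather than (=~) so that it runs without the regex package. The helper is written with foldr, so no (++) is involved and it stays lazy.

```haskell
import Data.List (isInfixOf)

-- Hypothetical helper: like takeWhile, but it also keeps the first
-- element that satisfies the stop predicate.
takeUntil :: (a -> Bool) -> [a] -> [a]
takeUntil stop = foldr step []
  where step x rest = if stop x then [x] else x : rest

-- Drop everything before the start marker, then keep lines up to and
-- including the first line containing the end marker.
section :: String -> String -> [String] -> [String]
section start end =
  takeUntil (isInfixOf end) . dropWhile (not . isInfixOf start)
```

The two stages correspond loosely to the two awk commands: dropWhile plays the role of the first, takeUntil that of the second.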



On 07/16/12 12:10, C K Kashyap wrote:
Thanks Carlos - you can import Text.Regex.Posix to get (=~)
Is there a way to avoid the (++) in your implementation? It has a linear time overhead.

Regards,
Kashyap

On Mon, Jul 16, 2012 at 1:36 AM, Carlos J. G. Duarte <carlos.j.g.duarte@gmail.com> wrote:
Looks good to me, but I'm just a beginner!
I used isInfixOf from Data.List instead of =~ to run your example because the latter wasn't working on my installation.

I've made a slight variant using the break function:

import Data.List

startTag = "<bug>"
endTag = "</bug>"

main = interact process

process  = unlines . extractSection startTag endTag . lines
extractSection start stop xs =
  let (ls,rs) = break (isInfixOf stop) $ dropWhile (not . isInfixOf start) xs
  in ls ++ take 1 rs
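
To make the role of break concrete, here is a small worked example on made-up input (sample and halves are hypothetical names, not part of the program above). break splits the list at the first element that satisfies the predicate, and that element becomes the head of the second half, which is why take 1 rs is needed to recover the closing marker.

```haskell
import Data.List (isInfixOf)

-- Made-up input standing in for the lines of the file.
sample :: [String]
sample = ["a", "<bug>", "d", "</bug>", "g"]

-- Everything from the start marker on, split at the first line
-- containing the end marker.
halves :: ([String], [String])
halves = break (isInfixOf "</bug>") (dropWhile (not . isInfixOf "<bug>") sample)
-- halves == (["<bug>", "d"], ["</bug>", "g"])

-- Re-attach the end marker line, as extractSection does.
extracted :: [String]
extracted = fst halves ++ take 1 (snd halves)
-- extracted == ["<bug>", "d", "</bug>"]
```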



On 07/15/12 13:08, C K Kashyap wrote:
Hi,
I've written a small Haskell program to extract the section of a file between a start marker and an end marker. For example, if I have a file such as the one below -
a
b
c
        <bug>
d
e
f
        </bug>
g
h
i

I'd like to extract the contents between <bug> and </bug> (including the markers). 

import Text.Regex.Posix   -- provides (=~)

startTag = "<bug>"
endTag = "</bug>"

process  = unlines . specialTakeWhile (f endTag) . dropWhile (f startTag) . lines 
        where f t x = not (x =~ t) 
              specialTakeWhile :: (a -> Bool) -> [a] -> [a]
              specialTakeWhile ff [] = []
              specialTakeWhile ff (x:xs) = if ff x then x:(specialTakeWhile ff xs) else [x]


It'll be great if I could get some feedback on this.

Regards,
Kashyap


_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners


