cutting long strings into lines

newer
Re: [Haskell-cafe] smallest double...

older
Re: [Haskell-cafe] smallest double...

Andrea Rossato

30 Sep 2006 30 Sep '06

3:02 p.m.

Hello! I've been trying for quite some time to find an elegant solution to cut long strings into lines, but the only solution I was able to come up is the following piece of ugly code. Is there a library function for that? What kind of approach would you suggest? Thanks for your kind attention. Andrea Here's the code: -- does the actual job wrapString str = foldr addNL "" $ rmFirstSpace $ concat $ splitS (getIndx $ indx str) str -- gets the indexes of the spaces within a string indx = findIndices (\x -> if x == ' ' then True else False) -- gets the indexes of where to split the string into lines: lines -- must be between 60 and 75 char long getIndx :: [Int] -> [Int] getIndx = takeFirst . checkBound . (delete 0) . nub . map (\x -> if x > 60 && x `rem` 60 >= 0 && x `rem` 70 <= 10 then x else 0) -- groups indexes when their distance is too short checkBound = groupBy (\x y -> if y - x < 10 then True else False) -- takes the first index of a group of indexes takeFirst = map (\(x:xs) -> x) -- split a string given a list of indexes splitS _ [] = [] splitS (x:xs) (ls) = [take x ls] : splitS (map (\i -> i - x) xs) (drop x ls) splitS _ ls = [ls]:[] -- remove the first space from the begging of a string in a list of strings rmFirstSpace = map (\(x:xs) -> if x == ' ' then xs else x:xs) -- used by foldr to fold the list of substrings addNL s s1 = s ++ "\n" ++ s1 try with putStrLn $ wrapString longString where: longString = "The Haskell XML Toolbox (HXT) is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language, consisting of a set of combinators, for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace aware XML-Parser that supports almost fully the XML 1.0 Standard. Extensions are a validator for RelaxNG and an XPath evaluator."

Show replies by date

Neil Mitchell

30 Sep 30 Sep

3:36 p.m.

Hi, You want a function that converts a string into a list of strings, i.e. with the type signature: String -> [String] Go to hoogle (http://haskell.org/hoogle) and search for that: http://haskell.org/hoogle/?q=String->[String] (if you can't be bothered to do that, the answer is "lines" ;) Thanks Neil On 9/30/06, Andrea Rossato wrote:

...

Hello!

I've been trying for quite some time to find an elegant solution to cut long strings into lines, but the only solution I was able to come up is the following piece of ugly code.

Is there a library function for that? What kind of approach would you suggest?

Thanks for your kind attention.

Andrea

Here's the code:

-- does the actual job wrapString str = foldr addNL "" $ rmFirstSpace $ concat $ splitS (getIndx $ indx str) str

-- gets the indexes of the spaces within a string indx = findIndices (\x -> if x == ' ' then True else False)

-- gets the indexes of where to split the string into lines: lines -- must be between 60 and 75 char long getIndx :: [Int] -> [Int] getIndx = takeFirst . checkBound . (delete 0) . nub . map (\x -> if x > 60 && x `rem` 60 >= 0 && x `rem` 70 <= 10 then x else 0)

-- groups indexes when their distance is too short checkBound = groupBy (\x y -> if y - x < 10 then True else False)

-- takes the first index of a group of indexes takeFirst = map (\(x:xs) -> x)

-- split a string given a list of indexes splitS _ [] = [] splitS (x:xs) (ls) = [take x ls] : splitS (map (\i -> i - x) xs) (drop x ls) splitS _ ls = [ls]:[]

-- remove the first space from the begging of a string in a list of strings rmFirstSpace = map (\(x:xs) -> if x == ' ' then xs else x:xs)

-- used by foldr to fold the list of substrings addNL s s1 = s ++ "\n" ++ s1

try with putStrLn $ wrapString longString where: longString = "The Haskell XML Toolbox (HXT) is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language, consisting of a set of combinators, for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace aware XML-Parser that supports almost fully the XML 1.0 Standard. Extensions are a validator for RelaxNG and an XPath evaluator." _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Matthias Fischmann

5:18 p.m.

On Sat, Sep 30, 2006 at 04:36:02PM +0100, Neil Mitchell wrote:

...

(if you can't be bothered to do that, the answer is "lines" ;)

although this wasn't the original problem, i like it, too :). but now i am stuck in finding an optimal implementation for lines. the following implementation is slightly slower than the built-in function, and i suspect this to stem from the occurrance of reverse for each line: cut1 :: String -> [String] cut1 = f "" where f x "" = [reverse x] f x ('\n':xs) = reverse x : f "" xs f x (c:xs) = f (c:x) xs i vaguely remember having seen a CPS trick here before, but all i can come up with is the yet a little slower cut2 :: String -> [String] cut2 = f id where f k "" = [k ""] f k ('\n':xs) = k "" : f id xs f k (c:xs) = f k' xs where k' cs = k (c:cs) also, i think both implementations are line-strict, that is, each line is fully evaluated once touched in the first character. is there a similar implementation, with CPS or not, that is lazy in the lines and more efficient? thanks, matthias

Udo Stenzel

6:51 p.m.

Matthias Fischmann wrote:

...

although this wasn't the original problem, i like it, too :). but now i am stuck in finding an optimal implementation for lines.

Isn't the obvious one good enough? lines [] = [] lines s = go s where go [] = [[]] go ('\n':s) = [] : lines s go (c:s) = let (l:ls) = go s in (c:l):ls Udo. -- "Money can't buy friends, but it can get you a better class of enemy." -- Spike Milligan

Matthias Fischmann

8:53 p.m.

On Sat, Sep 30, 2006 at 08:51:40PM +0200, Udo Stenzel wrote:

...

To: Matthias Fischmann Cc: haskell-cafe@haskell.org From: Udo Stenzel Date: Sat, 30 Sep 2006 20:51:40 +0200 Subject: Re: [Haskell-cafe] cutting long strings into lines

Matthias Fischmann wrote:

...
although this wasn't the original problem, i like it, too :). but now i am stuck in finding an optimal implementation for lines.

Isn't the obvious one good enough?

lines [] = [] lines s = go s where go [] = [[]] go ('\n':s) = [] : lines s go (c:s) = let (l:ls) = go s in (c:l):ls

thanks. good enough, yes. just not obvious to me... (-: matthias

mark＠ixod.org

3:54 p.m.

I've been doing it as the enclosed. I wrote it a while ago, though, and haven't really looked too hard at it since. -- Mark

Matthias Fischmann

4:56 p.m.

On Sat, Sep 30, 2006 at 11:54:19AM -0400, Mark T.B. Carroll wrote:

...

module WordWrap (wrap) where import Data.Maybe

options :: String -> [(String, String)]

options [] = [("", "")]

options (x:xs) = let rest = map (\(ys, zs) -> (x:ys, zs)) (options xs) in if x == ' ' then ("", xs) : rest else rest

bestSplit :: Int -> String -> (String, String)

bestSplit width string = last (head wraps : takeWhile ((<= width) . length . fst) (options string))

works better if you just skip the "head wraps" part. (and now i am curious: what was it supposed to mean? how did it get there?)

...

wrap :: Int -> String -> [String]

wrap _ "" = []

wrap width string = let (x, ys) = bestSplit width string in x : wrap width ys

Bertram Felgenhauer

6:21 p.m.

Matthias Fischmann wrote:

...

On Sat, Sep 30, 2006 at 11:54:19AM -0400, Mark T.B. Carroll wrote:

...
module WordWrap (wrap) where import Data.Maybe

options :: String -> [(String, String)]

options [] = [("", "")]

options (x:xs) = let rest = map (\(ys, zs) -> (x:ys, zs)) (options xs) in if x == ' ' then ("", xs) : rest else rest

bestSplit :: Int -> String -> (String, String)

bestSplit width string = last (head wraps : takeWhile ((<= width) . length . fst) (options string))

Works better if you just skip the "head wraps" part. (and now i am curious: what was it supposed to mean? how did it get there?)

I'll venture a guess: The code originally read | bestSplit width string = | let wraps = options string | in last (head wraps : takeWhile ((<= width) . length . fst) wraps) Without a precaution like this, long words will cause wrap to go into an infinite loop. Depending on your requirements you could also consider | bestSplit width string = | last (splitAt width string : takeWhile ((<= width) . length . fst) | (options string)) which will cut too long words into pieces fitting on the lines. (code untested) regards, Bertram

mark＠ixod.org

1 Oct 1 Oct

2:42 a.m.

Bertram Felgenhauer writes: (snip)

...

I'll venture a guess: The code originally read

| bestSplit width string = | let wraps = options string | in last (head wraps : takeWhile ((<= width) . length . fst) wraps) (snip)

Version control agrees with you. Next time I'll try compiling whatever version I find at hand before posting. I knew it worked at one time ... (-: -- Mark

Bulat Ziganshin

30 Sep 30 Sep

4:56 p.m.

Hello Andrea, Saturday, September 30, 2006, 7:02:34 PM, you wrote:

...

-- gets the indexes of the spaces within a string indx = findIndices (\x -> if x == ' ' then True else False)

indx = findIndices (==' ')

...

-- takes the first index of a group of indexes takeFirst = map (\(x:xs) -> x)

takeFirst = map head

...

-- split a string given a list of indexes splitS _ [] = [] splitS (x:xs) (ls) = [take x ls] : splitS (map (\i -> i - x) xs) (drop x ls) splitS _ ls = [ls]:[]

...

-- remove the first space from the begging of a string in a list of strings rmFirstSpace = map (\(x:xs) -> if x == ' ' then xs else x:xs)

i would prefer to use "map rmFirstSpace" where rmFirstSpace (' ':xs) = xs rmFirstSpace xs = xs

...

-- used by foldr to fold the list of substrings addNL s s1 = s ++ "\n" ++ s1

foldrl addNl == unlines ?

...

try with putStrLn $ wrapString longString where: longString = "The Haskell XML Toolbox (HXT) is a collection of tools for processing XML with Haskell. The core component of the Haskell XML Toolbox is a domain specific language, consisting of a set of combinators, for processing XML trees in a simple and elegant way. The combinator library is based on the concept of arrows. The main component is a validating and namespace aware XML-Parser that supports almost fully the XML 1.0 Standard. Extensions are a validator for RelaxNG and an XPath evaluator."

i think that your algorithm is too complex. standard algorithm, imho, is to find last space before 80 (or 75) chars margin, split here and then repeat this procedure again. so, one line split may look like splitAt . last . filter (<80) . findIndices (==' ') and then you need to define function which repeats this operation on the rest of list. or, slightky different solution: -- |this function splits the list xs into parts whose length defined -- by call to function len_f on the rest of list splitByLen len_f [] = [] splitByLen len_f xs = y : splitByLen len_f ys where (y,ys) = splitAt (len_f xs) xs -- |this function finds last space in String within 80-char boundary len_f = last . filter (<80) . findIndices (==' ') so, "splitByLen len_f" should give you that you need, you need only to add checks for some additional conditions (first word in line is more than 80 bytes long, it is a last line) and removing of the extra space on each line btw, are you seen http://haskell.org/haskellwiki/Simple_unix_tools ? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Andrea Rossato

5:23 p.m.

On Sat, Sep 30, 2006 at 08:56:24PM +0400, Bulat Ziganshin wrote:

...

i think that your algorithm is too complex. standard algorithm, imho, is to find last space before 80 (or 75) chars margin, split here and then repeat this procedure again. so, one line split may look like

splitAt . last . filter (<80) . findIndices (==' ') ...

Thank you very much for your analysis. I find it extremely helpful.

...

btw, are you seen http://haskell.org/haskellwiki/Simple_unix_tools ?

I did not! But I'm studying this page right now. Thanks for mentioning it. Once again, thank you! Andrea

Andrea Rossato

6:50 p.m.

On Sat, Sep 30, 2006 at 08:56:24PM +0400, Bulat Ziganshin wrote:

...

splitByLen len_f [] = [] splitByLen len_f xs = y : splitByLen len_f ys where (y,ys) = splitAt (len_f xs) xs ... so, "splitByLen len_f" should give you that you need, you need only to add checks for some additional conditions (first word in line is more than 80 bytes long, it is a last line) and removing of the extra space on each line

I came up with this solution that seem to be fine, to me. I does the checking of those additional conditions: findSplitP at = last . filter ( at then wrapLS at (tail remain) else tail remain then you can use lines/unlines to split it. Thanks for your help. Best regards, Andrea

6862

Age (days ago)

6863

Last active (days ago)

List overview

Download

11 comments

7 participants

participants (7)

Andrea Rossato
Bertram Felgenhauer
Bulat Ziganshin
mark＠ixod.org
Matthias Fischmann
Neil Mitchell
Udo Stenzel