Splitting a string into chunks

Hi,

I'm trying to split a string into a list of substrings, where substrings are delimited by blank lines.

This feels like it *should* be a primitive operation, but I can't seem to find one that works. It's neither a fold nor a partition, since each chunk is separated by a 2-character sequence. It's also not a grouping operation, since GHC's Data.List.groupBy compares the first element of a sequence with each candidate member of the same sequence, as demonstrated by:

    Prelude> :module + Data.List
    Prelude Data.List> let t = "asdfjkl;"
    Prelude Data.List> groupBy (\a _ -> a == 's') t
    ["a","sdfjkl;"]

As a result, I've wound up with this:

    -- Convert a file into blocks separated by blank lines (two
    -- consecutive \n characters.)  NB: Requires UNIX linefeeds
    blocks :: String -> [String]
    blocks s = f "" s
      where
        f "" []               = []
        f s  []               = [s]
        f s  ('\n':'\n':rest) = (s : f "" rest)
        f s  (a:rest)         = f (s ++ [a]) rest

Which somehow feels ugly. This feels like it should be a fold, a group or something, where the test is something like:

    (\a b -> (a /= '\n') && (b /= '\n'))

Any thoughts?

Thanks,

-- Adam
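For reference, the GHCi result above follows from the way groupBy is usually defined: each group is built by comparing its first element against the elements that follow it. A minimal sketch, assuming the span-based definition from the Haskell report's list library (the names groupBy' and example are ours, chosen to avoid clashing with Data.List):

```haskell
-- Sketch of the span-based definition of groupBy.
-- groupBy' is a local, illustrative name, not the library function.
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _  []     = []
groupBy' eq (x:xs) = (x : ys) : groupBy' eq zs
  where
    -- Every candidate is compared against x, the head of the group,
    -- not against its immediate predecessor.
    (ys, zs) = span (eq x) xs

-- The expression from the GHCi session, run through the sketch.
example :: [String]
example = groupBy' (\a _ -> a == 's') "asdfjkl;"
```

Because the predicate only ever sees the head of the current group as its first argument, a test on adjacent pairs such as (\a b -> a /= '\n' && b /= '\n') does not behave like a sliding window.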

On 1/13/06, Adam Turoff wrote:
> I'm trying to split a string into a list of substrings, where substrings are delimited by blank lines.
> [...]
> This feels like it should be a fold, a group or something, where the test is something like:
> (\a b -> (a /= '\n') && (b /= '\n'))
Off the top of my head:

    blocks = map concat . groupBy (const null) . lines

The lines function splits the input into lines; groupBy then groups that list into a list of lists, splitting when the second of two adjacent elements is null (which is what an empty line passed to lines will give you); finally, a concat on each element of the result will "undo" the redundant line-splitting that lines performed...

/S
--
Sebastian Sylvan

On 1/13/06, Sebastian Sylvan wrote:
> Off the top of my head:
> blocks = map concat . groupBy (const null) . lines
Sorry, I got the meaning of groupBy mixed up, it should be

    blocks = map concat . groupBy (const (not . null)) . lines

/S
--
Sebastian Sylvan
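To make the three stages of the corrected pipeline concrete, here is a small step-by-step trace. It is only a sketch: the input string sample and the names step1, step2 and step3 are made up for illustration and do not come from the thread.

```haskell
import Data.List (groupBy)

-- Hypothetical input: two lines, a blank line, then one more line.
sample :: String
sample = "1234\n5678\n\nabcd"

-- Stage 1: split on newlines; the blank line shows up as "".
step1 :: [String]
step1 = lines sample
-- ["1234","5678","","abcd"]

-- Stage 2: a group keeps growing while the *next* line is non-empty,
-- so each empty line starts a new group (and sits at its head).
step2 :: [[String]]
step2 = groupBy (const (not . null)) step1
-- [["1234","5678"],["","abcd"]]

-- Stage 3: glue each group's lines back together with no separator.
step3 :: [String]
step3 = map concat step2
-- ["12345678","abcd"]
```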

That works except it loses single newline characters.
let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
Prelude> blocks s
["12345678","abcdefghijklmnopq",",,.,.,."]
Jared.
On 1/13/06, Sebastian Sylvan wrote:
> Sorry, I got the meaning of groupBy mixed up, it should be
>
> blocks = map concat . groupBy (const (not . null)) . lines

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
-- jupdike@gmail.com http://www.updike.org/~jared/ reverse ")-:"

On 2006-01-13 at 13:32PST Jared Updike wrote:
> That works except it loses single newline characters.
>
> let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
> Prelude> blocks s
> ["12345678","abcdefghijklmnopq",",,.,.,."]
Also the argument to groupBy ought to be some sort of equivalence relation.

    blocks = map unlines
           . filter (all $ not . null)
           . groupBy (\a b -> not (null b || null a))
           . lines

... but that suffers from the somewhat questionable properties of lines and unlines.

--
Jón Fairbairn                                 Jon.Fairbairn at cl.cam.ac.uk
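The message does not say which properties of lines and unlines are meant. One plausible reading, offered here as an assumption, is that the two functions are not inverses of each other: lines forgets whether the input ended in a newline, and unlines always appends one. A minimal illustration (the names sameLines and roundTrip are ours):

```haskell
-- lines cannot tell these two inputs apart: both give ["a","b"].
sameLines :: Bool
sameLines = lines "a\nb" == lines "a\nb\n"

-- unlines terminates every line with '\n', so a round trip through
-- lines and unlines can add a newline the input did not have.
roundTrip :: String
roundTrip = unlines (lines "a\nb")
-- "a\nb\n"
```

In the context of blocks, this means every block produced with unlines ends in a newline, whether or not the corresponding text in the input did.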

On Jan 13, 2006, at 4:35 PM, Jon Fairbairn wrote:
> Also the argument to groupBy ought to be some sort of equivalence relation.
Humm, still not reflexive. You need xor.
> blocks = map unlines . filter (all $ not . null) . groupBy (\a b -> not (null b || null a)) . lines
>
> ... but that suffers from the somewhat questionable properties of lines and unlines.
Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
          -- TMBG

On 2006-01-13 at 16:50EST Robert Dockins wrote:
> On Jan 13, 2006, at 4:35 PM, Jon Fairbairn wrote:
> > Also the argument to groupBy ought to be some sort of equivalence relation.
>
> Humm, still not reflexive. You need xor.
Ugh, yes. How about

    blocks = map unlines
           . filter (all $ not . null)
           . groupBy (\a b -> null b == null a)
           . lines

?

--
Jón Fairbairn                                 Jon.Fairbairn at cl.cam.ac.uk
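As a quick check of this version, the sketch below runs it on the sample string from earlier in the thread and compares the two predicates on a pair of empty lines. The names firstTry, secondTry, sample and result are ours, and `all $ not . null` is written as the equivalent `all (not . null)`.

```haskell
import Data.List (groupBy)

-- The earlier predicate: False for two empty lines, so not reflexive.
firstTry :: String -> String -> Bool
firstTry a b = not (null b || null a)

-- The revised predicate: "both empty or both non-empty", which is
-- reflexive, symmetric and transitive.
secondTry :: String -> String -> Bool
secondTry a b = null b == null a

blocks :: String -> [String]
blocks = map unlines
       . filter (all (not . null))   -- drop the groups of blank lines
       . groupBy secondTry
       . lines

-- Sample input taken from earlier in the thread.
sample :: String
sample = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."

result :: [String]
result = blocks sample
-- ["1234\n5678\n","abcdefghijklmnopq\n",",,.,.,.\n"]
```

With the revised predicate, runs of blank lines form their own groups, which the filter then discards, so the single newlines inside a block survive.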

On 1/13/06, Sebastian Sylvan wrote:
> blocks = map concat . groupBy (const (not . null)) . lines

Thanks. That's a little more involved than I was looking for, but that certainly looks better than pattern matching on ('\n':'\n':rest). ;-)

For the record, lines removes the newline at the end of each line, so a string like:

    a
    b

    c
    d

becomes ["ab", "cd"], which can interfere with processing if the whitespace is significant. Changing this to

    blocks = map unlines . groupBy (const (not . null)) . lines

re-adds all of the newlines, thus re-adding the significant whitespace, while still chunking everything into blocks:

    ["a\nb\n","\nc\nd\n"]

Thanks again,

-- Adam
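The two variants can be compared side by side. This is a sketch only: the literal input below is our reconstruction of the four-line example (with a blank line between "b" and "c"), and the names blocksConcat and blocksUnlines are ours.

```haskell
import Data.List (groupBy)

-- concat glues the lines of a block together with nothing in between;
-- unlines puts a '\n' after every line, including the blank separator
-- that groupBy leaves at the head of each later group.
blocksConcat, blocksUnlines :: String -> [String]
blocksConcat  = map concat  . groupBy (const (not . null)) . lines
blocksUnlines = map unlines . groupBy (const (not . null)) . lines

-- Assumed input: "a", "b", a blank line, "c", "d".
input :: String
input = "a\nb\n\nc\nd\n"

withConcat, withUnlines :: [String]
withConcat  = blocksConcat input    -- ["ab","cd"]
withUnlines = blocksUnlines input   -- ["a\nb\n","\nc\nd\n"]
```

Note that in the unlines variant every block after the first begins with the newline of the blank line that separated it from its predecessor.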
participants (5)
- Adam Turoff
- Jared Updike
- Jon Fairbairn
- Robert Dockins
- Sebastian Sylvan