New subject: Adding split/split' to Data.List, and redefining words/lines with it; also, adding replace/replaceBy

10 Jul 2008

Hi everyone. Recently, while doing some shell scripting, I found myself redefining a 'split' function (take an item and a list, and break the list into a list of lists everywhere the item appears) *yet again*, and I got annoyed enough to resolve to fix the situation. While I was at it, I decided that since 'lines' and 'words' are conceptually specializations of a general split function, I would come up with rewrites for them too. I find the result much more aesthetically satisfying: it's now clear in the code that lines and words are essentially specializations of split, but with pragmatic edge cases (which mean we don't get nice identities like 'unlines . lines == id', but which make them more useful with, say, getContents). Code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> import Data.Char (isSpace)
> import Data.List (elemIndices)
>
> lines' :: String -> [String]
> lines' s = removeTrailingNull (split' '\n' s)
>  where
>    removeTrailingNull :: [String] -> [String]
>    removeTrailingNull y = case y of
>                             [] -> []
>                             [""] -> []
>                             (x:xs) -> x : removeTrailingNull xs
>
> linesProp :: String -> Bool
> linesProp x = (Prelude.lines x == lines' x)
>
> words' :: String -> [String]
> words' = filter (not . and . map isSpace) . split isSpace
>
> wordsProp :: String -> Bool
> wordsProp x = (Prelude.words x == words' x)
>
> split :: (a -> Bool) -> [a] -> [[a]]
> split _ [] = []
> split p s = let (l,s') = break p s in l : case s' of
>                                            [] -> []
>                                            (r:s'') -> [r] : split p s''
>
> splitUndoProp, splitUndoIdemProp, splitPreserveDelimsProp
>   :: (Eq a) => a -> [a] -> Bool
> splitUndoProp x y = (concat $ split (==x) y) == y
> splitUndoIdemProp x y = (concat $ concat $ split (==[x]) $ split (==x) y) == y
> splitPreserveDelimsProp x y = (length $ elemIndices [x] $ split (==x) y) ==
>                               (length $ elemIndices x y)
>
> split' :: (Eq a) => a -> [a] -> [[a]]
> split' a b = filter (/= [a]) $ split (\x -> x==a) b
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
I've run many, many QuickCheck tests against the Prelude lines and words, and the definitions seem to be correct.
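
To make the behaviour of split, split' and words' concrete, here is a small standalone sketch that repeats those three definitions and prints their results on a few inputs. The input strings and the main action are illustrative assumptions, not part of the proposal.

```haskell
import Data.Char (isSpace)

-- Same definition as in the post: break the list at every element
-- satisfying p, keeping each delimiter as its own singleton list.
split :: (a -> Bool) -> [a] -> [[a]]
split _ [] = []
split p s =
  let (l, s') = break p s
  in l : case s' of
           []      -> []
           (r:s'') -> [r] : split p s''

-- Same definition as in the post: split on one element, then drop the
-- singleton lists that hold only the delimiter.
split' :: Eq a => a -> [a] -> [[a]]
split' a b = filter (/= [a]) $ split (\x -> x == a) b

-- Same definition as in the post: drop every piece made up only of
-- whitespace (this also drops the empty pieces).
words' :: String -> [String]
words' = filter (not . and . map isSpace) . split isSpace

-- Illustrative inputs only.
main :: IO ()
main = do
  print (split (== ',') "a,b,,c")    -- ["a",",","b",",","",",","c"]
  print (split' ',' "a,b,,c")        -- ["a","b","","c"]
  print (words' "  hello   world ")  -- ["hello","world"]
```

Note that split keeps the delimiters as separate elements, which is what lets words' throw away runs of whitespace with a single filter.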

What do people think of adding these? I know I'm not the only one who has wanted split or split' on more than one occasion, and they are not the funnest functions to rewrite every time you want them.

(About all they're missing are Haddocks; and perhaps a better name for split' which reflects how it is lossy and can't be undone while split can be.)
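
The difference between the two can be shown with a short sketch: concatenating the output of split gives back the input, while split' maps two different inputs to the same result. The definitions are repeated so the example stands alone; the helper name roundTrips and the sample strings are hypothetical.

```haskell
-- Same definitions as in the post, repeated so this file compiles alone.
split :: (a -> Bool) -> [a] -> [[a]]
split _ [] = []
split p s =
  let (l, s') = break p s
  in l : case s' of
           []      -> []
           (r:s'') -> [r] : split p s''

split' :: Eq a => a -> [a] -> [[a]]
split' a b = filter (/= [a]) $ split (\x -> x == a) b

-- Hypothetical helper: does concat undo split for this input?
roundTrips :: String -> Bool
roundTrips s = concat (split (== ',') s) == s

main :: IO ()
main = do
  print (roundTrips "a,b,")   -- True: the delimiters are still present
  print (split' ',' "a,b")    -- ["a","b"]
  print (split' ',' "a,b,")   -- ["a","b"], the trailing comma is gone
```

Because "a,b" and "a,b," give the same output, no function can recover the original input from the result of split' alone.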

------

On a secondary and less important note, I'd like to add two functions: 'replace' and 'replaceBy'. They do basically what they sound like: 'replace' takes two items and a list and changes every occurrence of the first item in the list to the second, while 'replaceBy' takes a predicate in place of the first item. These are two other functions I often have to redefine, which still surprises me - Data.List has a surfeit of obscure functions I've never used and which are kind of odd, but a basic search-and-replace function isn't there? I mean, I'm not saying let's add enough functions to Data.List to turn it into a mini-Perl, but it strikes me as a real gap. (As before, I've defined some sensible QC properties and checked, although the definitions look obviously right to me.) Code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> replaceBy :: (a -> Bool) -> a -> [a] -> [a]
> replaceBy a b = map (\x -> if a x then b else x)
>
> replace :: (Eq a) => a -> a -> [a] -> [a]
> replace a = replaceBy (==a)
>
> replaceLengthProp :: (Eq a) => a -> a -> [a] -> Bool
> replaceLengthProp x y z = (length $ replace x y z) == (length z)
> replaceUndoableProp :: (Eq a) => a -> a -> [a] -> Bool
> replaceUndoableProp x y z = if not (y `elem` z)
>                               then z == (replace y x $ replace x y z)
>                               else True
> replaceIdempotentProp :: (Eq a) => a -> a -> [a] -> Bool
> replaceIdempotentProp x y z = (replace x y $ replace x y z) == (replace x y z)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
--
gwern
