Slower with ByteStrings?

Hello,

We recently had a challenge as follows: Given a word, find all the words in the dictionary which can be made from the letters of that word. A letter can be used at most as many times as it appears in the input word. So, "letter" can only match words with 0, 1, or 2 t's in them.

I opted for simplicity in my implementation, including hard coding the input word and using /usr/share/dict/words as the dictionary:

-- Begin Words.hs
module Main where

import List

-- I was lazy and borrowed perms from the Haskell wiki
-- but I wrote everything else
perms [] = [[]]
perms xs = [ x : ps | x <- xs, ps <- perms (xs \\ [x]) ]

-- creates permutations of all lengths then cleans up duplicates and gets rid of
-- the empty list, this is probably the least efficient way possible
allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x

main = do
  wordList <- readFile "/usr/share/dict/words"
  let words = lines wordList
  mapM_ print $ filter (`elem` words) $ allPerms "ubuntu"
-- End Words.hs

Next I decided to try it with byte strings:

-- Begin ByteStringWords.hs
module Main where

import List
-- import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy.Char8 as C

-- I was lazy and borrowed perms from the Haskell wiki
-- but I wrote everything else
perms [] = [[]]
perms xs = [ x : ps | x <- xs, ps <- perms (xs \\ [x]) ]

-- creates permutations of all lengths then cleans up duplicates and gets rid of
-- the empty list, this is probably the least efficient way possible
allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x

main = do
  wordList <- C.readFile "/usr/share/dict/words"
  let words = C.lines wordList
  mapM_ print $ filter (`elem` words) $ map C.pack $ allPerms "ubuntu"
-- End ByteStringWords.hs

I don't think the overhead to compute the permutations matters here, as the input to the permutations calculation is so small. Any ideas why the byte string version is slower? (Strict bytestrings appear to be about 2 seconds slower and lazy bytestrings appear to be about 1 second slower.) I think, given my simple algorithm, that means that (==) for ByteStrings is slower than (==) for String. Is this possible? I think the program might be spending more time cleaning up after execution with the ByteString versions, as it seems to "stall" after printing the last match.

Thanks,
Jason
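[As a side note, not raised in the original mail: both programs above test each candidate with `elem` against the full word list, which is a linear scan per candidate. A minimal sketch of the same brute-force approach with the dictionary loaded into a Data.Set instead, so each membership test is logarithmic; everything else follows Words.hs:]

-- Sketch only: identical permutation generation, Set-based membership test.
module Main where

import Data.List (inits, nub, sort, (\\))
import qualified Data.Set as Set

perms [] = [[]]
perms xs = [ x : ps | x <- xs, ps <- perms (xs \\ [x]) ]

allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x

main = do
  wordList <- readFile "/usr/share/dict/words"
  let dict = Set.fromList (lines wordList)   -- build the dictionary once
  mapM_ print $ filter (`Set.member` dict) $ allPerms "ubuntu"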

Jason Dagit wrote:
Given a word, find all the words in the dictionary which can be made from the letters of that word. A letter can be used at most as many times as it appears in the input word. So, "letter" can only match words with 0, 1, or 2 t's in them.
I don't know about the ByteString thing, but how about a general speedup?

frequencies = map (\x -> (head x, length x)) . group . sort

superset xs = \ys ->
    let y = frequencies ys
    in  length y == lx &&
        and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
    where
    x  = frequencies xs
    lx = length x

main = interact $ unlines . filter ("ubuntu" `superset`) . lines

Regards,
apfelmus
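[For concreteness, an illustrative pair of evaluations of frequencies, which turns a word into its sorted letter counts:]

frequencies "letter" == [('e',2),('l',1),('r',1),('t',2)]
frequencies "ubuntu" == [('b',1),('n',1),('t',1),('u',3)]

[superset then zips the two count lists together pairwise.]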

from the letters of that word. A letter can be used at most as many times as it appears in the input word. So, "letter" can only match words with 0, 1, or 2 t's in them.
frequencies = map (\x -> (head x, length x)) . group . sort

superset xs = \ys ->
    let y = frequencies ys
    in  length y == lx &&
        and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
    where
    x  = frequencies xs
    lx = length x
As far as I understand the spec, this algorithm is not correct:

superset "ubuntu" "tun" == False

Is at least one 'b' necessary, yes or no?

If the answer is no, the following algorithm solves the problem and is faster than the one above:

del y = del_acc []
    where del_acc _ []              = mzero
          del_acc v (x:xs) | x == y = return (v ++ xs)
          del_acc v (x:xs)          = del_acc (x:v) xs

super u = not . null . foldM (flip del) u

main = interact $ unlines . filter ("ubuntu" `super`) . lines

BR,

--
-- Mirko Rahn -- Tel +49-721 608 7504 --
--- http://liinwww.ira.uka.de/~rahn/ ---
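[To spell out how this works, a reading of the code above, assuming Control.Monad is in scope for mzero and foldM: del y xs removes the first occurrence of y from xs, wrapped in the list monad, and is the empty list when y does not occur; super folds del over the candidate word, so the fold comes up empty as soon as some letter cannot be removed. For example:]

del 't' "ubuntu"        ==  ["nubuu"]   -- one deletion found (the accumulator reverses the prefix)
del 'x' "ubuntu"        ==  []          -- 'x' is not available
"ubuntu" `super` "tun"  ==  True
"ubuntu" `super` "bb"   ==  False       -- only one 'b' available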

Mirko Rahn wrote:
from the letters of that word. A letter can be used at most as many times as it appears in the input word. So, "letter" can only match words with 0, 1, or 2 t's in them.
frequencies = map (\x -> (head x, length x)) . group . sort

superset xs = \ys ->
    let y = frequencies ys
    in  length y == lx &&
        and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
    where
    x  = frequencies xs
    lx = length x
As far as I understand the spec, this algorithm is not correct:
superset "ubuntu" "tun" == False
Is at least one 'b' necessary, yes or no?
Oops, you are indeed right, the answer should be "no". I thought I'd come away without primitive recursion, but here's a correct version

superset xs = superset' x . sort ys
    where
    x = sort xs

_      `superset`  []     = True
[]     `superset`  _      = False
(x:xs) `superset'` (y:ys)
    | x == y    = xs `superset`  ys
    | x <  y    = xs `superset`  (y:ys)
    | otherwise = False
If the answer is no, the following algorithm solves the problem and is faster than the one above:
del y = del_acc []
    where del_acc _ []              = mzero
          del_acc v (x:xs) | x == y = return (v ++ xs)
          del_acc v (x:xs)          = del_acc (x:v) xs
super u = not . null . foldM (flip del) u
main = interact $ unlines . filter ("ubuntu" `super`) . lines
The algorithm is correct, but it's not faster: xs `super` ys takes O(n*m) time, whereas superset takes O(n * log n + m * log m) time given a proper sorting algorithm. Here, n = length xs and m = length ys.

Actually, both algorithms are essentially the same except for the sorting, which allows some equality tests to be dropped. (Note that memoizing x = sort xs over different ys speeds things up a bit for the intended application. This way, (sort "ubuntu") is only computed once and the running time over many ys approaches O(n + m*log m).)

Regards,
apfelmus

PS: Some exercises for the interested reader:
1) Still, the algorithm super has an advantage over superset. Which one?
2) Put xs into a good data structure and achieve O(m * log n) time for multiple ys.
3) Is this running time always better than the aforementioned O(n + m*log m)? What about very large m > n?

[fixed some typos, mainly missing primes]
superset xs = superset' x . sort
    where
    x = sort xs

_      `superset'` []     = True
[]     `superset'` _      = False
(x:xs) `superset'` (y:ys)
    | x == y    = xs `superset'` ys
    | x <  y    = xs `superset'` (y:ys)
    | otherwise = False
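[A small usage sketch of the corrected version, assuming sort from Data.List: partially applying superset to the fixed word lets x = sort "ubuntu" be computed once for the closure and reused across every dictionary line, which is the memoization mentioned earlier:]

check = superset "ubuntu"   -- sort "ubuntu" is evaluated once and shared

-- e.g.  check "tun"  == True
--       check "bobo" == False   -- "ubuntu" has no 'o' and only one 'b'

main = interact $ unlines . filter check . lines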
del y = del_acc []
    where del_acc _ []              = mzero
          del_acc v (x:xs) | x == y = return (v ++ xs)
          del_acc v (x:xs)          = del_acc (x:v) xs
The algorithm is correct, but it's not faster: xs `super` ys takes O(n*m) time, whereas superset takes O(n * log n + m * log m) time given a proper sorting algorithm. Here, n = length xs and m = length ys.
Of course, you are right. In the worst case, super is much slower than superset. In the average case (under some assumptions about the inputs) it could perform quite well, because of the chance to detect non-subset words early.
2) Put xs into a good data structure and achieve O(m * log n) time for multiple ys.
You mean something along the lines of

supermap xs =
    let mx = Map.fromListWith (+) [ (x,1) | x <- xs ]
        ins _ 1 = Nothing
        ins _ v = Just (v-1)
        upd m y = case Map.updateLookupWithKey ins y m of
                    (Nothing, _ ) -> mzero
                    (_      , m') -> return m'
    in not . null . foldM upd mx

Thanks for your time,

BR,

--
-- Mirko Rahn -- Tel +49-721 608 7504 --
--- http://liinwww.ira.uka.de/~rahn/ ---
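[Spelled out as a complete program, a sketch only; the mail shows just the supermap body. It needs qualified Data.Map and Control.Monad for foldM and mzero:]

module Main where

import qualified Data.Map as Map
import Control.Monad (foldM, mzero)

-- supermap xs builds a multiset of the available letters once; each candidate
-- letter then costs one O(log n) map update, giving O(m * log n) per word.
supermap :: Ord a => [a] -> [a] -> Bool
supermap xs =
    let mx = Map.fromListWith (+) [ (x, 1 :: Int) | x <- xs ]
        ins _ 1 = Nothing          -- last copy of this letter: remove the key
        ins _ v = Just (v - 1)     -- otherwise decrement the count
        upd m y = case Map.updateLookupWithKey ins y m of
                    (Nothing, _ ) -> mzero       -- letter not available
                    (_      , m') -> return m'
    in  not . null . foldM upd mx

main :: IO ()
main = interact $ unlines . filter (supermap "ubuntu") . lines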

Jason Dagit wrote:
I think, given my simple algorithm that means that (==) for ByteStrings is slower than (==) for String. Is this possible?
Yes indeed. Over ByteStrings, (==) is implemented as a call to memcmp. For small strings, this loses by a large margin because it has to go through the FFI.

bos:
Jason Dagit wrote:
I think, given my simple algorithm that means that (==) for ByteStrings is slower than (==) for String. Is this possible?
Yes indeed. Over ByteStrings, (==) is implemented as a call to memcmp. For small strings, this loses by a large margin because it has to go through the FFI.
Yes, a non-memcmp version can sometimes be profitably used here. Something like this Core:

eq !n (Ptr p) (Ptr q) = inlinePerformIO $ IO $ go n p q
    where
      go !n p q s
          | n == 0    = (# s, True #)
          | otherwise = case readInt8OffAddr# p 0# s of
              (# s, a #) -> case readInt8OffAddr# q 0# s of
                  (# s, b #)
                      | a /=# b   -> (# s, False #)
                      | otherwise -> go (n-1) (plusAddr# p 1#) (plusAddr# q 1#) s

Ok, so that's not Core, but it could be ;)

-- Don

Hello Bryan, Sunday, May 27, 2007, 3:30:50 AM, you wrote:
I think, given my simple algorithm that means that (==) for ByteStrings is slower than (==) for String. Is this possible?
Yes indeed. Over ByteStrings, (==) is implemented as a call to memcmp. For small strings, this loses by a large margin because it has to go through the FFI.
how about using *unsafe* memcmp import and more complex code for the case of large BS length?

a == b | min (length a) (length b) < 20 = memcmp a b
....

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Bryan,
Sunday, May 27, 2007, 3:30:50 AM, you wrote:
I think, given my simple algorithm that means that (==) for ByteStrings is slower than (==) for String. Is this possible?
Yes indeed. Over ByteStrings, (==) is implemented as a call to memcmp. For small strings, this loses by a large margin because it has to go through the FFI.
how about using *unsafe* memcmp import and more complex code for the case of large BS length?
a == b | min (length a) (length b) < 20 = memcmp a b
....
Good idea. I'll try to do this before the next bytestring comes out.

-- Don
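[A rough sketch of the kind of dispatch being discussed; this is an illustration, not the actual bytestring patch. Lengths are compared first, short strings are walked byte by byte in Haskell, and only longer ones fall back to the library's memcmp-backed (==). unsafeIndex is from Data.ByteString.Unsafe; the threshold 20 is just Bulat's example number:]

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeIndex)

-- Length-dispatched equality: short strings avoid the FFI call entirely.
eqBS :: B.ByteString -> B.ByteString -> Bool
eqBS a b
    | la /= lb  = False          -- different lengths can never be equal
    | la < 20   = go 0           -- short: hand-rolled byte loop, no memcmp
    | otherwise = a == b         -- long: the library's memcmp-based (==)
  where
    la = B.length a
    lb = B.length b
    go i | i >= la                            = True
         | unsafeIndex a i /= unsafeIndex b i = False
         | otherwise                          = go (i + 1)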
participants (6)
- apfelmus
- Bryan O'Sullivan
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- Jason Dagit
- Mirko Rahn