
On Sat, Jan 21, 2012 at 8:18 AM, Twan van Laarhoven
Notice that there are lots of "miku-X" prefixes found. This is probably not what you want. What exactly do you want the algorithm to do? For example, "" is obviously a prefix of every string, but it is not very long. On the other hand, each string is a prefix of itself, but that prefix is shared by only one string (usually).
By the way, the sort and compare adjacent pairs approach corresponds to "atLeastThisManyDescendants 2".
Ah, now the code makes sense to me. It's longer, but it is a heck of a lot more principled and readable, so I'm happy to replace my version with yours. It's not too hard to convert it into a CLI filter with optional depth (default of 2, replicating original behavior): import qualified Data.Map as Map import System.Environment (getArgs) import Data.List (sortBy) import Data.Ord (comparing) main :: IO () main = do arg <- getArgs let n = if null arg then 2 else read (head arg) :: Int interact (unlines . chunk n . lines) chunk :: Int -> [String] -> [String] chunk n = map prefix . sortByLength . atLeastThisManyDescendants n . fromList where sortByLength :: [CommonPrefix Char] -> [CommonPrefix Char] sortByLength = sortBy (comparing (numDescendant . names)) ..... And the results seem kosher (printing just the prefixes is probably the best idea, but wouldn't be too hard to switch to printing full filenames - just filter the original file list with the extracted prefix from each CommonPrefix): $ ls music/vocaloid/| runhaskell lcp.hs 5 miku-s miku-t miku-r rin- miku-a gumi- luka- $ ls music/vocaloid/| runhaskell lcp.hs 4 miku-h miku-m miku-n miku-p miku-s miku-t miku-r rin- miku-a gumi- luka- $ ls music/vocaloid/| runhaskell lcp.hs # with 2 chorus- gumi-mo gumi-s kaito- luka-emon luka-t miku-acolorlinkingworld- miku-akayaka miku-cleantears-remind2011natsu- miku-dan miku-ele miku-galaxyodyssey- miku-ha miku-inn miku-jemappelle-motion- miku-kz- miku-lo miku-m@rk- miku-plustellia-壁の彩度- miku-ro miku-se miku-ta miku-the miku-tinyparadise- miku-ジラートP-birthdayofeden- miku-杯本選 miku-般若心経 niconicochorus- yuki- len- luka-di miku-re:package- miku-n rin- -- gwern http://www.gwern.net