
Hi, I'm teaching myself Haskell and attempting to use it for day-to-day tasks at work. I signed up for this list a few weeks ago, and this is my first post. Here's my problem: I'm given a list of 45 Strings, each nine Chars long. The only characters are 'A', 'C', 'G' and 'T'. (This is for a bioinformatics application.) I'll refer to any nine-length String composed of these Chars as a "9-mer". I need to generate a larger list of 9-mers such that no two 9-mers have a Levenshtein distance of less than 3. (The list I start with satisfies this requirement.) I can generate as many 9-mers as I possibly can, but this process is very slow. It's also not being computed lazily; calling head on the output list forces the entire list of 9-mers to be computed. *I'd like to understand why this list isn't being computed lazily, and how I can change my code to make it so. *My knowledge of monads is pretty poor as well, so a digression or a series of questions about why the line [9] >>= (`replicateM` "ACGT") works would also be helpful, as would general tips about writing clean code. This is an O(n^2) operation, so I'm not expecting it to be slow. However, I figured that I'd just be able to take the first N elements from the output list. Here's what I have: import Control.Monad main :: IO () main = interact processData processData :: String -> String processData = unlines . (`merge` ([9] >>= (`replicateM` "ACGT"))) . lines -- Merges two lists of things into a single list merge :: Eq a => [[a]] -> [[a]] -> [[a]] merge xs [] = xs merge xs ys = merge ((head ys) `addInto` xs) $ tail ys -- Adds a thing into a list if it is "different" enough addInto :: Eq a => [a] -> [[a]] -> [[a]] y `addInto` ys = if ((minimum $ map (dist y) ys) < 3) then ys else y:ys -- Calculate the Levenshtein distance -- Lloyd Allison algorithm. See Haskell wiki -- code omitted dist :: Eq a => [a] -> [a] -> Int My workaround to getting a smaller subset of the whole list is to replace [9] >>= (`replicateM` "ACGT") with take 10000 $ [9] >>= (`replicateM` "ACGT") Thanks! Mason Lai
participants (1)
-
Mason Lai