Hello Dominik,

I'm not sure what exactly your algorithm is, but one thing that stands out to me is the use of (Text,Text) index pairs instead of something more efficient.

Here is an algorithm that I wrote which (without any optimization tricks) seems to be fast and correct: https://pastebin.com/rDxUFbt3

Dynamic Programming in Haskell is a pleasure due to laziness, especially if the choice variables are dense in the integers. 

In this case, I have a Vector (Vector Bool), where the outer vector is indexed by position into the lowercase string and the inner vector is indexed by position into the uppercase string. vec ! i ! j is True iff there is a solution to the problem for the first i characters in the lowercase string and the first j characters in the uppercase string. Because Vectors are lazy (unless you use one of the packed varieties) you can assign each vector element the value corresponding to its solution - before you even know what the solution is!

To get the final solution, you simply look at the vector element corresponding to using the entirety of both strings. This will force evaluation of the last cell, which will force the evaluation of some other cells, which will force the evaluation of some other cells, etc. etc. If a cell is ever accessed more than once, it still only gets computed one time, so we have memoization.

This form is a little weird and took me a while to get the first time I saw it, but I was delighted when I fully understood it.

If your choice variables are not dense in the integers, you can do the same approach using a memo-trie, although there is a constant factor performance loss compared to vectors.

--Will

On Sun, Jan 14, 2018 at 3:14 PM, Dominik Bollmann <dominikbollmann@gmail.com> wrote:

Hello Haskell-Cafe,

While playing with dynamic programming problems, I've been trying to
solve the "Abbreviation" problem found on hackerrank.com at
https://www.hackerrank.com/challenges/abbr/problem.

Briefly, this problem asks to decide whether a source string s can be
abbreviated into a target string t by capitalizing some of the
characters in s and deleting its afterwards remaining lowercase
characters. For example, the string s = "aBbdD" can be abbreviated as
target t = "BBD", but target t' = "XYZZ" is not an abbreviation for
source s' = "xyz".

My solution to this problem is the following memoization-based
function `isAbbreviation`:


```
import Control.Monad
import Control.Monad.State
import Data.Char
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Maybe
import Data.String
import Data.Text (Text)
import qualified Data.Text as Text
import System.IO

type Store = Map (Text, Text) Bool

isAbbrMemo :: Text -> Text -> State Store Bool
isAbbrMemo s t
  | Text.null t = extend s t $ return (Text.all isLower s)
  | Text.null s = extend s t $ return False
  | otherwise   =
      let (a, as) = fromJust $ Text.uncons s
          (b, bs) = fromJust $ Text.uncons t
      in extend s t $ matches a as b bs
  where
    matches a as b bs
      | isLower a && toUpper a /= b = isAbbrMemo as (b `Text.cons` bs)
      | isLower a && toUpper a == b = (||) <$> isAbbrMemo as bs
                                           <*> isAbbrMemo as (b `Text.cons` bs)
      | isUpper a && a /= b         = return False
      | isUpper a && a == b         = isAbbrMemo as bs

extend :: Text -> Text -> State Store Bool -> State Store Bool
extend s t m = do
  st <- get
  case Map.lookup (s,t) st of
    Just v  -> return v
    Nothing -> do
      v <- m
      modify $ Map.insert (s,t) v
      return v

isAbbreviation :: Text -> Text -> Bool
isAbbreviation s t = evalState (isAbbrMemo s t) Map.empty

main :: IO ()
main = do
  queries <- readQueries stdin
  let answers = map yesNo $ map (uncurry isAbbreviation) queries
  forM_ answers putStrLn

yesNo :: Bool -> String
yesNo True  = "YES"
yesNo False = "NO"

readQueries :: IsString a => Handle -> IO [(a, a)]
readQueries h = do
  numQueries <- read <$> hGetLine h :: IO Int
  forM [1..numQueries] $ \_qid -> do
    s <- hGetLine h
    t <- hGetLine h
    return (fromString s, fromString t)
```

However, running `isAbbreviation` on Hackerrank's input #13 still
takes around 38 seconds on my machine and is therefore too slow to be an
accepted solution. The input of question is attached as a text file.

My question is therefore: Where could I further improve the running time
of the function `isAbbreviation`? Is there any low-hanging fruit to
improve upon? Or is my dynamic-programming based approach somehow
flawed in general? (in which I should rather rethink the problem?)

Any observations, remarks, and improvements on the above code snippet
are greatly appreciated :-)

Thanks, Dominik.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.