I have a program which does backtracking search in a recursive function.  I followed the chapter in "Real World Haskell" to parallelize it.

Without Control.Parallel, it uses about 25% of the CPU, namely one core out of four. With the suggestions in RWH it runs faster but still uses only 35% of the CPU max. 

So I set about creating a MWE in order to experiment and reach out for help. For some reason, my stripped down program isn't working with parallelism at all! 

There's no effect from the R.W.H. ideas. Can I get some suggestions as to why? 

Here's the program:
--------------------------------------------
import qualified Data.List as L
import Text.Printf
import System.CPUTime
import Data.Function
import Control.Parallel

-- Run backtracking search for a list of doubles, optimizing a 
-- fitness function on the list, called 'evalFunc'. At each step, use 'stepFunc' 
-- to generate a list of possible next Doubles that can be added to the list. 
-- We are done when the list has 'finalSize' elements.
search1_par :: Int ->
  ([Double] -> Double) -> ([Double] -> [Double]) -> [Double] -> (Double,[Double])
search1_par finalSize evalFunc stepFunc listIn
  | length listIn == finalSize = (evalFunc listIn,listIn)
  | otherwise = 
      let steps = stepFunc listIn
          (steps1,steps2) = divideListInTwo steps
          f s = search1_par finalSize evalFunc stepFunc $ s:listIn
          results1 = map f steps1
          results2 = map f steps2
          results = force results1 `par` (force results2 `pseq` (results1++results2))
      in L.maximumBy (compare `on` fst) results
  
force :: [a] -> ()
force xs = go xs `pseq` ()
  where go (_:xs) = go xs
        go [] = 1

divideListInTwo :: [a] -> ([a],[a])
divideListInTwo [] = ([],[])
divideListInTwo xs = (take l xs,drop l xs) where l = length xs `div` 2

---------------------------------------------
-- some sample evaluation (fitness) functions and step generation functions.

eval1 :: [Double] -> Double
eval1 xs = v1 - v2 + v3
  where
    v1 = sum $ zipWith (*) (cycle [1]) xs
    v2 = sum . map (*2) $ zipWith (*) (cycle [1,0]) xs
    v3 = sum . map (*3)  $ zipWith (*) (cycle [1,0,0]) xs


step1 :: [Double] -> [Double]
step1 xs | l == 0 = take 8 $ xs
         | l == 1 = take 8 $ map (/2) xs
         | l == 2 = take 8 $ map (*3) xs
  where
    l = length xs `mod` 3
--------------------------------------------------------------------------------
-- main

main = do
  t1 <- getCPUTime
  let f :: Double -> String
      f x = printf "%5.1f" x
      (_,result) = search1_par 13 eval1 step1 [1,2,3]
  putStrLn $ concatMap f result
  t2 <- getCPUTime
  putStrLn $ printf "CPU time: %.3f" ((fromIntegral $ t2-t1) / 1000000000000 :: Double)