
Hello everyone!

At the moment I'm working on a function that computes the rank of a matrix. The testing I've done so far doesn't reveal anything obviously wrong with it, so I've decided to see whether it can be improved performance-wise. I've put together a minimal criterion benchmark and run it with profiling enabled, but something in the code makes the benchmark executable loop endlessly. I've stuck a few 'trace's into the code, and it seems to just keep producing and processing more and more matrices. I can kill the process with Ctrl-C, and when I do, a profiling report appears. I've attached one here in case you want to see it in full, but the gist is that 70% of the time the program does nothing, 10% is spent in something labeled 'SYSTEM', and another 9% is garbage collection, with the remaining functions each taking less than 2%. This can go on for more than an hour.

At first I attributed this to the function in question being grossly inefficient, but in the test executable 100 runs of it on 100x100 matrices take only 18-20 seconds (well, it's actually a test checking that the transpose of a matrix has the same rank as the original, sketched in the P.S. below, so there's some extra overhead there). What could cause this behaviour?

One weak point I can identify myself is the 'NFData' instance (from the deepseq package) for my matrices - I'm not 100% certain it's correct and does what it should. I don't really think that's the culprit, though: the 'rnf' method of the 'NFData' class would then feature prominently in the profiling report, and it doesn't. The 'NFData' instance, the benchmarking code and the function in question are below ('rank' has no type signature because it's actually a method of a 'Matrix' class):

```
---- the code that works ----

-- (Qualified imports: U = Data.Vector.Unboxed, V = Data.Vector, D = Debug.Trace.)

-- | A generic matrix. May or may not be square.
data GenericM e = GenericM Int Int (U.Vector e)
    deriving (Show, Eq)

instance NFData (GenericM e) where
    rnf (GenericM nrows ncols dat) = rnf (nrows, ncols, dat)

-- Gaussian elimination: walk the columns, pick as pivot the row whose entry
-- in the current column has the largest absolute value, eliminate that
-- column from the other rows, and count the non-zero pivots.
rank tolerance m@(GenericM nr nc _) =
    D.trace "new matrix" $ go 0 0 $ V.map rawData $ rows m
  where
    go total col allRows
        | null allRows = total
        | col >= nc = total
        | col >= nr = total
        | abs (maxRow U.! col) < tolerance = go total (col + 1) allRows
        | otherwise = go (total + 1) (col + 1) nonZeroed
      where
        maxIndex = V.maxIndexBy (comparing (abs . (U.! col))) allRows
        maxRow = allRows V.! maxIndex
        allRows' = V.imap multiplyAndAdd allRows
        -- Drop the pivot row and any rows that have become (numerically) zero.
        nonZeroed = V.ifilter notZero allRows'
        notZero i v = i /= maxIndex && not (isZeroV tolerance v)
        multiplyAndAdd i v
            | i == maxIndex = v
            | otherwise = U.imap (multAddOne i) v
        multAddOne i j y
            | j < col = y
            | otherwise = y - (maxRow U.! j) * coef i
        coef i = (allRows V.! i U.! col) / maxRow U.! col

---- the code that benchmarks ----

genericDenseMatrixRank :: Benchmark
genericDenseMatrixRank = bgroup "linear/matrix/dense/generic/rank"
    [ env (genRandomGenericMatrix 100) (b "small")
    --, env (genRandomGenericMatrix 1000) (b "medium")
    --, env (genRandomGenericMatrix 5000) (b "big")
    ]
  where
    b name m = bench name $ whnf (rank 1e-6) m

genRandomGenericMatrix :: Int -> IO (GenericM Double)
genRandomGenericMatrix size = do
    rng <- getStdGen
    let v = randoms rng :: [Double]
    return $ mkGenericMFromList size size v
```

Thanks in advance.

-- Michail.
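P.S. For reference, the transpose test I mentioned boils down to something like the following (just the gist, reusing the definitions above; `transposeGenericM` is a stand-in name for my actual transpose helper):

```
-- Sketch of the test: the rank of the transpose should equal the rank of
-- the original matrix. `transposeGenericM` is a placeholder name here.
prop_rankOfTranspose :: Double -> GenericM Double -> Bool
prop_rankOfTranspose tolerance m =
    rank tolerance (transposeGenericM m) == rank tolerance m
```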