
On Mon, Oct 15, 2007 at 07:54:28PM -0400, gwern0@gmail.com wrote:
> Mon Oct 15 19:48:50 EDT 2007  gwern0@gmail.com
>   * ShellPrompt.hs: a quick optimization of nub
>
>   I saw some complaints about ShellPrompt being slow - and noticed it
>   myself - and it seems ShellPrompt uses 'nub' in an awkward place to
>   uniquefy input. Nub doesn't perform well on long lists, but I once
>   ran into a similar problem and the suggested solution was something
>   clever: convert to a Set and then back to a List. Sets can't have
>   duplicate entries, and they uniquefy faster than nub.
>
>   The price is that the output is not sorted the same as nub's output
>   would be, but this is OK because the output of (toList . fromList)
>   is immediately passed to 'sort' - which should then produce the same
>   output for both versions.
>
>   I haven't really tested this but on long directories this should
>   help.
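For reference, the idea in the patch description can be sketched as below. This is only a minimal illustration, not the actual ShellPrompt.hs code: the names uniqSortNub and uniqSortSet and the sample input are made up for the example. It assumes the elements have an Ord instance, which Data.Set requires and which nub (needing only Eq) does not.

```haskell
import Data.List (nub, sort)
import qualified Data.Set as Set

-- Hypothetical stand-in for the original approach.
-- nub compares each element against everything kept so far,
-- so it is quadratic in the length of the list.
uniqSortNub :: Ord a => [a] -> [a]
uniqSortNub = sort . nub

-- Hypothetical stand-in for the patched approach.
-- Building a Set drops duplicates in O(n log n). Set.toList already
-- returns the elements in ascending order; the sort is kept here only
-- to mirror the pipeline described in the patch.
uniqSortSet :: Ord a => [a] -> [a]
uniqSortSet = sort . Set.toList . Set.fromList

main :: IO ()
main = do
  -- Made-up sample input with duplicates.
  let names = ["ls", "cat", "ls", "awk", "cat", "zsh"]
  print (uniqSortNub names)
  print (uniqSortSet names)
  -- Both versions should agree once sorted.
  print (uniqSortNub names == uniqSortSet names)
```

Since the result is sorted afterwards anyway, losing nub's first-occurrence ordering does not change the final output.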
Indeed, the benchmarks I tried show that the problem was nub. Quite
amazingly, replacing nub with toList . fromList reduces CPU time by
about 75%.

With nub:

  time promptReadline /usr/bin/
  2878

  real    0m8.504s
  user    0m7.559s
  sys     0m0.019s

  time promptGetDirCont /usr/bin/
  2878

  real    0m8.429s
  user    0m7.554s
  sys     0m0.039s

With toList . fromList:

  time promptReadlineSet /usr/bin/
  2878

  real    0m0.110s
  user    0m0.082s
  sys     0m0.004s

  time promptGetDirContSet /usr/bin/
  2878

  real    0m0.227s
  user    0m0.185s
  sys     0m0.022s

It is true that ReadLine is twice as fast as getDirectoryContents, but
I would prefer not to rely on an external library for such an
improvement.

What do you think?

Andrea