Hi,
A few problems:
1) Your input cleaning is not correct; running it over the source file, I
get (among others):
2) If you have a looong but rare word in the text, you will get too many
spaces between the words and the markers; that's not good.
3) You calculate the number of '#'s using the total line length and then
subtract the length of (word + spaces), skewing your output. You should
calculate (spaceForHashes = lineWidth - maxLength - 1) and then use that
to determine the number of hashes (see the sketch after this list).
[Although it should really be more complicated: maxLength should only
include those words that are actually output.]
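
A minimal sketch of that layout, with names I am assuming (lineWidth,
maxLength, maxCount are mine, not from your code):

    -- Hypothetical sketch: lineWidth is the full output width,
    -- maxLength the longest word actually printed, maxCount the
    -- largest frequency. Counts are scaled into the space that
    -- remains after the word column.
    renderLine :: Int -> Int -> Int -> (String, Int) -> String
    renderLine lineWidth maxLength maxCount (w, n) =
        w ++ replicate (maxLength - length w + 1) ' '
          ++ replicate hashes '#'
      where
        spaceForHashes = lineWidth - maxLength - 1
        hashes = (n * spaceForHashes) `div` maxCount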
Style is okay, although like Michael Xavier, I'm more of a where-guy.
You build the map via fromListWith (+). That's bad if you have input with
large counts. fromListWith (+) gives you thunks of the form
(..((1+1)+1)..+1) as map-entries. That costs a lot of space and time.
Better build the map using insertWith', which applies the combining
function strictly, so the counts stay evaluated:

    import Data.List (foldl')
    import Data.Map (empty, insertWith')

    mp ws = foldl' ins empty ws
      where
        ins m w = insertWith' (+) w 1 m
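
(With current containers, Data.Map.Strict.insertWith does the same job;
insertWith' has since been deprecated.)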
> 3. a Python version using defaultdict.
sort is O(n*log n); the Map and Python versions are O(n*log d), where n is
the number of words and d the number of distinct words.
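
For comparison, a sketch of the sort-based counting that the O(n*log n)
bound refers to (my reconstruction, not the code that was posted):

    import Data.List (group, sort)

    -- Sort all n words, then collapse equal neighbours; the sort
    -- dominates, so this is O(n*log n) no matter how few distinct
    -- words there are.
    countsBySort :: [String] -> [(String, Int)]
    countsBySort ws = [ (head g, length g) | g <- group (sort ws) ]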