
On Fri, Jul 1, 2011 at 9:34 PM, Rogan Creswick wrote:
On Fri, Jul 1, 2011 at 3:31 AM, Dmitri O.Kondratiev wrote:
> First of all I need:
...
- tools to construct 'bag of words' (http://en.wikipedia.org/wiki/Bag_of_words_model), which is a list of words in the article.
This is trivially implemented if you have a natural language tokenizer you're happy with.
Toktok might be worth looking at: http://hackage.haskell.org/package/toktok but I *think* it takes a pretty simple view of tokens (assuming it is the tokenizer I've been using within GF).
Unfortunately 'cabal install' fails with toktok:

Building toktok-0.5...
[1 of 7] Compiling Toktok.Stack ( Toktok/Stack.hs, dist/build/Toktok/Stack.o )
[2 of 7] Compiling Toktok.Sandhi ( Toktok/Sandhi.hs, dist/build/Toktok/Sandhi.o )
[3 of 7] Compiling Toktok.Trie ( Toktok/Trie.hs, dist/build/Toktok/Trie.o )
[4 of 7] Compiling Toktok.Lattice ( Toktok/Lattice.hs, dist/build/Toktok/Lattice.o )
[5 of 7] Compiling Toktok.Transducer ( Toktok/Transducer.hs, dist/build/Toktok/Transducer.o )
[6 of 7] Compiling Toktok.Lexer ( Toktok/Lexer.hs, dist/build/Toktok/Lexer.o )
[7 of 7] Compiling Toktok ( Toktok.hs, dist/build/Toktok.o )
Registering toktok-0.5...
[1 of 1] Compiling Main ( Main.hs, dist/build/toktok/toktok-tmp/Main.o )
Linking dist/build/toktok/toktok ...
[1 of 1] Compiling Main ( tools/ExtractLexicon.hs, dist/build/gf-extract-lexicon/gf-extract-lexicon-tmp/Main.o )

tools/ExtractLexicon.hs:5:35:
    Module `PGF' does not export `getLexicon'
cabal: Error: some packages failed to install:
toktok-0.5 failed during the building phase. The exception was:
ExitFailure 1

Any ideas how to solve this?
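
For reference, here is a minimal sketch of the bag-of-words step that does not depend on toktok at all. The tokenizer is a deliberately naive stand-in (lowercase everything, treat any non-letter as a separator), which is an assumption of this sketch and not how toktok or GF tokenize. The names tokenize and bagOfWords are made up for illustration. It only needs base and containers:

```haskell
module Main where

import Data.Char (isAlpha, toLower)
import qualified Data.Map as Map

-- Naive tokenizer (an assumption of this sketch, not toktok's behaviour):
-- lowercase the text, turn every non-letter into a space, split on spaces.
tokenize :: String -> [String]
tokenize = words . map normalize
  where
    normalize c
      | isAlpha c = toLower c
      | otherwise = ' '

-- Bag of words as a map from each word to its number of occurrences.
bagOfWords :: String -> Map.Map String Int
bagOfWords = Map.fromListWith (+) . map (\w -> (w, 1)) . tokenize

main :: IO ()
main = mapM_ print (Map.toList (bagOfWords "The cat saw the other cat."))
```

If only the list of distinct words is wanted, Map.keys on the result should do. Swapping in a real tokenizer would just mean replacing tokenize with a function of the same type.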