On Fri, Jul 1, 2011 at 9:34 PM, Rogan Creswick <creswick@gmail.com> wrote:
On Fri, Jul 1, 2011 at 3:31 AM, Dmitri O.Kondratiev <dokondr@gmail.com> wrote:
> First of all I need:

...
> - tools to construct a 'bag of words'
> (http://en.wikipedia.org/wiki/Bag_of_words_model), which is a list of words
> in the article.

This is trivially implemented if you have a natural language tokenizer
you're happy with.
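
As a rough sketch of what that could look like (not toktok's API): the
`tokenize` below is a deliberately naive stand-in that lowercases the text
and splits on anything non-alphanumeric, and both function names are made
up for illustration. Swap in whatever tokenizer you end up trusting.

```haskell
import Data.Char (isAlphaNum, toLower)
import qualified Data.Map.Strict as Map

-- | Naive tokenizer, assumed here purely for illustration: lowercase the
-- text, turn every non-alphanumeric character into a space, then split
-- on whitespace.
tokenize :: String -> [String]
tokenize = words . map normalize
  where
    normalize c
      | isAlphaNum c = toLower c
      | otherwise    = ' '

-- | Bag of words: each distinct token mapped to its number of occurrences.
bagOfWords :: String -> Map.Map String Int
bagOfWords = Map.fromListWith (+) . map (\w -> (w, 1)) . tokenize
```

In GHCi, `Map.toList (bagOfWords "The cat and the hat.")` gives
`[("and",1),("cat",1),("hat",1),("the",2)]`. If you only want the list of
distinct words rather than counts, `Map.keys` on the result will do.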

Toktok might be worth looking at:
http://hackage.haskell.org/package/toktok but I *think* it takes a
pretty simple view of tokens (assuming it is the tokenizer I've been
using within GF).

Unfortunately 'cabal install' fails with toktok:

Building toktok-0.5...
[1 of 7] Compiling Toktok.Stack     ( Toktok/Stack.hs, dist/build/Toktok/Stack.o )
[2 of 7] Compiling Toktok.Sandhi    ( Toktok/Sandhi.hs, dist/build/Toktok/Sandhi.o )
[3 of 7] Compiling Toktok.Trie      ( Toktok/Trie.hs, dist/build/Toktok/Trie.o )
[4 of 7] Compiling Toktok.Lattice   ( Toktok/Lattice.hs, dist/build/Toktok/Lattice.o )
[5 of 7] Compiling Toktok.Transducer ( Toktok/Transducer.hs, dist/build/Toktok/Transducer.o )
[6 of 7] Compiling Toktok.Lexer     ( Toktok/Lexer.hs, dist/build/Toktok/Lexer.o )
[7 of 7] Compiling Toktok           ( Toktok.hs, dist/build/Toktok.o )
Registering toktok-0.5...
[1 of 1] Compiling Main             ( Main.hs, dist/build/toktok/toktok-tmp/Main.o )
Linking dist/build/toktok/toktok ...
[1 of 1] Compiling Main             ( tools/ExtractLexicon.hs, dist/build/gf-extract-lexicon/gf-extract-lexicon-tmp/Main.o )

tools/ExtractLexicon.hs:5:35:
    Module `PGF' does not export `getLexicon'
cabal: Error: some packages failed to install:
toktok-0.5 failed during the building phase. The exception was:
ExitFailure 1
 
Any ideas how to solve this?