first open source haskell project and a mystery to boot

Hi folks,
Given that I received such excellent help from this newsgroup recently, I wanted to share my first
open-source haskell project available here: https://github.com/aliakhouri/newsagent
It's a simple command line feed (atom, rss) retriever and analyzer in the early stages of development,
using the excellent feed / tagsoup libs to download and analyze feeds from the net.
The real intention is to use it as a platform to learn about information retrieval and machine learning
techniques in haskell.
To this end, I was searching for classification algorithms and I was on the lookout for a nice
clear implementation in haskell of canonical decision tree based classification algorithms.
My first discovery was an old DecisionTree package on hackage but it's poorly documented
and has no examples of usage. So I kept searching...
Then I found an hpaste page (http://hpaste.org/steps/11355) which looked a lot more
promising, but it also has no examples or documentation. In fact, it's an island of code without
any references (I don't know who the author is, and nobody has ever referred to it by URL or in a
blog post). It's a mystery to me.
In any case, I've tried to create a working example but I'm stuck because you can't mix
strings and numbers in a list, and I can't decide whether that's when the author gave up,
or whether I've missed the point. Like I said it's a mystery.
I would appreciate if anyone could shed some light on this whimsical problem.
In case you are wondering why this is relevant to the beginner's forum. Well...
firstly, I am a beginner, and, er... the code is short enough to serve pedagogical purposes (-;
AK

Hi,

On Wed, Oct 12, 2011 at 7:59 PM, Alia wrote:
> I would appreciate if anyone could shed some light on this whimsical problem.
> In case you are wondering why this is relevant to the beginner's forum. Well...
> firstly, I am a beginner, and, er... the code is short enough to serve pedagogical purposes (-;

I'm a beginner, too, so my suggestions are not at all authoritative...

> --t5 = id3 [a1,a2,a3,a4] [d1,d2,d3,d4]

Can you not use any of the techniques described here: http://haskell.org/haskellwiki/Heterogenous_collections? For example, using a tuple or dynamic types.

My 2 cents,
L.
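As a rough illustration of the "tuple" suggestion, and of a wrapper sum type (another common way to get a heterogeneous list), here is a minimal sketch. The names `Value`, `S`, `N`, `B`, `row`, `rowTuple`, and `encode` are hypothetical and do not come from the pasted code; the numeric encoding simply mirrors the `outlook`, `temp`, `humidity`, and `windy` functions from the original post. Dynamic types (`Data.Dynamic`) are not shown.

```haskell
-- Hypothetical wrapper type: one constructor per kind of raw value.
data Value
  = S String
  | N Double
  | B Bool
  deriving (Show, Eq)

-- One observation as a list of wrapped values. This type-checks,
-- whereas a plain list such as ["sunny", 85, False] does not.
row :: [Value]
row = [S "sunny", N 85, N 85, B False]

-- The same observation as a tuple: each position keeps its own type.
rowTuple :: (String, Double, Double, Bool)
rowTuple = ("sunny", 85, 85, False)

-- Encode a wrapped value as a number, following the encoding used in
-- the original post. Unknown strings yield Nothing (an assumption).
encode :: Value -> Maybe Double
encode (S "sunny")    = Just 1
encode (S "overcast") = Just 2
encode (S "rain")     = Just 3
encode (S _)          = Nothing
encode (N x)          = Just (x / 100)
encode (B b)          = Just (if b then 1 else 0)
```

The tuple keeps full type information but fixes the number of fields; the wrapper type allows ordinary list functions at the cost of a pattern match on every access.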

On Wed, Oct 12, 2011 at 11:59:30AM -0700, Alia wrote:
> --------------------------------------------------------------------
> -- Testing Area
> --------------------------------------------------------------------
> outlook s
>     | s == "sunny" = 1
>     | s == "overcast" = 2
>     | s == "rain" = 3
>
> temp :: (Real a, Fractional n) => a -> n
> temp i = (realToFrac i) / (realToFrac 100)
>
> humidity :: (Real a, Fractional n) => a -> n
> humidity i = (realToFrac i) / (realToFrac 100)
>
> windy x
>     | x == False = 0
>     | x == True = 1
>
> -- attributes
> a1 = Discrete outlook
> a2 = Continuous temp
> a3 = Continuous humidity
> a4 = Discrete windy
>
> outlookData = ["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
> tempData = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
> humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
> windyData = [False, True, False, False, False, True, True, False, False, False, True, True, False, True]
> outcomes = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]
>
> d1 = zip outlookData outcomes
> d2 = zip tempData outcomes
> d3 = zip humidityData outcomes
> d4 = zip windyData outcomes
>
> t1 = id3 [a1] d1
> t2 = id3 [a2] d2
> t3 = id3 [a3] d3
> t4 = id3 [a4] d4
>
> --t5 = id3 [a1,a2,a3,a4] [d1,d2,d3,d4]
> -- doesn't work because you can't mix strings and numbers in a list
> --

This also doesn't work because [d1,d2,d3,d4] isn't the right type, even if you could mix strings and numbers in a list: d1, d2, etc. are each lists of pairs, so [d1,d2,d3,d4] is a list of lists of pairs. I think what you really want is to combine all the data for each observation into a single structure. Something like this:

    -- zipWith4 comes from Data.List, so the module also needs
    -- `import Data.List (zipWith4)` in its import section.

    -- one observation: outlook, temperature, humidity, windy
    data Item = Item String Double Double Bool

    outlook (Item "sunny" _ _ _) = 1
    outlook (Item "overcast" _ _ _) = 2
    outlook (Item "rain" _ _ _) = 3

    temp (Item _ i _ _) = (realToFrac i) / (realToFrac 100)

    humidity (Item _ _ i _) = (realToFrac i) / (realToFrac 100)

    windy (Item _ _ _ False) = 0
    windy (Item _ _ _ True) = 1

    -- attributes
    a1 = Discrete outlook
    a2 = Continuous temp
    a3 = Continuous humidity
    a4 = Discrete windy

    outlookData = ["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
    tempData = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
    humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
    windyData = [False, True, False, False, False, True, True, False, False, False, True, True, False, True]
    outcomes = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

    -- pair each combined observation with its outcome
    d = zip (zipWith4 Item outlookData tempData humidityData windyData) outcomes

    t1 = id3 [a1] d
    t2 = id3 [a2] d
    t3 = id3 [a3] d
    t4 = id3 [a4] d
    t5 = id3 [a1,a2,a3,a4] d

Now t5 works just fine.

-Brent
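To see why a single observation type lets all four attributes share one list, here is a self-contained sketch that compiles without the pasted `id3` code. The `Attribute` type below is a hypothetical stand-in (the real definition of `Discrete` and `Continuous` in the hpaste code may well differ), and `applyAttr`, `attributes`, and `observations` are illustrative names only. The catch-all `outlook` case is also an assumption added for totality.

```haskell
import Data.List (zipWith4)

-- Hypothetical stand-in for the attribute type in the pasted code.
data Attribute a
  = Discrete (a -> Int)
  | Continuous (a -> Double)

-- One observation: outlook, temperature, humidity, windy.
data Item = Item String Double Double Bool
  deriving (Show, Eq)

outlook :: Item -> Int
outlook (Item "sunny" _ _ _)    = 1
outlook (Item "overcast" _ _ _) = 2
outlook (Item "rain" _ _ _)     = 3
outlook _                       = 0  -- unknown label (assumption)

temp, humidity :: Item -> Double
temp (Item _ t _ _)     = t / 100
humidity (Item _ _ h _) = h / 100

windy :: Item -> Int
windy (Item _ _ _ w) = if w then 1 else 0

-- All attributes now have the same type, so they fit in one list.
attributes :: [Attribute Item]
attributes = [Discrete outlook, Continuous temp, Continuous humidity, Discrete windy]

-- Evaluate an attribute on an observation, as a Double in both cases.
applyAttr :: Attribute a -> a -> Double
applyAttr (Discrete f) x   = fromIntegral (f x)
applyAttr (Continuous f) x = f x

outlookData :: [String]
outlookData = ["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]

tempData, humidityData :: [Double]
tempData     = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]

windyData :: [Bool]
windyData = [False, True, False, False, False, True, True, False, False, False, True, True, False, True]

outcomes :: [Int]
outcomes = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

-- Each observation paired with its outcome.
observations :: [(Item, Int)]
observations = zip (zipWith4 Item outlookData tempData humidityData windyData) outcomes
```

Because every attribute is a function from `Item`, the list `attributes` is homogeneous even though the underlying fields are a string, two numbers, and a boolean.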
participants (3)
- Alia
- Brent Yorgey
- Lorenzo Bolla